A Hybrid of Rule based Method and Memory based Learning for Korean Text Chunking

Seong-Bae Park; Byoung-Tak Zhang


Journal of KIISE B: Software and Applications


Korean Title	한국어 구 단위화를 위한 규칙 기반 방법과 기억 기반 학습의 결합
English Title	A Hybrid of Rule based Method and Memory based Learning for Korean Text Chunking
Author	Seong-Bae Park, Byoung-Tak Zhang
Citation	Vol. 31, No. 3, pp. 369-378 (March 2004)
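Korean Abstract (English translation)	In partially free word order languages such as Korean and Japanese, rule-based methods are very useful for text chunking; in fact, by exploiting the well-developed postpositions and endings, a small number of rules can perform as well as various machine learning techniques. However, this approach has no way to handle exceptions to the rules. Exception handling is a very important problem in natural language processing, and memory-based learning can deal with it effectively. This paper presents a method that combines a rule-based method with memory-based learning for Korean text chunking. The proposed method relies on the rules first and then verifies the chunks estimated by the rules with memory-based learning. In experiments on the STEP 2000 corpus, the proposed method outperformed both the rules alone and several machine learning techniques used alone. The F-scores of the rules and of Support Vector Machines, which performed best on chunking among the machine learning techniques, are 91.87 and 92.54 respectively, whereas the final F-score of the proposed method is 94.19.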
English Abstract	In partially free word order languages such as Korean and Japanese, the rule-based method is effective for text chunking: thanks to well-developed overt postpositions and endings, even a few rules perform as well as machine learning methods. However, the rule-based method cannot handle exceptions to the rules. Exception handling is an important problem in natural language processing, and memory-based learning can process exceptions efficiently. In this paper, we propose a hybrid of the rule-based method and memory-based learning for Korean text chunking. The proposed method is primarily based on the rules; the chunks estimated by the rules are then verified by a memory-based classifier. An evaluation on the Korean STEP 2000 corpus shows that the proposed method improves the F-score over both the rules alone and various machine learning methods alone. The final F-score is 94.19, while those of the rules and of SVMs, the best machine learning method for this task, are 91.87 and 92.54 respectively.
Keywords	hybrid method, rule-based method, memory-based learning, text chunking
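The abstract describes a two-stage procedure: rules propose a chunk tag, and a memory-based classifier verifies it. The sketch below is one possible reading of that idea, not the authors' implementation. The abstract does not give the rules, the feature set, or the distance metric, so everything here is an assumption made for illustration: the toy rules in `rule_tag`, the part-of-speech labels, the context features, the overlap distance, the choice to store only cases where the rules erred, and the `k` and `max_dist` parameters.

```python
from collections import Counter

def rule_tag(prev_pos, pos):
    """Toy hand-written rules (hypothetical, not the paper's rule set)."""
    if pos == "NOUN":
        return "I-NP" if prev_pos == "NOUN" else "B-NP"
    if pos == "POSTP" and prev_pos == "NOUN":
        return "I-NP"
    return "O"

def features(tagged, i):
    """Context features for token i: previous POS, POS, next POS, word."""
    prev_pos = tagged[i - 1][1] if i > 0 else "<s>"
    next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else "</s>"
    word, pos = tagged[i]
    return (prev_pos, pos, next_pos, word)

def overlap_distance(a, b):
    """Number of feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))

def build_memory(training):
    """Store every training instance on which the rules disagree with gold."""
    memory = []
    for tagged, gold in training:
        for i, gold_tag in enumerate(gold):
            f = features(tagged, i)
            if rule_tag(f[0], f[1]) != gold_tag:
                memory.append((f, gold_tag))
    return memory

def chunk(tagged, memory, k=1, max_dist=0):
    """Tag with the rules, then let close stored exceptions override them."""
    tags = []
    for i in range(len(tagged)):
        f = features(tagged, i)
        guess = rule_tag(f[0], f[1])
        nearest = sorted(memory, key=lambda m: overlap_distance(f, m[0]))[:k]
        nearest = [m for m in nearest if overlap_distance(f, m[0]) <= max_dist]
        if nearest:
            # Majority vote among the stored exceptions that are close enough.
            guess = Counter(tag for _, tag in nearest).most_common(1)[0][0]
        tags.append(guess)
    return tags

# Toy data: the rules mis-tag both tokens of this training sentence.
training = [([("new", "ADJ"), ("york", "NOUN")], ["B-NP", "I-NP"])]
memory = build_memory(training)
```

With this toy memory, `chunk([("new", "ADJ"), ("york", "NOUN")], memory)` returns `["B-NP", "I-NP"]` because both tokens match stored exceptions exactly, while `chunk([("big", "ADJ"), ("dog", "NOUN")], memory)` falls back to the rule output `["O", "B-NP"]`. Keeping the rules as the default and consulting memory only for near matches mirrors the abstract's statement that the method is "primarily based on the rules".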