Á¤º¸°úÇÐȸ ³í¹®Áö B : ¼ÒÇÁÆ®¿þ¾î ¹× ÀÀ¿ë
ÇѱÛÁ¦¸ñ(Korean Title) |
ÇÑ±Û ¹®ÀåÀÇ ÀÚµ¿ ¶ç¾î¾²±â¸¦ À§ÇÑ µÎ °¡Áö Åë°èÀû ¸ðµ¨ |
¿µ¹®Á¦¸ñ(English Title) |
Two Statistical Models for Automatic Word Spacing of Korean Sentences |
ÀúÀÚ(Author) |
À̵µ±æ
ÀÌ»óÁÖ
ÀÓÈñ¼®
ÀÓÇØâ
|
¿ø¹®¼ö·Ïó(Citation) |
VOL 30 NO. 04 PP. 0358 ~ 0371 (2003. 04) |
Çѱ۳»¿ë (Korean Abstract) |
ÀÚµ¿ ¶ç¾î¾²±â´Â ¹®Àå ³»¿¡¼ À߸ø ¶ç¾î¾´ ¾îÀýµéÀ» ¿Ã¹Ù¸£°Ô º¹¿øÇÏ´Â °úÁ¤À¸·Î¼, µ¶ÀÚ¿¡°Ô ±ÛÀÇ °¡µ¶¼ºÀ» ³ôÀÌ°í ¹®ÀåÀÇ ¶æÀ» Á¤È®È÷ Àü´ÞÇϱâ À§ÇØ ¸Å¿ì Áß¿äÇÏ´Ù. ±âÁ¸ÀÇ Åë°è ±â¹Ý ÀÚµ¿ ¶ç¾î¾²±â Á¢±Ù ¹æ¹ýµéÀº ÀÌÀü ¶ç¾î¾²±â »óŸ¦ °í·ÁÇÏÁö ¾Ê±â ¶§¹®¿¡ À߸øµÈ È®·ü Á¤º¸¿¡ ÀÇÇÑ ¶ç¾î¾²±â¸¦ ÇÒ ¼ö¹Û¿¡ ¾ø¾ú´Ù. º» ³í¹®¿¡¼´Â ±âÁ¸ÀÇ Åë°è ±â¹Ý Á¢±Ù ¹æ¹ýÀÇ ¹®Á¦Á¡À» ÇØ°áÇÒ ¼ö ÀÖ´Â µÎ °¡Áö Åë°èÀû ¶ç¾î¾²±â ¸ðµ¨À» Á¦¾ÈÇÑ´Ù. Á¦¾ÈÇÏ´Â ¸ðµ¨Àº ÀÚµ¿ ¶ç¾î¾²±â¸¦ Ç°»ç ºÎÂø°ú °°Àº ºÐ·ù ¹®Á¦(classification problem)·Î °£ÁÖÇÒ ¼ö ÀÖ´Ù´Â Âø¾È¿¡ ±â¹ÝÇϸç, Àº´Ð ¸¶¸£ÄÚÇÁ ¸ðµ¨À» ÀϹÝÈÇÔÀ¸·Î½á È®ÀåµÈ ¹®¸ÆÀ» °í·ÁÇÒ ¼ö ÀÖ°í º¸´Ù Á¤È®ÇÑ È®·üÀ» ÃßÁ¤ÇÒ ¼ö ÀÖµµ·Ï °í¾ÈµÇ¾ú´Ù. Á¦¾ÈÇÏ´Â ¸ðµ¨°ú Áö±Ý±îÁö °¡Àå ÁÁÀº ¼º´ÉÀ» º¸ÀÌ´Â ±âÁ¸ÀÇ ¹æ¹ýÀ» ºñ±³Çϱâ À§ÇØ ¿©·¯ °¡Áö ½ÇÇè Á¶°Ç¿¡ µû¸¥ ´Ù¾çÇÑ ½ÇÇèÀ» ¼öÇàÇÏ¿´°í, ¿À·ù¿¡ ´ëÇÑ ÀÚ¼¼ÇÑ ºÐ¼®À» Á¦½ÃÇÏ°í ÀÖ´Ù. Á¦¾ÈÇÏ´Â ¸ðµ¨À» º¹ÇÕ ¸í»ç¸¦ °í·ÁÇÏ´Â Æò°¡ ¹æ½Ä¿¡ Àû¿ëÇÑ ½ÇÇè °á°ú, 98.33%ÀÇ À½Àý ´ÜÀ§ Á¤È®µµ¿Í 93.06%ÀÇ ¾îÀý ´ÜÀ§ Á¤È®·üÀ» ¾ò¾ú´Ù. |
¿µ¹®³»¿ë (English Abstract) |
Automatic word spacing is a process of deciding correct boundaries between words in a sentence including spacing errors. It is very important to increase the readability and to communicate the accurate meaning of text to the reader. The previous statistical approaches for automatic word spacing do not consider the previous spacing state, and thus can not help estimating inaccurate probabilities. In this paper, we propose two statistical word spacing models which can solve the problem of the previous statistical approaches. The proposed models are based on the observation that the automatic word spacing is regarded as a classification problem such as the POS tagging. The models can consider broader context and estimate more accurate probabilities by generalizing hidden Markov models. We have experimented the proposed models under a wide range of experimental conditions in order to compare them with the current state of the art, and also provided detailed error analysis of our models. The experimental results show that the proposed models have a syllable-unit accuracy of 98.33% and Eojeol-unit precision of 93.06% by the evaluation method considering compound nouns., , |
Å°¿öµå(Keyword) |
ÀÚµ¿ ¶ç¾î¾²±â
È®·ü ¸ðµ¨
Àº´Ð ¸¶¸£ÄÚÇÁ ¸ðµ¨
|
ÆÄÀÏ÷ºÎ |
PDF ´Ù¿î·Îµå
|