KIPS Transactions on Software and Data Engineering (정보처리학회 논문지 소프트웨어 및 데이터 공학)

Korean Title: CTC를 적용한 CRNN 기반 한국어 음소인식 모델 연구
English Title: CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm
Authors: Hong Yoonseok (홍윤석), Ki Kyungseo (기경서), Gweon Gahgene (권가진)
Citation: Vol. 08, No. 03, pp. 0115-0122 (2019. 03)
Korean Abstract (English translation)
Until now, Korean phoneme recognition has mainly used hidden Markov-Gaussian mixture models (HMM-GMM) or hybrid systems that combine an artificial neural network with an HMM. However, this approach leaves little room for performance improvement and cannot be trained without a force-aligned corpus prepared by experts. To compensate for this shortcoming, phoneme recognition research on other languages has studied neural-network-based models that combine a recurrent neural network (RNN)-family structure with the Connectionist Temporal Classification (CTC) algorithm. Training RNN-family models, however, requires a large speech corpus and becomes difficult as the structure grows more complex, which has limited their use for Korean, where refined corpora are scarce and foundational research is comparatively lacking. This study therefore adopts the CTC algorithm, which needs no force alignment, and attempts to build a Korean phoneme recognition model based on a convolutional neural network (CNN), which trains faster than an RNN and can be trained with a smaller corpus. Through two comparative experiments, we built a phoneme recognizer that distinguishes the 49 phonemes of Korean. The model finally selected combines a CNN with a 3-layer Bidirectional LSTM, and its final PER (Phoneme Error Rate) was 3.26. Compared with the PER of 10 to 12 reported by earlier studies in Korean phoneme recognition, this is a considerable performance improvement.
English Abstract
Korean phoneme recognition has mainly relied on hidden Markov-Gaussian mixture models (HMM-GMM) or hybrid models that combine an artificial neural network with an HMM. However, these approaches are limited in that they require force-aligned training corpora manually annotated by experts. To avoid the need for manually annotated training data, researchers have recently used neural-network-based phoneme recognition models that combine a recurrent neural network (RNN)-based structure with the connectionist temporal classification (CTC) algorithm. Yet these RNN-based models pose another implementation difficulty: the amount of data they require grows as the structure becomes more sophisticated. This is particularly problematic for Korean, which lacks refined corpora. In this study, we adopt the CTC algorithm, which does not require force alignment, to build a Korean phoneme recognition model. Specifically, the model is based on a convolutional neural network (CNN), which requires a relatively small amount of data and can be trained faster than RNN-based models. We present the results of two experiments and the resulting best-performing model, which distinguishes 49 Korean phonemes. This model combines a CNN with a 3-layer bidirectional LSTM and achieves a final phoneme error rate (PER) of 3.26, a considerable improvement over existing Korean phoneme recognition models, which report PERs ranging from 10 to 12.
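As a rough illustration of the two ideas the abstract leans on, the sketch below shows how a CTC output can be turned into a phoneme sequence without frame-level alignment, and how a PER could be computed from it. It is not the authors' implementation. It assumes greedy (best-path) decoding, a blank label at index 0, and the common definition of PER as the Levenshtein distance divided by the reference length, expressed as a percentage. The function names and integer phoneme IDs are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (not stated in the abstract)


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse per-frame best labels: merge consecutive repeats, then drop blanks."""
    collapsed: List[int] = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


def edit_distance(reference: Sequence[int], hypothesis: Sequence[int]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            row.append(min(
                prev_row[j] + 1,                            # deletion
                row[j - 1] + 1,                             # insertion
                prev_row[j - 1] + (ref_item != hyp_item),   # substitution or match
            ))
        prev_row = row
    return prev_row[-1]


def phoneme_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """PER as a percentage of the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Per-frame labels from a hypothetical network output; the blank between the
# two 1s keeps them as separate phonemes after collapsing.
decoded = ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])
per = phoneme_error_rate([1, 2, 3, 4], [1, 3, 4])
```

Here `decoded` is `[1, 1, 2]`, and `per` is `25.0` because one of the four reference phonemes is missing from the hypothesis. Under this definition, the reported PER of 3.26 would correspond to roughly three errors per hundred reference phonemes, assuming the figure is a percentage.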
Keywords: Phoneme Recognition, CTC Algorithm, Convolutional Neural Network, Recurrent Neural Network