KIISE Transactions on Computing Practices
Korean Title |
Deep Neural Network-Based Large-Scale Text Data Classification |
English Title |
Large-Scale Text Classification with Deep Neural Networks |
Author |
Hwiyeol Jo
Jin-Hwa Kim
Kyung-Min Kim
Jeong-Ho Chang
Jae-Hong Eom
Byoung-Tak Zhang
|
Citation |
Vol. 23, No. 5, pp. 322-327 (May 2017) |
Korean Abstract |
The document classification problem has been studied in the field of natural language processing for a long time. Extending our previous work, which used convolutional neural networks, we performed document classification based on recurrent neural networks and present the combined results. For the convolutional model we used a single-layer CNN; for the recurrent models we used Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), which are known to perform best. In our experiments, classification accuracy ranked as Multinomial Naïve Bayesian Classifier < SVM < LSTM < CNN < GRU. We therefore conclude that text document classification is closer to a problem of extracting document features than one of modeling sequences. We also found that GRU is better suited than LSTM to extracting document features, and that performance is best when appropriate features and sequence information are used together.
|
English Abstract |
The classification problem in the field of Natural Language Processing has been studied for a long time. Continuing forward with our previous research, which classified large-scale text using Convolutional Neural Networks (CNN), we implemented Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRU). The experiments revealed that the classification accuracy of the algorithms ranked as Multinomial Naïve Bayesian Classifier < Support Vector Machine (SVM) < LSTM < CNN < GRU. The result can be interpreted as follows: First, CNN outperformed LSTM; therefore, the text classification problem may be related more to a feature extraction problem than to a natural language understanding problem. Second, judging from the results, GRU showed better performance in feature extraction than LSTM. Finally, the fact that GRU outperformed CNN implies that text classification algorithms should consider both feature extraction and sequential information. We present the results of fine-tuning deep neural networks to provide some intuition regarding natural language processing to future researchers.
|
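The abstract attributes GRU's advantage to better feature extraction combined with sequence information. As an illustrative aid only (not the authors' implementation), the sketch below shows a single GRU cell in NumPy, run over a toy sequence of word vectors so that the final hidden state acts as a fixed-size document feature for a downstream classifier. The gate convention (Cho et al. style update/reset gates), all weight shapes, and the random toy inputs are assumptions for demonstration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, params):
    """One GRU step: the update gate z interpolates between keeping the
    old state and writing the candidate state; the reset gate r controls
    how much of the old state feeds the candidate."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde              # new hidden state

def encode_sequence(xs, hidden_dim, params):
    """Run the cell over a sequence of word vectors; the final hidden
    state is a fixed-size feature vector for the whole 'document'."""
    h = np.zeros(hidden_dim)
    for x in xs:
        h = gru_cell(x, h, params)
    return h

rng = np.random.default_rng(0)
d_in, d_h = 8, 4  # toy input/hidden sizes (illustrative assumptions)
def w(shape):
    return rng.normal(scale=0.1, size=shape)
params = (w((d_h, d_in)), w((d_h, d_h)), np.zeros(d_h),   # z gate
          w((d_h, d_in)), w((d_h, d_h)), np.zeros(d_h),   # r gate
          w((d_h, d_in)), w((d_h, d_h)), np.zeros(d_h))   # candidate
doc = [rng.normal(size=d_in) for _ in range(5)]  # 5 toy "word vectors"
feature = encode_sequence(doc, d_h, params)
print(feature.shape)  # (4,)
```

In a full classifier along the lines the paper compares, this feature vector would feed a softmax layer; the tanh candidate keeps every hidden coordinate bounded in (-1, 1).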
Keywords |
deep learning
large-scale text classification
natural language processing
artificial neural networks
|