Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation

Jiyoung Baek; Sera Kim; Seokpil Lee

한국인터넷정보학회 논문지 (Journal of Internet Computing and Services)

Korean Title	대화 영상 생성을 위한 한국어 감정음성 및 얼굴 표정 데이터베이스
English Title	Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation
Authors	백지영, 김세라, 이석필 (Jiyoung Baek, Sera Kim, Seokpil Lee)
Citation	Vol. 23, No. 2, pp. 71-77 (2022. 04)
Korean Abstract	본 연구에서는 음성 합성 모델을 감정에 따라 음성을 합성하는 모델로 확장하고 감정에 따른 얼굴 표정을 생성하기 위한 데이터베이스를 수집한다. 데이터베이스는 남성과 여성의 데이터가 구분되며 감정이 담긴 발화와 얼굴 표정으로 구성되어 있다. 성별이 다른 2명의 전문 연기자가 한국어로 문장을 발음한다. 각 문장은 anger, happiness, neutrality, sadness의 4가지 감정으로 구분된다. 각 연기자들은 한 가지의 감정 당 약 3300개의 문장을 연기한다. 이를 촬영하여 수집한 전체 26468개의 문장은 중복되지 않으며 해당하는 감정과 유사한 내용을 담고 있다. 양질의 데이터베이스를 구축하는 것이 향후 연구의 성능에 중요한 역할을 하므로 데이터베이스를 감정의 범주, 강도, 진정성의 3가지 항목에 대해 평가한다. 데이터의 종류에 따른 정확도를 알아보기 위해 구축된 데이터베이스를 음성-영상 데이터, 음성 데이터, 영상 데이터로 나누어 평가를 진행하고 비교한다.
English Abstract	In this paper, we collect a database for two purposes: extending a speech synthesis model into one that synthesizes speech according to emotion, and generating facial expressions that match the emotion. The database is separated into male and female data and consists of emotional speech and facial expressions. Two professional actors of different genders speak sentences in Korean. Each sentence is assigned one of four emotions: anger, happiness, neutrality, or sadness. Each actor performs about 3,300 sentences per emotion. The 26,468 sentences collected by filming these performances do not overlap, and the content of each resembles its corresponding emotion. Because building a high-quality database is important for the performance of future research, the database is assessed on three items: emotional category, intensity, and genuineness. To examine accuracy by data modality, the database is divided into audio-video data, audio data, and video data, and the three are evaluated and compared.
Keywords	Speech Synthesis; Speech Emotion Database; Multimodal (음성합성; 감정음성 데이터베이스; 멀티모달)