DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

Seongho Choi; Kyoung-Woon On; Yu-Jung Heo; Youwon Jang; Ahjeong Seo; Seungchan Lee; Minsu Lee; Byoung-Tak Zhang

KIISE Transactions on Computing Practices

Title	DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Author	Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Youwon Jang, Ahjeong Seo, Seungchan Lee, Minsu Lee, Byoung-Tak Zhang
Citation	Vol. 27, No. 1, pp. 1-7 (January 2021)
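Korean Abstract	This paper proposes DramaQA, a new video question answering dataset for comprehensive understanding of video stories. The DramaQA dataset aims to provide 1) a hierarchical QA dataset that serves as an evaluation metric for artificial intelligence systems based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations for modeling the local coherence of a story. The DramaQA dataset was built from the TV drama "Another Miss Oh" (또 오해영) and contains 17,983 QA pairs from 23,928 video clips of various lengths, each QA pair belonging to one of four difficulty levels. The dataset provides 217,308 images with character-centered visual annotations and coreference-resolved scripts. In addition, we propose a Dual Matching Multistream model that effectively learns character-centered representations for video question answering, and we apply it to the DramaQA dataset to present a method for character-centered video story understanding.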
English Abstract	In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for obtaining a comprehensive understanding of a video story. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model the local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide a total of 217,308 annotated images with rich character-centered visual annotations and coreference-resolved scripts. In addition, we provide analyses of the dataset as well as a Dual Matching Multistream model, which effectively learns character-centered representations of the video to answer questions about it.
Keywords	video question answering; video story understanding; evaluation metric for QA; character-centered video annotation
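One natural way to use the four difficulty levels as an evaluation metric is to report accuracy separately for each level rather than as one pooled number. The Python sketch below shows that bookkeeping. It assumes each answered question carries an integer level from 1 to 4 and a correct/incorrect flag. `QAResult` and `accuracy_by_level` are hypothetical names; they are not part of the released DramaQA data format or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAResult:
    """One answered question (hypothetical schema, not DramaQA's released format)."""
    qid: str
    level: int      # difficulty level, assumed to be encoded as 1..4
    correct: bool   # True if the predicted answer matched the gold answer


def accuracy_by_level(results, levels=(1, 2, 3, 4)):
    """Return {level: accuracy} plus an 'overall' entry.

    Levels with no questions map to None instead of raising ZeroDivisionError.
    """
    totals = defaultdict(int)  # questions seen per level
    hits = defaultdict(int)    # correct answers per level
    for r in results:
        if r.level not in levels:
            raise ValueError(f"unexpected difficulty level {r.level} for {r.qid}")
        totals[r.level] += 1
        hits[r.level] += int(r.correct)

    report = {lvl: (hits[lvl] / totals[lvl] if totals[lvl] else None) for lvl in levels}
    n = sum(totals.values())
    report["overall"] = sum(hits.values()) / n if n else None
    return report


if __name__ == "__main__":
    demo = [QAResult("q1", 1, True), QAResult("q2", 1, False), QAResult("q3", 3, True)]
    print(accuracy_by_level(demo))
```

Keeping the per-level numbers separate shows whether a model's accuracy drops as questions move up the hierarchy. A single pooled score would hide that trend, especially when the levels contain different numbers of questions.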