Design of a Large-scale Task Dispatching & Processing System based on Hadoop

Jik-Soo Kim; Nguyen Cao; Seoyoung Kim; Soonwook Hwang

Journal of KIISE

Korean Title	하둡 기반 대규모 작업 배치 및 처리 기술 설계
English Title	Design of a Large-scale Task Dispatching & Processing System based on Hadoop
Author	Jik-Soo Kim, Nguyen Cao, Seoyoung Kim, Soonwook Hwang
Citation	Vol. 43, No. 6, pp. 613-620 (June 2016)
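Korean Abstract (translated)	This paper describes MOHA (Many-Task Computing on Hadoop), a framework for applying Many-Task Computing (MTC) technology, which processes large numbers of tasks at high performance, to Hadoop, an existing big data processing platform. Specifically, we present the basic concepts of MOHA, the motivation for its development, and the results of a proof of concept (PoC) based on a distributed task queue, and we discuss directions for future research. In MTC applications, each task requires relatively little I/O, but a large number of tasks must be processed concurrently at high performance, and the tasks communicate with one another through files. MTC applications therefore represent a different pattern of data-intensive workload from existing Hadoop applications, which are based on relatively large data block sizes. By converging MTC and big data technologies, MOHA can contribute to the Hadoop ecosystem, which is evolving into a multi-application platform, as a new framework capable of running large-scale computational science applications.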
English Abstract	This paper presents MOHA (Many-Task Computing on Hadoop), a framework that aims to apply Many-Task Computing (MTC) technologies, originally developed for high-performance processing of many tasks, to the existing Big Data processing platform Hadoop. We present the basic concepts and motivation of MOHA, preliminary results from a proof of concept (PoC) based on a distributed message queue, and future research directions. MTC applications may have relatively low I/O requirements per task, but a very large number of tasks must be processed efficiently, with potentially heavy file-based communication among them. MTC applications therefore exhibit a different pattern of data-intensive workload from existing Hadoop applications, which are typically based on relatively large data block sizes. Through an effective convergence of MTC and Big Data technologies, MOHA can be introduced as a new framework that supports large-scale scientific applications within the Hadoop ecosystem, which is evolving into a multi-application platform.
Keywords	Many-Task Computing; Hadoop; Big Data platform; multi-level scheduling; MOHA