Database Research Journal (SIGDB)


Korean Title: 빅데이터의 반복적인 연산 작업을 지원하기 위한 Hadoop 기반 순환처리 시스템
English Title: Hadoop-based Iterative Processing System for Repetitive Computations of Big Data
Authors: 홍승태 (Seungtae Hong), 윤민 (Min Yoon), 박경석 (Kyongseok Park), 임채덕 (Chae Deok Lim), 장재우 (Jae-Woo Chang)
Citation: Vol. 32, No. 1, pp. 13-30 (Apr. 2016)
Abstract
Recently, research on Hadoop, one of the most popular MapReduce frameworks, has been actively conducted for the efficient analysis of big data. Meanwhile, most big data analysis applications, e.g., genome data analysis, require the same Map and Reduce functions to be executed repeatedly. However, Hadoop is inefficient for iterative data-processing applications because it has a non-iterative processing structure. To solve this problem, this paper proposes a Hadoop-based iterative processing system that supports the repetitive computations of big data. In the proposed system, first, an iterative job scheduling technique is proposed for managing iterative MapReduce jobs. Second, an invariant data caching mechanism is proposed to reduce the I/O cost of data access. Third, a stopping-condition check mechanism is proposed to prevent unnecessary computation. Fourth, an iterative resource scheduling technique is proposed for efficiently managing the resources of a Hadoop cluster. Finally, the performance superiority of the proposed system is shown through comparison with existing Hadoop-based systems.
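The abstract outlines a driver pattern for iterative MapReduce: re-submit the same job each round, keep loop-invariant side data cached rather than re-reading it, and check a stopping condition between rounds. The following is a minimal local sketch of that general pattern in plain Python; it is not the paper's actual Hadoop implementation, and all names (`run_mapreduce`, `iterative_driver`) and the toy neighbor-averaging computation are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's system): an iterative
# MapReduce driver loop showing three ideas from the abstract --
# chained job scheduling, reuse of invariant side data, and a
# stopping-condition check between iterations.
from collections import defaultdict

def run_mapreduce(map_fn, reduce_fn, records):
    """One MapReduce round executed locally (stands in for a Hadoop job)."""
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_fn(rec):
            groups[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

def iterative_driver(map_fn, reduce_fn, state, invariant, tol, max_iters):
    """Re-submits the same job until per-key changes fall below tol.
    `invariant` is read-only side data passed to every round, modeling
    data that is cached instead of re-read from storage each iteration."""
    for i in range(max_iters):
        new_state = run_mapreduce(
            map_fn, reduce_fn,
            [(k, v, invariant) for k, v in state.items()])
        # stopping-condition check: largest per-key change this round
        delta = max(abs(new_state[k] - state[k]) for k in state)
        state = new_state
        if delta < tol:
            break
    return state, i + 1

# Toy iterative computation: each node's value is repeatedly averaged
# with its neighbors' values; the graph is the invariant data.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

def map_fn(rec):
    node, value, adj = rec
    yield node, value              # keep own value
    for nbr in adj[node]:
        yield nbr, value           # send value to each neighbor

def reduce_fn(node, values):
    return sum(values) / len(values)

final, iters = iterative_driver(map_fn, reduce_fn,
                                {"a": 0.0, "b": 1.0, "c": 2.0},
                                graph, tol=1e-6, max_iters=100)
```

On this toy graph the values converge toward a common average, so the driver stops well before `max_iters`; on real Hadoop, each `run_mapreduce` call would instead be a submitted cluster job, which is exactly the overhead the proposed system's job scheduling and caching aim to reduce.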
Keywords: Big data, Hadoop, Iterative data processing, Cloud computing