Optimizing Column Accesses in Row-Store

Dae-Yong Hong (홍대용); Sang-Won Lee (이상원)


Database Research Journal (데이터베이스 연구회지, SIGDB)


Korean Title	행 기반 저장소에서 컬럼 접근 최적화
English Title	Optimizing Column Accesses in Row-Store
Author	Dae-Yong Hong (홍대용); Sang-Won Lee (이상원)
Citation	Vol. 29, No. 2, pp. 3-14 (August 2013)
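Korean Abstract (English translation)	In a traditional row-based database, accessing a particular column requires computing the offset that marks the column's position. DBMSs such as Oracle and PostgreSQL obtain the offset of a column requested by a query by summing the lengths of the columns that precede it. The lengths of those preceding columns can be read from the column headers. Even when only a few columns are accessed, this approach touches columns other than the ones required, which costs many CPU instructions and, for long tuples, causes unnecessary data cache misses. This paper proposes a column access optimization technique for row stores and implements it in the open-source DBMS PostgreSQL. Specifically, the offset of each column is computed in advance and placed in the tuple header as an array, and a column access uses the stored offset value to reach the column data directly. In experiments on a table from the TPC-H benchmark, the proposed technique improved performance by up to 50% over the existing method. The improvement stems from the reduced number of CPU instructions needed to find column offsets and from higher cache efficiency due to fewer memory accesses. The proposed method increases the size of the table in exchange for simpler computation, but it can be expected to improve cache efficiency when the pages being scanned are not cached in the CPU. It should therefore be effective in environments where specific data is used frequently and in main-memory databases.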
English Abstract	To access a column value in a traditional row-store DBMS, the starting position of that column must first be identified. Even in most modern row-store DBMSs, such as Oracle and PostgreSQL, all the columns that precede the requested column in its tuple must be scanned to calculate the column's offset. This approach has two performance problems, especially when the requested columns are physically located toward the end of long tuples: 1) many CPU instructions are required to calculate the offset, and 2) excessive CPU cache misses are incurred by accessing columns irrelevant to the query. In this paper, we propose an optimization technique for accessing columns in row-store DBMSs and describe our implementation in the open-source DBMS PostgreSQL. Specifically, in our scheme the offset of each column is pre-calculated and stored in an array of column offsets in the tuple header, so that a column's data can be accessed simply by reading its offset from the array. In a simple experiment using a table from the TPC-H benchmark, our scheme outperforms the existing one by up to 50 percent. This improvement is explained by the reduced number of CPU instructions used to calculate column offsets and by higher cache efficiency from fewer main-memory accesses while scanning the tuple. The proposed structure increases the size of the table in exchange for simpler calculation, but it improves cache efficiency when the scanned pages are not cached in the CPU. It should therefore be effective in systems where queries are executed intensively on specific data and in main-memory DBMSs.
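The contrast between the two access paths described in the abstract can be illustrated with a minimal sketch. The row layout below (a 1-byte length in front of each column, and an optional header array of 2-byte offsets) is a hypothetical simplification chosen for illustration; it is not PostgreSQL's actual tuple format, and the function names are assumptions, not names from the paper.

```python
import struct

def encode_row(values):
    """Encode byte-string columns as [1-byte length][data] each (hypothetical layout)."""
    return b"".join(struct.pack("B", len(v)) + v for v in values)

def get_column_by_scan(row, col):
    """Baseline: walk the header of every preceding column to locate column `col`."""
    pos = 0
    for _ in range(col):
        pos += 1 + row[pos]          # skip the length byte plus that column's data
    length = row[pos]
    return row[pos + 1 : pos + 1 + length]

def encode_row_with_offsets(values):
    """Sketch of the proposed idea: a header array of 2-byte column offsets, then the columns."""
    offsets, pos = [], 0
    for v in values:
        offsets.append(pos)          # offset of this column within the body
        pos += 1 + len(v)
    header = struct.pack(f"<{len(values)}H", *offsets)
    return header + encode_row(values)

def get_column_by_offset(row, col, ncols):
    """Read the pre-calculated offset from the header array and jump straight to the column."""
    (off,) = struct.unpack_from("<H", row, 2 * col)
    pos = 2 * ncols + off            # the body starts right after the offset array
    length = row[pos]
    return row[pos + 1 : pos + 1 + length]
```

In this sketch, the scan version touches every preceding column before reaching the last one, while the offset version performs one header lookup regardless of column position. The offset array adds 2 bytes per column to each row, which mirrors the table-size trade-off the abstract mentions.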
Keywords	Query Processing; Column; Cache; Cache Miss; Branch Prediction