
KIPS Transactions on Computer and Communication Systems (정보처리학회 논문지 컴퓨터 및 통신시스템)


Korean Title: 아파치 스파크 활용 극대화를 위한 성능 최적화 기법 (Performance Optimization Techniques for Maximizing the Utilization of Apache Spark)
English Title: Performance Optimization Strategies for Fully Utilizing Apache Spark
Authors: Rohyoung Myung (명노영), Heonchang Yu (유헌창), Sukyong Choi (최수경)
Citation: Vol. 7, No. 1, pp. 9-18 (January 2018)
Korean Abstract (English translation)
Interest in improving the execution performance of big data processing applications on distributed processing platforms is growing, and research on optimizing application performance on Apache Spark, a general-purpose distributed processing platform, is correspondingly active. Improving the execution performance of data processing applications on Spark is very difficult: the application must be optimized to fit Spark's distributed processing model, the Directed Acyclic Graph (DAG), and Spark's system parameters must be configured with the application's processing characteristics in mind. Prior studies each examined only one factor affecting an application's processing performance. Although they did improve application performance, they neither addressed optimization across Spark's overall processing pipeline nor considered the combined interactions among the many factors that correlate with processing performance. In this study, we analyze how typical data processing applications execute on Spark and, based on that analysis, propose processing strategies that improve performance within stages and between stages. We also analyze how application performance varies with partition parallelism, the system configuration parameter most closely tied to distributed parallel processing, and propose a suitable partitioning optimization technique. To demonstrate the effectiveness of these three strategies, we report the performance gain of each when applied to common data processing applications: WordCount, Pagerank, and Kmeans. To check whether the three techniques produce a combined synergy, we also report the performance gain when all of them are applied together, thereby demonstrating the effectiveness of the proposed strategies.
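The abstract names intra-stage and inter-stage strategies without detailing them. One widely known reason that work done inside a stage matters is that Spark moves data between stages through a shuffle, so aggregating records before the stage boundary shrinks what must be shuffled. The following is a minimal pure-Python sketch of that idea for WordCount, assuming each inner list stands for one input partition. It models the map-side combine that Spark's reduceByKey performs and groupByKey does not; it is not the authors' method and not Spark code, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical model: each Partition holds the text lines of one input partition.
Partition = List[str]


def shuffle_records_without_combine(partitions: List[Partition]) -> List[Tuple[str, int]]:
    """Emit one (word, 1) pair per occurrence, i.e. everything crosses the
    stage boundary unaggregated (groupByKey-style plan)."""
    return [(word, 1) for part in partitions for line in part for word in line.split()]


def shuffle_records_with_combine(partitions: List[Partition]) -> List[Tuple[str, int]]:
    """Pre-aggregate inside each partition (same stage), so only one
    (word, partial_count) pair per distinct word per partition is shuffled
    (similar in spirit to reduceByKey's map-side combine)."""
    records: List[Tuple[str, int]] = []
    for part in partitions:
        local = Counter(word for line in part for word in line.split())
        records.extend(local.items())
    return records


def reduce_side(records: List[Tuple[str, int]]) -> Dict[str, int]:
    """Merge the shuffled records by key, as the next stage would."""
    totals: Counter = Counter()
    for word, count in records:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    partitions = [["spark spark dag", "stage spark"], ["dag dag stage", "spark"]]
    naive = shuffle_records_without_combine(partitions)
    combined = shuffle_records_with_combine(partitions)
    print(len(naive), len(combined))                      # records crossing the boundary
    print(reduce_side(naive) == reduce_side(combined))    # same final counts
```

Both plans produce the same word totals; only the number of records crossing the simulated stage boundary differs (9 versus 6 for this toy input), and the gap grows with the number of repeated keys per partition.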
English Abstract
Improving the performance of big data analytics in distributed environments has become an important issue, because most big data applications, such as machine learning and streaming services, run on distributed computing frameworks. Accordingly, optimizing the performance of such applications on Apache Spark has been actively researched. Optimizing application performance in a distributed environment is challenging, because it requires not only optimizing the applications themselves but also tuning the distributed system's configuration parameters. Although prior studies made substantial efforts to improve execution performance, most of them focused on only one of three performance optimization aspects: application design, system tuning, or hardware utilization. As a result, they could not handle the interplay among those aspects. In this paper, we analyze and model in depth how Spark processes an application. Based on this analysis, we propose performance optimization schemes for each step of the procedure: within a stage (inner stage) and between stages (outer stage). We also propose an appropriate partitioning mechanism by analyzing the relationship between partitioning parallelism and application performance. We applied these three performance optimization schemes to WordCount, Pagerank, and Kmeans, which are basic big data analytics workloads, and observed a performance improvement of nearly 50% when all of the schemes were applied together.
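The abstract relates partitioning parallelism to performance but gives no formula. As a rough sketch under simplifying assumptions (one task per partition, one task per core at a time, equal task durations, no skew), the helper below estimates how many task waves a stage needs for a given partition count and how many core slots sit idle in the last wave. `tuning_guide_range` applies the general "2-3 tasks per CPU core" recommendation from Apache Spark's tuning guide. This is not the paper's partitioning mechanism, and every name here is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WaveEstimate:
    partitions: int
    total_cores: int
    waves: int                 # rounds of tasks needed to process every partition
    idle_slots_last_wave: int  # core slots left unused in the final round


def estimate_waves(partitions: int, executors: int, cores_per_executor: int) -> WaveEstimate:
    """Simplified model: one task per partition, one task per core at a time,
    and all tasks take the same time (no skew, no stragglers)."""
    if partitions <= 0 or executors <= 0 or cores_per_executor <= 0:
        raise ValueError("all arguments must be positive")
    total_cores = executors * cores_per_executor
    waves = math.ceil(partitions / total_cores)
    idle = waves * total_cores - partitions
    return WaveEstimate(partitions, total_cores, waves, idle)


def tuning_guide_range(executors: int, cores_per_executor: int) -> Tuple[int, int]:
    """Partition counts matching the general '2-3 tasks per CPU core' advice."""
    total_cores = executors * cores_per_executor
    return 2 * total_cores, 3 * total_cores


if __name__ == "__main__":
    for p in (20, 32, 48):
        print(estimate_waves(p, executors=4, cores_per_executor=4))
    print(tuning_guide_range(4, 4))
```

For a hypothetical cluster of 4 executors with 4 cores each, 20 partitions need 2 waves and leave 12 slots idle in the second, while 32 or 48 partitions fill every wave. A count chosen this way could be applied through the `spark.default.parallelism` setting or `RDD.repartition(n)`; real jobs with skewed or uneven tasks will deviate from this model.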
Keywords: Apache Spark, Performance Optimization, System Tuning