Journal of the Korea Information Science Society B: Software and Applications
Korean Title (translated)
Similarity Checking of XML Tags through Synonym Vector Expansion
English Title
Similarity checking between XML tags through expanding synonym vector
Authors
이정원
이혜수
이기호
Citation
Vol. 29, No. 9, pp. 676-683 (October 2002)
Korean Abstract (translated)
The biggest factor that has allowed XML (eXtensible Markup Language) documents to become established as the standard for Web documents is flexibility: users can describe their own document types. A problem caused by this flexibility, however, is that different XML authors use different tag names and structures to express the same meaning. Because of different tag sets, different names for elements and attributes, or different document structures, documents expressed with different tags are easily regarded as belonging to different classes. This paper therefore extracts the semantic information and structural information embedded in XML tags, expands them into synonyms that are as semantically similar as possible, and designs and implements a concept-based Tag Pattern Matcher that can compare and analyze the semantic similarity between the expanded tags of XML documents. By weighting the semantic similarity between the tags of two XML documents, we extend the conventional vector space model for semi-structured documents and can thereby determine whether two XML documents are similar.
English Abstract
The success of XML (eXtensible Markup Language) rests primarily on its flexibility: anyone can define the structure of XML documents so that they represent information in whatever form he or she desires. This flexibility, however, means that XML documents cannot automatically be given an underlying semantics. Different tag sets, different names for elements or attributes, and different document structures in general hinder the precise classification and clustering of XML documents. In this paper, we design and implement a system that checks the semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and then expands each tag's synonym set using the WordNet thesaurus and a user-defined word library that supports abbreviations and compound words in XML tags. Second, taking into account the relative importance of tags within XML documents, we extend the conventional vector space model, the most widely used document model in the Information Retrieval field. Using this method, we are able to check the similarity of XML documents even when they represent the same information with different tags.
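The abstract describes the two steps (tag expansion, then a similarity-weighted vector space model) only at a high level. The Python sketch below shows one possible way they could fit together. It is illustrative only, and every concrete choice in it is an assumption rather than the paper's method. A tiny hand-written `SYNONYMS` table stands in for WordNet, and a hypothetical `ABBREVIATIONS` dictionary stands in for the user-defined word library. Jaccard overlap of expanded word sets is an assumed tag-similarity measure, and a soft cosine (cosine similarity in which each pair of dimensions is weighted by tag similarity) is swapped in as one way to extend the plain vector space model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins: a real system would query WordNet and the
# paper's user-defined word library instead of these toy tables.
ABBREVIATIONS = {"addr": "address", "tel": "telephone", "no": "number"}
SYNONYMS = {
    "author": {"writer", "creator"},
    "writer": {"author", "creator"},
    "title": {"name", "heading"},
    "telephone": {"phone"},
    "phone": {"telephone"},
}


def split_tag(tag):
    """Split a compound tag such as 'bookTitle', 'book_title' or 'XMLDoc' into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)]


def expand_tag(tag):
    """Expand a tag into a word set: split it, resolve abbreviations, add synonyms."""
    expanded = set()
    for word in split_tag(tag):
        word = ABBREVIATIONS.get(word, word)
        expanded.add(word)
        expanded |= SYNONYMS.get(word, set())
    return expanded


def tag_similarity(a, b):
    """Jaccard overlap of the two expanded word sets (an assumed measure)."""
    ea, eb = expand_tag(a), expand_tag(b)
    if not ea or not eb:
        return 0.0
    return len(ea & eb) / len(ea | eb)


def document_similarity(tags1, tags2):
    """Soft-cosine similarity of two tag-frequency vectors.

    Plain cosine treats 'author' and 'writer' as orthogonal dimensions;
    here every pair of dimensions is weighted by tag_similarity, so
    synonymous tags also contribute to the score.
    """
    v1, v2 = Counter(tags1), Counter(tags2)

    def soft_dot(x, y):
        return sum(x[a] * y[b] * tag_similarity(a, b) for a in x for b in y)

    denom = math.sqrt(soft_dot(v1, v1) * soft_dot(v2, v2))
    if denom == 0:
        return 0.0
    # Jaccard-based weights need not form a positive semidefinite matrix,
    # so clamp to keep the score in [0, 1].
    return min(1.0, soft_dot(v1, v2) / denom)


if __name__ == "__main__":
    doc_a = ["book", "author", "title", "price"]
    doc_b = ["book", "writer", "name", "cost"]
    print(tag_similarity("telNo", "phoneNumber"))
    print(document_similarity(doc_a, doc_b))
```

With these toy tables, `telNo` and `phoneNumber` expand to the same word set, so their tag similarity is 1.0. For the two sample documents, plain cosine would give 0.25 because only `book` is shared. The soft cosine rises to 7/12 because `author`/`writer` and `title`/`name` also count as partial matches. The paper additionally weights tags by their relative importance in the document. That step is not modeled here, but it could be approximated by replacing the raw counts in `Counter` with importance weights.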
Keywords
XML
Information Retrieval
Document Processing
Document Analysis