Advances in Intelligent Systems and Computing, Volume 807, Pages 58-67, 01/01/2019

Improved term weighting factors for keyword extraction in hierarchical category structure and Thai text classification

Boonthida Chiraratanasopha, Thanaruk Theeramunkong, Salin Boonbrahm

Abstract

Keyword extraction from complex hierarchical categories is a challenge in text mining because classification methods designed for flat categories yield low accuracy. This paper presents a method that improves keyword extraction from hierarchical categories by using the occurrence of terms in hierarchically related categories as additional term-weighting factors. The method extends the basic TF-IDF calculation, so it can readily be used for both keyword extraction and classification. By taking the term frequency and inverse document frequency of categories hierarchically related to a focused category, we can determine how important terms are within their family of categories. In this work, the hierarchical relations used in the calculation are sub-categories, super-categories, and sibling categories. Experimental results show that the proposed method achieves about 40% higher accuracy than a baseline in a classification task.
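
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea: a baseline TF-IDF score for a term in a focused category, scaled by factors derived from the term's frequency in related categories. The function names, the multiplicative combination rule, the exponent values, and the toy corpus are all assumptions made for illustration and are not taken from the paper.

```python
import math


def tf(term, docs):
    """Raw count of `term` over a list of tokenized documents."""
    return sum(doc.count(term) for doc in docs)


def idf(term, docs):
    """Smoothed inverse document frequency of `term` over `docs`."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def hierarchical_weight(term, focus, corpus, related, exponents):
    """Baseline TF-IDF scaled by hypothetical hierarchy factors.

    corpus:    dict mapping category name -> list of token lists
    related:   dict mapping relation name -> list of category names
    exponents: dict mapping relation name -> float (assumed, not from the paper);
               positive rewards, negative penalizes occurrence in that relation
    """
    all_docs = [doc for docs in corpus.values() for doc in docs]
    weight = tf(term, corpus[focus]) * idf(term, all_docs)
    for relation, categories in related.items():
        rel_docs = [doc for cat in categories for doc in corpus[cat]]
        if not rel_docs:
            continue  # no categories of this kind for the focused category
        # Average occurrences per related document, shifted so the factor is >= 1.
        factor = 1.0 + tf(term, rel_docs) / len(rel_docs)
        weight *= factor ** exponents.get(relation, 0.0)
    return weight


# Toy hierarchy: "sports" is the super-category, "sports/tennis" a sibling.
corpus = {
    "sports": [["match", "team", "news"]],
    "sports/football": [["goal", "match", "goal"], ["goal", "team"]],
    "sports/tennis": [["match", "serve"], ["serve", "match", "set"]],
}
related = {"super": ["sports"], "sibling": ["sports/tennis"], "sub": []}
exponents = {"super": 0.5, "sibling": -1.0, "sub": 0.5}  # illustrative values

for term in ("goal", "match"):
    score = hierarchical_weight(term, "sports/football", corpus, related, exponents)
    print(term, round(score, 3))
```

With these assumed exponents, a term that also appears in a sibling category ("match") is down-weighted relative to one that is specific to the focused category ("goal"). Setting every exponent to zero recovers the plain TF-IDF baseline.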

Document Type

Conference Paper

Source Type

Book Series

ISBN

[9783319947020]

ISSN

21945357

Keywords

Hierarchical categories; Keyword extraction; Term frequency-inverse documents frequency (TF-IDF); Term weighting

ASJC Subject Area

Engineering : Control and Systems Engineering; Computer Science : Computer Science (all)


Bibliography


Chiraratanasopha, B., Theeramunkong, T., & Boonbrahm, S. (2019). Improved term weighting factors for keyword extraction in hierarchical category structure and Thai text classification. Advances in Intelligent Systems and Computing, 807, 58-67. doi:10.1007/978-3-319-94703-7_6
