Advances in Intelligent Systems and Computing, Volume 807, Pages 58-67, 01/01/2019
Improved term weighting factors for keyword extraction in hierarchical category structure and Thai text classification
Abstract
Keyword extraction from complex hierarchical categories is a challenge in text mining, since classification methods commonly used for flat categories yield low accuracy. This paper presents a method that improves keyword extraction from hierarchical categories by using the occurrence of terms in hierarchically related categories as additional term-weighting factors. The method is an enhancement of the basic TF-IDF calculation; thus, it can readily be used for keyword extraction and classification. By taking the term frequency and inverse document frequency of categories hierarchically related to a focused category, we can determine how important terms are within their family of categories. In this work, the hierarchy relations used in the calculation are sub-categories, super-categories and sibling categories. In our experiments, the proposed method improved accuracy by about 40% over a baseline in a classification task.
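The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a plain category-level TF-IDF score is adjusted by the same term's TF-IDF in sub-, super- and sibling categories. The additive combination, the function names, the `relation_weights` values and the toy data are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def tf_idf(term, category, words_by_category):
    """Plain TF-IDF, treating each category as one bag of words."""
    counts = Counter(words_by_category[category])
    total = sum(counts.values())
    tf = counts[term] / total if total else 0.0
    # Number of categories in which the term occurs at least once.
    containing = sum(1 for words in words_by_category.values() if term in words)
    idf = math.log(len(words_by_category) / containing) if containing else 0.0
    return tf * idf


def hierarchical_weight(term, category, words_by_category, relations,
                        relation_weights=None):
    """Hypothetical combination: base TF-IDF plus a weighted mean of the
    term's TF-IDF in each group of related categories."""
    if relation_weights is None:
        # Placeholder values; signs and magnitudes are assumptions.
        relation_weights = {"sub": 0.5, "super": 0.5, "sibling": 0.5}
    score = tf_idf(term, category, words_by_category)
    for relation, weight in relation_weights.items():
        relatives = relations.get(category, {}).get(relation, [])
        if relatives:
            mean = sum(tf_idf(term, r, words_by_category)
                       for r in relatives) / len(relatives)
            score += weight * mean
    return score


# Toy data, invented for this example.
words_by_category = {
    "sports": ["ball", "team", "match"],
    "football": ["ball", "goal", "team"],
    "tennis": ["ball", "racket", "serve"],
    "politics": ["vote", "party", "team"],
}
relations = {"football": {"super": ["sports"], "sibling": ["tennis"], "sub": []}}

for term in ("goal", "ball"):
    print(term,
          tf_idf(term, "football", words_by_category),
          hierarchical_weight(term, "football", words_by_category, relations))
```

In this toy setup, a term that occurs only in the focused category ("goal") keeps its plain TF-IDF score, while a term shared with the parent and sibling categories ("ball") has its score shifted by the related-category factors. Whether sibling occurrences should raise or lower a term's weight is left open here.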
Document Type
Conference Paper
Source Type
Book Series
ISBN
[9783319947020]
ISSN
21945357
Keywords
Hierarchical categories; Keyword extraction; Term frequency-inverse documents frequency (TF-IDF); Term weighting
ASJC Subject Area
Engineering: Control and Systems Engineering; Computer Science: Computer Science (all)