Bulletin of Electrical Engineering and Informatics, Volume 14, Issue 4, Pages 3091-3100, 01/08/2025

Solving missing categorical data in questionnaire responses for automated classification

Saifon Aekwarangkoon, Thanatep Namponwatthanakul, Adisorn Amonwet, Siranuch Hemtanon

Abstract

Handling missing categorical data is critical for maintaining the accuracy and reliability of automatic classification tasks, particularly in mental health screening based on questionnaire responses. This study investigates several imputation methods, including last observation carried forward (LOCF), k-nearest neighbor (KNN) imputation, hot-deck imputation, and multivariate imputation by chained equations (MICE). Results show that KNN imputation achieves the lowest root mean square error (RMSE), indicating the most faithful reconstruction of the original data. For classification, however, models trained on MICE-imputed datasets outperformed those trained on data imputed by the other methods and even surpassed models trained on the original incomplete data. We also found that tuning the imputation over multiple iterations using the observed data can increase the deviation from the original values of the missing entries, yet this process can also produce datasets with clearer class boundaries, ultimately improving classification accuracy. These findings highlight the need to balance data fidelity against model performance when selecting an imputation strategy.
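
The abstract compares imputation methods partly by how closely they reconstruct deliberately removed answers (RMSE). The following is a minimal Python sketch, not the authors' implementation, of three of the four methods on invented toy data; MICE is left out because it requires fitting a predictive model per item. Assumptions, all labeled in the code as well: answers are integer-coded ordinal items; LOCF carries an answer forward across items within one respondent; hot-deck copies the answer of a randomly drawn donor; KNN uses the share of mismatching answers as its distance and a majority vote among k neighbors; and RMSE is computed only on the masked cells. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Tuple

# Assumed coding: one row per respondent, one column per questionnaire item,
# answers coded as ints (e.g. Likert 1-4); None marks a missing answer.
Table = List[List[Optional[int]]]


def locf(rows: Table) -> Table:
    """LOCF (assumed within-respondent): a missing item takes the last answered
    item to its left. A missing first item stays None."""
    out = []
    for row in rows:
        last, filled = None, []
        for v in row:
            if v is not None:
                last = v
            filled.append(last)
        out.append(filled)
    return out


def hot_deck(rows: Table, seed: int = 0) -> Table:
    """Random hot-deck: copy the answer of a randomly drawn donor who
    answered the same item (seeded for reproducibility)."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        donors = [r[j] for r in rows if r[j] is not None]
        for r in out:
            if r[j] is None and donors:
                r[j] = rng.choice(donors)
    return out


def mismatch(a: List[Optional[int]], b: List[Optional[int]]) -> float:
    """Share of differing answers over the items both respondents answered."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return sum(x != y for x, y in pairs) / len(pairs)


def knn_impute(rows: Table, k: int = 3) -> Table:
    """KNN for categorical answers: fill each gap with the most frequent answer
    among the k nearest donors (by mismatch) who answered that item."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            nearest = sorted(
                (mismatch(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )[:k]
            if nearest:
                votes = Counter(rows[idx][j] for _, idx in nearest)
                out[i][j] = votes.most_common(1)[0][0]  # ties: first encountered
    return out


def rmse_on_masked(truth: Table, imputed: Table, mask: List[Tuple[int, int]]) -> float:
    """RMSE over the masked cells only, treating answer codes as numbers
    (only meaningful for ordinal items)."""
    sq = [(imputed[i][j] - truth[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(sq) / len(sq))


# Invented toy data: two groups of respondents, three cells masked on purpose.
COMPLETE = [
    [1, 1, 2, 1], [1, 1, 2, 1], [1, 2, 2, 1], [1, 1, 2, 2],
    [4, 4, 3, 4], [4, 4, 3, 4], [4, 3, 3, 4], [3, 4, 3, 4],
]
MASK = [(1, 1), (5, 2), (6, 1)]
OBSERVED = [[None if (i, j) in MASK else v for j, v in enumerate(row)]
            for i, row in enumerate(COMPLETE)]

if __name__ == "__main__":
    for name, imputed in [("LOCF", locf(OBSERVED)),
                          ("hot-deck", hot_deck(OBSERVED)),
                          ("KNN", knn_impute(OBSERVED, k=3))]:
        print(name, round(rmse_on_masked(COMPLETE, imputed, MASK), 3))
```

Running the script prints one RMSE per method. On this toy table KNN reconstructs the masked cells more closely than LOCF (about 0.577 versus 0.816), but these numbers are illustrative only and say nothing about the study's questionnaire data or about classification accuracy, which the abstract reports can favor a different method.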

Document Type

Article

Source Type

Journal

Keywords

Classification performance; Data imputation; Mental health screening; Missing categorical data; Questionnaire responses

ASJC Subject Area

Computer Science: Computer Networks and Communications; Computer Science: Hardware and Architecture; Computer Science: Information Systems; Engineering: Control and Systems Engineering; Engineering: Electrical and Electronic Engineering; Mathematics: Control and Optimization; Physics and Astronomy: Instrumentation; Computer Science: Computer Science (miscellaneous)

Funding Agency

Thai Health Promotion Foundation

Citations (Scopus)

0

Bibliography


Aekwarangkoon, S., Namponwatthanakul, T., Amonwet, A., & Hemtanon, S. (2025). Solving missing categorical data in questionnaire responses for automated classification. Bulletin of Electrical Engineering and Informatics, 14(4), 3091-3100. doi:10.11591/eei.v14i4.8785