Information Switzerland, Volume 11, Issue 11, Pages 1-13, 01/11/2020

Random forest with sampling techniques for handling imbalanced prediction of university student depression

Siriporn Sawangarreerak, Putthiporn Thanathamathee

Abstract

In this work, we propose a combined sampling technique to improve the classification of imbalanced university student depression data. In our experiments, combining random oversampling with Tomek links undersampling produced a relatively balanced depression dataset without losing significant information. Random oversampling was first applied to the minority class to balance the number of samples between the classes. The Tomek links technique was then used for undersampling, removing depression data considered less relevant and noisy. The relatively balanced dataset was classified by random forest. The results show that the overall accuracy in the prediction of adolescent depression data was 94.17%, outperforming the individual sampling techniques. Moreover, the proposed method was tested on another dataset to assess its external validity, where its predictive accuracy was 93.33%.
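The two resampling steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names, the toy data, the 0/1 labels, the Euclidean distance, and the choice to drop both members of each Tomek link are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i] (Euclidean), excluding i itself.
    Ties go to the lowest index. Requires at least two samples."""
    best, best_d = None, math.inf
    for j, x in enumerate(X):
        if j != i:
            d = math.dist(X[i], x)
            if d < best_d:
                best, best_d = j, d
    return best


def remove_tomek_links(X, y):
    """Remove Tomek links: pairs of samples from different classes that are
    each other's nearest neighbor. Assumption: both members are dropped."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.9), (0.9, 1.1), (0.5, 0.5),
     (3.0, 3.0), (3.2, 2.9)]
y = [0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=42)    # step 1: oversample
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)  # step 2: undersample
print(Counter(y_bal), Counter(y_clean))
```

In practice the cleaned samples would then be passed to a random forest classifier; library implementations of both resampling steps exist (for example in imbalanced-learn) and may differ in which link members they remove.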

Document Type

Article

Source Type

Journal

Keywords

Depression prediction; Feature selection; Imbalanced data; Patient Health Questionnaire-9 (PHQ-9); Sampling techniques

ASJC Subject Area

Computer Science : Information Systems

Funding Agency

Walailak University


Bibliography


Sawangarreerak, S., & Thanathamathee, P. (2020). Random forest with sampling techniques for handling imbalanced prediction of university student depression. Information Switzerland, 11(11), 1-13. doi:10.3390/info11110519
