Advances in Science Technology and Engineering Systems, Volume 5, Issue 5, Pages 414-425, 01/01/2020
A hybrid model for coronary heart disease prediction in Thai population
Abstract
Identifying the critical risk factors behind an effective diagnosis is crucial for improving the accuracy of coronary heart disease prediction. The objective of this research is to find the best predictive model for coronary heart disease diagnosis. Three approaches are taken to achieve this goal: (1) investigating which classifier algorithms are most suitable for the Thai heart disease dataset used in this study; (2) exploring which features, among both major risk factors and socioeconomic status, are significant risk factors in the predictive model; and (3) rediscretizing the predefined clinical values of certain major risk factors. To obtain the best baseline model before incorporating feature selection, several classifiers are evaluated. The study shows that the most effective classifiers, ranked from highest accuracy, are Support Vector Machine (SVM), Naïve Bayes, Decision Tree, and Multi-Layer Perceptron. SVM produces the highest accuracy, 88.18%, when both major risk factors and socioeconomic factors are used. Moreover, using the thirteen major risk factors and five socioeconomic factors together yields better accuracy than using either group alone. To improve predictive performance further, feature selection methods from both the filter and wrapper groups are applied, and hybrid models are explored to identify the most relevant features for Thai coronary heart disease. Relief Attribute Evaluation with Bayes Theorem proves to be the best, with an accuracy of 92.59% when classified by SVM. To enhance accuracy further, we rediscretize the predefined medical values to account for differences in the physical and personal characteristics of each person, which can lead to coronary heart disease under different conditions. The findings show that equal-depth rediscretization of seven major risk factors (obesity, hypertension, age, LDL, HDL, fasting blood sugar, and triglyceride) achieves an accuracy of 95.50% when classified by SVM, which is better than the accuracy obtained with the predefined values. Thus, on this dataset the proposed technique outperforms the predefined values from the medical field.
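Equal-depth (equal-frequency) discretization, the technique named in the abstract, places cut points so that each interval holds roughly the same number of records, instead of using fixed clinical thresholds. The following is a minimal sketch in Python of that general idea, not the authors' implementation. The function names and the sample LDL readings are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def equal_depth_edges(values, n_bins):
    """Return up to n_bins - 1 cut points that split `values` into
    bins holding roughly the same number of observations."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    edges = []
    for k in range(1, n_bins):
        # The value at the k-th quantile position becomes a cut point.
        edge = ordered[(k * n) // n_bins]
        # Skip duplicate cut points caused by heavily tied data.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def assign_bin(value, edges):
    """Map a value to a bin index; a value equal to a cut point
    falls into the upper bin."""
    return bisect_right(edges, value)


# Hypothetical LDL readings (mg/dL), NOT data from the study.
ldl = [90, 100, 110, 120, 130, 140, 150, 160, 170]
edges = equal_depth_edges(ldl, 3)
bins = [assign_bin(v, edges) for v in ldl]
```

With these nine sample readings and three bins, each bin receives three values. On real data with many tied values the bins can be uneven, which is why duplicate cut points are dropped.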
Document Type
Article
Source Type
Journal
Keywords
Coronary heart disease; Data mining; Feature selection; Rediscretization on clinical values; Socioeconomic status
ASJC Subject Area
Engineering: Engineering (miscellaneous); Physics and Astronomy: Physics and Astronomy (miscellaneous); Business, Management and Accounting: Management of Technology and Innovation