Machine learning algorithms, Heart disease prediction, Decision tree algorithms, UCI Machine Learning Repository, 5-fold cross-validation


This paper presents a study of machine learning algorithms for predicting heart disease, the leading cause of death worldwide. The study focuses on decision tree algorithms, which have the advantage of considering a large number of risk factors. The heart disease data set was obtained from the UCI Machine Learning Repository and analyzed using a decision tree classifier. The data set had 6 missing data points, which were deleted, leaving 279 instances for analysis. One-hot encoding was performed on categorical variables with more than two responses. The decision tree classifier was tuned using 5-fold cross-validation to choose the best parameters. The classifier correctly predicted 81% of the patients as having heart disease and likewise 82% as not having heart disease, which was higher than the accuracy of other machine learning algorithms used in previous studies. This study demonstrates the potential of decision tree algorithms for predicting heart disease and highlights the importance of early identification of individuals at risk of developing cardiovascular disease.
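The preprocessing and validation steps named in the abstract (deleting records with missing values, one-hot encoding categorical variables with more than two responses, and splitting the data into 5 folds) can be illustrated with a minimal sketch. This is not the paper's code: the toy records, the column names, and the helper functions `drop_missing`, `one_hot`, and `k_fold_indices` are all hypothetical, and no classifier is fitted here.

```python
import random

# Toy records standing in for the heart disease data (illustrative values,
# not taken from the paper). None marks a missing value.
columns = ("age", "sex", "chest_pain", "target")
raw = [
    (63, 1, "typical", 1), (67, 1, "asymptomatic", 1), (41, 0, "atypical", 0),
    (56, None, "non-anginal", 0), (62, 0, "asymptomatic", 1),
    (57, 1, "non-anginal", 0), (53, 1, "asymptomatic", 1), (44, 1, "atypical", 0),
    (52, 1, None, 0), (48, 0, "non-anginal", 0), (54, 1, "asymptomatic", 1),
    (49, 1, "atypical", 0),
]
rows = [dict(zip(columns, r)) for r in raw]


def drop_missing(rows):
    """Keep only the rows that have no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator per level.

    Columns with two or fewer levels are returned unchanged, mirroring the
    abstract's rule of encoding only variables with more than two responses.
    """
    levels = sorted({r[column] for r in rows})
    if len(levels) <= 2:
        return rows
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new[f"{column}_{level}"] = int(r[column] == level)
        encoded.append(new)
    return encoded


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


clean = one_hot(drop_missing(rows), "chest_pain")
folds = k_fold_indices(len(clean), k=5)
for i, test_idx in enumerate(folds):
    train_idx = [j for f in folds if f is not test_idx for j in f]
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(f"fold {i}: {len(train_idx)} train, {len(test_idx)} test")
```

In a parameter search, each candidate setting of the tree (for example, a maximum depth) would be scored by averaging its accuracy over the five held-out folds, and the setting with the best average would be kept.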


Spring 2023

Course Name

STA 5703 Data Mining 1

Instructor Name

Xie, Rui


College of Sciences

Included in

Data Science Commons