Keywords
Heart disease, CDC, GLM, Lasso Logistic Regression, Linear Discriminant Analysis, Interpretability.
Description
This work investigates the use of interpretable linear classification models for heart disease prediction using clinical data from the Cleveland Heart Disease dataset. The study compares generalized linear models, L1-regularized logistic regression, and linear discriminant analysis to balance predictive performance with transparency and clinical interpretability. Emphasis is placed on understanding the contribution of individual risk factors and maintaining model simplicity for practical medical decision support. The findings highlight the relevance of interpretable linear approaches as viable alternatives to complex black-box models in healthcare analytics.
Abstract
Heart disease remains a leading cause of mortality worldwide, underscoring the importance of accurate and transparent methods for early diagnosis. While many machine learning and artificial intelligence models have demonstrated strong predictive performance, their limited interpretability poses challenges for clinical adoption. In this study, we evaluate three interpretable linear classification models for heart disease prediction using the Cleveland Heart Disease dataset: Generalized Linear Model (GLM) logistic regression, L1-regularized (Lasso) logistic regression, and Linear Discriminant Analysis (LDA). Following comprehensive data preprocessing, the models are assessed on a held-out test set using standard evaluation metrics, including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). The results show that all three models achieve strong discriminative performance. Among them, Lasso logistic regression attains the highest accuracy and F1-score, reflecting a favorable balance between precision and recall, while GLM and LDA exhibit comparable performance with slightly lower recall. Importantly, the GLM framework enables identification of clinically meaningful predictors, reinforcing its interpretability and relevance for medical decision-making. These findings demonstrate that interpretable linear models can provide reliable and transparent tools for heart disease prediction, offering a practical alternative to more complex black-box approaches in clinical settings.
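For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed for a binary classifier. It is not the authors' code: the function names `classification_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not data or results from the study. The F1-score is the harmonic mean of precision and recall, and ROC-AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): true labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the cases, which is why the two kinds of metric are usually reported together.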
Instructor Name
Dr. Liqiang Ni
Rights

This work is licensed under a Creative Commons Attribution 4.0 International License.
College
College of Sciences
STARS Citation
Deb, Dipok and Hossain, Emran, "Interpretable Linear Models for Heart Disease Prediction: A Comparative Study" (2026). Data Science and Data Mining. 52.
https://stars.library.ucf.edu/data-science-mining/52