Machine Learning, Regression, Life Expectancy, Predictive Analysis


This study compares three widely used machine learning regression models for predictive analysis: Random Forest, XGBoost, and Support Vector Machine (SVM). Using a curated dataset, we examine how hyperparameter choices affect each model's performance through an exhaustive tuning process. The Random Forest and XGBoost models show robust predictive performance, and feature importance visualization for the Random Forest model offers additional insight into the predictors. The SVM, tuned with GridSearchCV, performs competitively. We compare the models using Mean Squared Error and R-squared. The results highlight each model's strengths and weaknesses and indicate which model suits which kind of application. The study is intended as a practical guide for researchers and practitioners choosing among regression models for predictive analysis.
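For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how Mean Squared Error and R-squared can be computed from observed and predicted values. It is not the study's code: the function names and the toy numbers are hypothetical and chosen only for illustration, and in practice a library implementation would normally be used instead.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared residuals; lower values indicate a better fit."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the study's dataset.
observed = [70.0, 65.0, 80.0, 75.0]
predicted = [72.0, 64.0, 78.0, 75.0]

print(f"MSE = {mean_squared_error(observed, predicted):.3f}")  # MSE = 2.250
print(f"R^2 = {r_squared(observed, predicted):.3f}")           # R^2 = 0.928
```

An R-squared near 1 means the predictions explain most of the variance in the observed values, while MSE reports the error in squared units of the target, so the two metrics are usually read together when comparing models.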


Fall 2023

Course Name

STA 6366 Data Science 1

Instructor Name

Dr. Rui Xie


College of Engineering and Computer Science

Accessibility Status

PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker

Included in

Data Science Commons