Keywords
Machine Learning, Genome-Wide Association Study (GWAS), LASSO Regression, Ridge Regression, Variable Selection, Predictive Modeling in Genomics.
Description
This study investigates flowering time in maize using Genome-Wide Association Studies (GWAS) combined with penalized regression techniques. Using a dataset of 5,000 recombinant inbred lines and over 7,000 SNP markers, the paper compares LASSO and Ridge regression for variable selection and predictive modeling. LASSO reduces dimensionality by selecting the most impactful SNPs, while Ridge regression retains a broader set of features and achieves slightly better predictive performance. The results highlight the strengths and limitations of both methods on high-dimensional genomic data and demonstrate the utility of penalized regression models in complex trait analysis.
Abstract
Genome-Wide Association Studies (GWAS) are instrumental in identifying genetic variants linked to complex traits, providing insight into trait heritability and biological mechanisms. This study applies GWAS to flowering time in maize, a critical adaptive trait, using a diverse dataset of 5,000 recombinant inbred lines across eight environments. Traditional GWAS methods often struggle with high-dimensional datasets in which many genetic loci each have a small effect. To address this, we compared two penalized regression methods, LASSO and Ridge regression, for variable selection and regression analysis within a GWAS framework. LASSO reduced the number of predictors by selecting the most impactful variables, while Ridge regression retained more features, offering a broader genetic context for predicting flowering time. Ridge regression yielded slightly better predictive performance, achieving a lower Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) than LASSO.
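The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the paper's code or data: the toy dataset (120 lines, 40 markers coded 0/1, 5 simulated causal markers), the penalty strength of 0.1, the helper names, and the use of plain coordinate descent are all assumptions chosen so that the example runs with only the Python standard library. It fits both penalties on a training split, then reports how many markers keep a nonzero coefficient along with test MSE and RMSE.

```python
import math
import random

random.seed(0)

# Hypothetical toy data (not the maize dataset): 120 lines, 40 markers coded 0/1,
# where only the first 5 markers affect the simulated trait.
n_lines, n_snps, n_causal = 120, 40, 5
X = [[float(random.randint(0, 1)) for _ in range(n_snps)] for _ in range(n_lines)]
true_beta = [2.0] * n_causal + [0.0] * (n_snps - n_causal)
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0.0, 1.0) for row in X]

# Train/test split; centering with training means stands in for an intercept.
n_train = 90
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
col_mean = [sum(row[j] for row in X_tr) / n_train for j in range(n_snps)]
y_mean = sum(y_tr) / n_train


def center(rows):
    return [[v - m for v, m in zip(row, col_mean)] for row in rows]


Xc_tr, Xc_te = center(X_tr), center(X_te)
yc_tr = [v - y_mean for v in y_tr]


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_penalized(X, y, lam, penalty, n_iter=100):
    """Coordinate descent on (1/2n)*||y - X b||^2 plus a penalty:
    'lasso' adds lam * sum|b_j|, 'ridge' adds (lam/2) * sum b_j^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of marker j with the residual that excludes its own effect.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            if penalty == "lasso":
                new = soft_threshold(rho, lam) / col_sq[j]
            else:
                new = rho / (col_sq[j] + lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def evaluate(beta):
    """Return (MSE, RMSE) on the held-out lines."""
    preds = [y_mean + sum(b * x for b, x in zip(beta, row)) for row in Xc_te]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y_te)) / len(y_te)
    return mse, math.sqrt(mse)


results = {}
for name, lam in (("lasso", 0.1), ("ridge", 0.1)):
    beta = fit_penalized(Xc_tr, yc_tr, lam, name)
    mse, rmse = evaluate(beta)
    results[name] = {"n_selected": sum(b != 0.0 for b in beta), "mse": mse, "rmse": rmse}
    print(name, results[name])
```

In this sketch the L1 penalty sets some coefficients exactly to zero through soft-thresholding, whereas the L2 penalty only shrinks them, which is why LASSO acts as a variable selector and Ridge keeps every marker. Which method predicts better on the toy data depends on the simulated effects and the penalty strength, so it says nothing about the result reported for the maize dataset. In practice the penalty would usually be tuned by cross-validation rather than fixed.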
Course Name
STA 6366 Data Science 1
Instructor Name
Dr. Rui Xie
Rights
This work is licensed under a Creative Commons Attribution 4.0 International License.
College
College of Sciences
STARS Citation
Deb, Dipok, "Performance of LASSO and Ridge Regression for Variable Selection in Genome-Wide Association Studies of Maize Flowering Time" (2025). Data Science and Data Mining. 42.
https://stars.library.ucf.edu/data-science-mining/42