Keywords
Machine Learning, k-NN, Regularized Regression, Random Missingness, MNIST
Abstract
This paper investigates the effect of random missingness on the performance of regularized multinomial logistic regression and the k-nearest neighbors (k-NN) classifier for handwritten digit recognition on the MNIST dataset. In particular, we study L1-regularized (LASSO) logistic regression and L2-regularized (Ridge) logistic regression alongside k-NN. Varying percentages of random missingness were introduced into the original dataset, and each model was evaluated in terms of its classification accuracy. The results show that random missingness degrades the performance of all three classifiers. Overall, k-NN consistently achieves higher accuracy than both L1- and L2-regularized logistic regression across all missingness levels. However, its performance declines more sharply as the proportion of missing data increases, whereas the L1 and L2 logistic regression models exhibit slightly lower baseline accuracy but more stable behavior under severe missingness.
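The experimental design described above can be sketched in scikit-learn. This is a hypothetical illustration, not the paper's actual code: it uses scikit-learn's smaller 8x8 digits dataset as a stand-in for MNIST, and zero-fill imputation of the randomly removed pixels is an assumption made here for a runnable example.

```python
# Hypothetical sketch of the pipeline in the abstract: train LASSO / Ridge
# logistic regression and k-NN, then measure accuracy as random (MCAR)
# missingness increases. The digits dataset and zero-fill imputation are
# stand-in assumptions, not the paper's stated setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel intensities to [0, 1] for the solver
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def add_missingness(X, rate, rng):
    """Zero out a random fraction of pixels (MCAR missingness
    with zero-fill imputation, an assumed choice)."""
    X = X.copy()
    X[rng.random(X.shape) < rate] = 0.0
    return X

models = {
    "lasso_lr": LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=2000),
    "ridge_lr": LogisticRegression(penalty="l2", solver="saga", C=1.0, max_iter=2000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for rate in (0.0, 0.25, 0.5):  # increasing levels of random missingness
    X_tr_m = add_missingness(X_tr, rate, rng)
    X_te_m = add_missingness(X_te, rate, rng)
    for name, model in models.items():
        model.fit(X_tr_m, y_tr)
        results[(name, rate)] = accuracy_score(y_te, model.predict(X_te_m))

for (name, rate), acc in sorted(results.items()):
    print(f"{name:8s} missing={rate:.2f} acc={acc:.3f}")
```

With this setup one would expect the pattern the abstract reports: a higher k-NN baseline that erodes faster as the missingness rate grows, versus more stable regularized logistic regression curves.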
Course Name
STA 6366 Data Science 1
Instructor Name
Dr. Rui Xie
College
College of Sciences
STARS Citation
Markwei, Daniel, "Evaluating Regularized Logistic Regression and k-NN on MNIST Under Increasing Random Missingness" (2026). Data Science and Data Mining. 53.
https://stars.library.ucf.edu/data-science-mining/53
Included in
Computer and Systems Architecture Commons, Data Science Commons, Data Storage Systems Commons, Digital Communications and Networking Commons, Robotics Commons, Systems Engineering and Multidisciplinary Design Optimization Commons