Automated machine learning (AutoML), the process of automating the complete pipeline from a raw dataset to a fitted machine learning model, has become a growing trend. It not only relieves data scientists of routine work but also allows non-experts to complete modeling tasks without a solid understanding of statistical inference and machine learning. One limitation of AutoML frameworks is that data quality can differ significantly from batch to batch. Consequently, the quality of the fitted model can be very poor for some batches of data because of distribution shift in some numerical predictors. In this dissertation, we develop an intelligent binning method to resolve this problem. In addition, various regularized regression classifiers (RRCs), including Ridge, Lasso, and Elastic Net regression, are tested to further enhance model performance after binning. We focus on the binary classification problem and have developed an AutoML framework in Python that handles the entire data preparation process, including data partitioning and intelligent binning. The system has been tested extensively through simulations and analyses of real datasets. The results show that (1) all models perform better with intelligent binning for both balanced and imbalanced binary classification problems; (2) regression-based methods are more sensitive to intelligent binning than tree-based methods, and RRCs can outperform tree-based methods when intelligent binning is used; (3) weighted RRC obtains the best results among the methods compared; and (4) our framework is an effective and reliable tool for conducting AutoML.
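The abstract does not describe the intelligent binning algorithm itself, so the following is only a minimal sketch of the general idea, with plain equal-frequency (quantile) binning swapped in as an assumption. The helper names `fit_bin_edges` and `apply_bins` and the toy batches are hypothetical and do not come from the dissertation. The sketch shows how a numerical predictor from a shifted batch is mapped onto a fixed set of bin indices learned from a reference batch, so a downstream classifier never sees values outside the range it was trained on.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(train_values, n_bins=4):
    """Return the n_bins - 1 interior cut points of equal-frequency bins.

    Hypothetical helper: ordinary quantile binning, used here as a
    stand-in for the dissertation's intelligent binning.
    """
    return quantiles(train_values, n=n_bins, method="inclusive")


def apply_bins(values, edges):
    """Map each value to a bin index in 0..len(edges).

    Values below the first edge land in bin 0 and values above the
    last edge land in the final bin, so nothing is extrapolated.
    """
    return [bisect_right(edges, v) for v in values]


# Toy reference batch and a simulated batch with a shifted distribution.
train_batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted_batch = [v * 10 + 50 for v in train_batch]

edges = fit_bin_edges(train_batch, n_bins=4)
train_bins = apply_bins(train_batch, edges)
shifted_bins = apply_bins(shifted_batch, edges)

print(edges)
print(train_bins)
print(shifted_bins)
```

In this sketch the bin edges are learned once and reused for every later batch. Refitting the edges on each batch is another possible design; which of these, if either, the dissertation adopts is not stated in the abstract.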
Doctor of Philosophy (Ph.D.)
College of Sciences
Statistics and Data Science
Big Data Analytics
Doctoral Dissertation (Open Access)
Zhu, Jianbin, "Automated Machine Learning: Intelligent Binning Data Preparation and Regularized Regression Classifier" (2023). Electronic Theses and Dissertations, 2020-. 1706.