Abstract
Online learning is a growing branch of machine learning that allows traditional data mining techniques to be applied to a stream of data in real time. In this dissertation, we present three efficient algorithms for feature ranking in online classification problems. Each method is tailored to a different type of classification task and has its own advantages. We offer this variety because, as with other machine learning problems, there is usually no single algorithm that works well for all types of tasks. The first method is an online sensitivity-based feature ranking (SFR), which is updated incrementally and is designed for classification tasks with continuous features. We build on the concept of global sensitivity and rank features by their impact on the outcome of the classification model. For feature selection, we use a two-stage filtering method that first eliminates highly correlated, redundant features and then eliminates irrelevant features. An important advantage of this algorithm is its generality: it works on correlated feature spaces without preprocessing, and it can be implemented alongside any single-pass online classification method that learns a separating hyperplane, such as SVMs. In the second method, we use probability theory to measure the importance of each feature by observing how label predictions change when that feature is substituted. A non-parametric version of this method is also presented to remove assumptions about the type of distribution. These methods are applicable to all data types, including mixed feature spaces. Finally, we present a class-based feature importance ranking method that evaluates the importance of each feature for each class; these sub-rankings are then exploited to train an ensemble of classifiers.
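To make the idea of ranking features by global sensitivity on top of a single-pass linear classifier more concrete, here is a minimal Python sketch. It is an illustrative assumption, not the SFR algorithm from this dissertation: the class name `OnlineSensitivityRanker`, the hinge-loss SGD learner, and the score w_i^2 * Var(x_i) (the variance that feature i contributes to the decision value w.x + b when features are independent) are all hypothetical choices. Per-feature variances are tracked incrementally with Welford's algorithm so that the whole procedure stays single-pass.

```python
import random
from typing import List


class OnlineSensitivityRanker:
    """Hypothetical sketch: online linear classifier plus running feature variances.

    Score_i = w_i^2 * Var(x_i) is the variance feature i contributes to the
    linear decision value w.x + b under an independence assumption. This is an
    illustrative stand-in, not the dissertation's SFR method.
    """

    def __init__(self, n_features: int, lr: float = 0.01, reg: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr    # SGD step size (assumed value)
        self.reg = reg  # L2 regularization strength (assumed value)
        # Welford running statistics, one entry per feature
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features

    def partial_fit(self, x: List[float], y: int) -> None:
        """Update with a single sample; y must be +1 or -1."""
        # 1) Incremental mean/variance update (Welford's algorithm)
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])
        # 2) One hinge-loss SGD step on the separating hyperplane
        margin = y * (sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
        for i, xi in enumerate(x):
            grad = self.reg * self.w[i] - (y * xi if margin < 1 else 0.0)
            self.w[i] -= self.lr * grad
        if margin < 1:
            self.b += self.lr * y

    def variances(self) -> List[float]:
        """Unbiased sample variance of each feature seen so far."""
        if self.n < 2:
            return [0.0] * len(self.w)
        return [m / (self.n - 1) for m in self.m2]

    def scores(self) -> List[float]:
        """Sensitivity score w_i^2 * Var(x_i) for each feature."""
        return [wi * wi * v for wi, v in zip(self.w, self.variances())]

    def ranking(self) -> List[int]:
        """Feature indices sorted from most to least important."""
        s = self.scores()
        return sorted(range(len(s)), key=lambda i: s[i], reverse=True)


# Usage on a synthetic stream where only feature 0 determines the label
rng = random.Random(0)
ranker = OnlineSensitivityRanker(n_features=3)
for _ in range(3000):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = 1 if x[0] > 0 else -1
    ranker.partial_fit(x, y)
print(ranker.ranking())  # feature 0 is expected to rank first
```

Because this score assumes independent features, it does not by itself handle the correlated feature spaces that the two-stage filtering described above is meant to address; that step is not sketched here.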
The proposed methods will be thoroughly tested using benchmark datasets and the results will be discussed in the last chapter.
Graduation Date: 2018
Semester: Summer
Advisor: Zheng, Qipeng
Degree: Doctor of Philosophy (Ph.D.)
College: College of Engineering and Computer Science
Department: Industrial Engineering and Management Systems
Degree Program: Industrial Engineering
Format: application/pdf
Identifier: CFE0007584
URL: http://purl.fcla.edu/fcla/etd/CFE0007584
Language: English
Release Date: February 2022
Length of Campus-only Access: 3 years
Access Status: Doctoral Dissertation (Open Access)
STARS Citation
Razmjoo, Alaleh, "Methods for Online Feature Selection for Classification Problems" (2018). Electronic Theses and Dissertations. 6422.
https://stars.library.ucf.edu/etd/6422