Online learning is a growing branch of machine learning that allows traditional data mining techniques to be applied to a stream of data in real time. In this dissertation, we present three efficient algorithms for feature ranking in online classification problems. Each method is tailored to a different type of classification task and has its own advantages. We offer this variety because, as with other machine learning solutions, there is usually no single algorithm that works well for all types of tasks. The first method is an online sensitivity-based feature ranking (SFR) that is updated incrementally and is designed for classification tasks with continuous features. We take advantage of the concept of global sensitivity and rank features based on their impact on the outcome of the classification model. In the feature selection part, we use a two-stage filtering method that first eliminates highly correlated and redundant features and then, in the second stage, eliminates irrelevant features. One important advantage of our algorithm is its generality: the method works for correlated feature spaces without preprocessing. It can be implemented alongside any single-pass online classification method that uses a separating hyperplane, such as SVMs. In the second method, we use probability theory to propose an algorithm that measures the importance of features by observing the changes in label prediction when a feature is substituted. A non-parametric version of the proposed method is presented to eliminate assumptions about the distribution type. These methods are applicable to all data types, including mixed feature spaces. Finally, we present a class-based feature importance ranking method that evaluates the importance of each feature for each class; these sub-rankings are further exploited to train an ensemble of classifiers.
The proposed methods will be thoroughly tested using benchmark datasets and the results will be discussed in the last chapter.
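As a rough illustration of the feature-substitution idea behind the second method, the sketch below keeps a running count of how often replacing one feature value changes the predicted label. It is not the dissertation's algorithm: the class name `SubstitutionImportance`, the `predict` callable, the reservoir buffer, and the flip-rate score are all assumptions made for this example. Substitute values are drawn from previously seen values of the same feature, which is one simple way to avoid assuming a distribution type.

```python
import random


class SubstitutionImportance:
    """Hypothetical helper: running flip rate per feature under substitution."""

    def __init__(self, n_features, buffer_size=200, seed=0):
        self.rng = random.Random(seed)
        self.buffer_size = buffer_size
        # Bounded sample of past values for each feature (numeric or categorical).
        self.buffers = [[] for _ in range(n_features)]
        self.flips = [0] * n_features   # substitutions that changed the label
        self.trials = [0] * n_features  # substitutions attempted
        self.seen = 0                   # samples processed so far

    def update(self, predict, x):
        """Process one sample x (a list of feature values) in a single pass."""
        base = predict(x)
        for j, buf in enumerate(self.buffers):
            if not buf:
                continue  # nothing to substitute with yet
            x_sub = list(x)
            x_sub[j] = self.rng.choice(buf)  # empirical draw from past values
            self.trials[j] += 1
            self.flips[j] += int(predict(x_sub) != base)
        # Reservoir sampling keeps a uniform, fixed-size sample of the stream.
        self.seen += 1
        for j, buf in enumerate(self.buffers):
            if len(buf) < self.buffer_size:
                buf.append(x[j])
            else:
                k = self.rng.randrange(self.seen)
                if k < self.buffer_size:
                    buf[k] = x[j]

    def scores(self):
        """Fraction of substitutions that flipped the predicted label."""
        return [f / t if t else 0.0 for f, t in zip(self.flips, self.trials)]

    def ranking(self):
        """Feature indices ordered from most to least important."""
        s = self.scores()
        return sorted(range(len(s)), key=lambda j: -s[j])
```

Here `predict` stands for whatever online classifier is being trained on the same stream; calling `update(predict, x)` once per arriving sample keeps the ranking current without storing the stream itself.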
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Industrial Engineering and Management Systems
Doctoral Dissertation (Open Access)
Razmjoo, Alaleh, "Methods for Online Feature Selection for Classification Problems" (2018). Electronic Theses and Dissertations. 6422.
Restricted to the UCF community until February 2022; it will then be open access.