Title
Bayesian Feature Selection For Classification With Possibly Large Number Of Classes
Keywords
ANOVA; Bayesian feature selection; Classification; High-dimensional data
Abstract
We introduce two Bayesian models for feature selection in high-dimensional data, specifically designed for the purpose of classification. We use two approaches to the problem: one that discards components with "almost constant" values (Model 1) and another that retains components whose between-group variation is larger than their within-group variation (Model 2). We assume that p ≫ n, i.e., the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We show that particular cases of the two models recover familiar variance-based or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Features Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization of FAIR to the case of L > 2 classes. The performance of the methodology is studied via simulations and on a biological dataset of animal communication signals comprising 43 groups of electric signals recorded from tropical South American electric knifefishes. © 2011 Elsevier B.V.
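For illustration, the following is a minimal Python sketch of the classical one-way ANOVA component ranking that the abstract indicates is recovered as a particular case of Model 2; it is not the paper's Bayesian models themselves, and the function name, synthetic data, and top-k cutoff are assumptions made only for this example.

# Sketch (assumed illustration, not the authors' method): rank the p columns
# of X by a one-way ANOVA F-statistic across class labels and keep the top k.
import numpy as np
from scipy.stats import f_oneway

def anova_feature_ranking(X, y, k):
    """Return indices of the k columns of X (n x p) with the largest
    between-group/within-group variance ratio (one-way ANOVA F-statistic)."""
    classes = np.unique(y)
    f_stats = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        groups = [X[y == c, j] for c in classes]   # samples of feature j per class
        f_stats[j], _ = f_oneway(*groups)          # F-statistic, p-value discarded
    return np.argsort(f_stats)[::-1][:k]           # largest F first

# Illustrative usage on synthetic data with p >> n and one informative feature.
rng = np.random.default_rng(0)
n, p, L = 60, 500, 3
y = rng.integers(0, L, size=n)
X = rng.normal(size=(n, p))
X[:, 0] += 2.0 * y                                 # shift feature 0 by class
selected = anova_feature_ranking(X, y, k=10)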
Publication Date
9-1-2011
Publication Title
Journal of Statistical Planning and Inference
Volume
141
Issue
9
Number of Pages
3256-3266
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.jspi.2011.04.011
Copyright Status
Unknown
Socpus ID
79955875604 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/79955875604
STARS Citation
Davis, Justin; Pensky, Marianna; and Crampton, William, "Bayesian Feature Selection For Classification With Possibly Large Number Of Classes" (2011). Scopus Export 2010-2014. 2739.
https://stars.library.ucf.edu/scopus2010/2739