Keywords

Mathematical statistics

Abstract

The purpose of the present dissertation is to study model selection techniques that are specifically designed for classification of high-dimensional data with a large number of classes. To the best of our knowledge, this problem has not previously been studied in depth. We assume that the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We introduce two Bayesian models that approach the problem in two different ways: one discards components whose values are “almost constant” (Model 1), and the other retains components whose between-group variation is larger than their within-group variation (Model 2). We show that particular cases of these two models recover the familiar variance-based and ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008), and it can be viewed as a natural generalization of FAIR to the case of L > 2 classes. A nontrivial result of the dissertation is that the precision of feature selection using Model 2 improves as the number of classes grows. Finally, we examine the misclassification rate with and without feature selection based on Model 2.
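The abstract notes that special cases of the two models recover the classical variance-based and ANOVA-based screening rules. The sketch below implements only those two classical rules, and only for illustration; it is not the dissertation's Bayesian Model 1 or Model 2. The function names, the toy data, and the fixed number of retained features m are hypothetical choices made for this example.

```python
from statistics import mean, variance

# Hypothetical helpers illustrating classical screening rules; these are
# NOT the dissertation's Bayesian Models 1 and 2.

def column(X, j):
    """Return feature j (a column) of the n-by-p data matrix X."""
    return [row[j] for row in X]

def variance_screen(X, m):
    """Keep the m features with the largest sample variance, i.e. drop
    the 'almost constant' components (class labels are ignored)."""
    p = len(X[0])
    scores = [variance(column(X, j)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:m]

def anova_f(X, y, j):
    """One-way ANOVA F statistic of feature j across the L classes in y.
    Assumes n > L so that the within-group degrees of freedom are positive."""
    groups = {}
    for value, label in zip(column(X, j), y):
        groups.setdefault(label, []).append(value)
    n, L = len(y), len(groups)
    grand = mean(column(X, j))
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        # No within-group spread: infinite F if the class means differ, else 0.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (L - 1)) / (ss_within / (n - L))

def anova_screen(X, y, m):
    """Keep the m features whose between-group variation is largest
    relative to their within-group variation."""
    p = len(X[0])
    scores = [anova_f(X, y, j) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:m]

# Toy data: n = 6 samples, p = 3 features, L = 3 classes.
X = [
    [0.0, 5.0, 1.0],
    [0.1, 5.1, 1.2],
    [2.0, 5.0, 0.9],
    [2.1, 4.9, 1.1],
    [4.0, 5.1, 1.0],
    [4.1, 5.0, 1.2],
]
y = [0, 0, 1, 1, 2, 2]

print(variance_screen(X, 2))  # [0, 2]
print(anova_screen(X, y, 2))  # [0, 1]
```

Both rules rank the class-separating feature 0 first. They disagree on the second feature because variance screening ignores the labels. With L = 2 classes, the one-way F statistic equals the square of the pooled-variance two-sample t statistic. Ranking by F is therefore closely related to the t-statistic ranking on which FAIR is based.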

Notes

If this is your thesis or dissertation and you want to learn how to access it, or for more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2011

Semester

Fall

Advisor

Pensky, Marianna

Degree

Doctor of Philosophy (Ph.D.)

College

College of Sciences

Department

Mathematics

Degree Program

Mathematics

Format

application/pdf

Identifier

CFE0004097

URL

http://purl.fcla.edu/fcla/etd/CFE0004097

Language

English

Release Date

December 2011

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Subjects

Dissertations, Academic -- Sciences, Sciences -- Dissertations, Academic

STARS Citation

Davis, Justin Kyle, "Bayesian Model Selection For Classification With Possibly Large Number Of Groups" (2011). Electronic Theses and Dissertations. 1837.
https://stars.library.ucf.edu/etd/1837


Included in

Mathematics Commons


Electronic Theses and Dissertations

Bayesian Model Selection For Classification With Possibly Large Number Of Groups
