Computational Challenges In Group Membership Prediction Of Highly Imbalanced Big Data Sets
Abstract
Predicting group membership in highly skewed data is a common problem in observational studies. Highly skewed data are also called class-imbalanced data. Classifiers using class-imbalanced data typically create rules that are biased toward the overrepresented group. Imbalance is thought to affect classification only when the data set is highly imbalanced and relatively small, although no formal definition or study has been proposed to indicate what level of imbalance matters, especially with respect to Big Data. Large imbalanced data sets present computational issues beyond imbalance alone, and not all classifiers react the same way. We present a formal definition of imbalance, along with guidance on the levels of imbalance at which researchers should consider alternative approaches when faced with large imbalanced data.
Publication Date
1-1-2017
Publication Title
Computational Intelligence Applications in Business and Big Data Analytics
Pages
119-138
Document Type
Article; Book Chapter
Personal Identifier
scopus
DOI Link
https://doi.org/10.1201/9781315180748
Copyright Status
Unknown
Scopus ID
85052478343 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85052478343
STARS Citation
Rivera, William A. and Goel, Amit, "Computational Challenges In Group Membership Prediction Of Highly Imbalanced Big Data Sets" (2017). Scopus Export 2015-2019. 6404.
https://stars.library.ucf.edu/scopus2015/6404