Computational Challenges In Group Membership Prediction Of Highly Imbalanced Big Data Sets

Abstract

Predicting group membership in highly skewed data is a common problem in observational studies. Highly skewed data are also called class-imbalanced data. Classifiers trained on class-imbalanced data typically produce rules that are biased toward the overrepresented group. Imbalance is thought to affect classification only when the data set is highly imbalanced and relatively small, although no formal definition or study has been proposed to indicate what level of imbalance matters, especially with respect to Big Data. Large imbalanced data sets present computational issues beyond imbalance alone, and not all classifiers react the same way. We present a formal definition of imbalance along with an understanding of the levels at which researchers should consider alternative approaches when faced with large imbalanced data.
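
The bias toward the overrepresented group can be illustrated with a small sketch. It is not the chapter's method: the imbalance ratio below (majority count divided by minority count) is one common convention and should not be read as the formal definition the chapter proposes, and the function names and the 990-to-10 label split are assumptions chosen for illustration. The sketch shows that a rule which always predicts the majority group scores high accuracy while never identifying a minority member.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.

    One common convention, assumed here for illustration only.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def majority_baseline(labels):
    """Score a rule that always predicts the most frequent class.

    Returns (accuracy, minority_recall).
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    predictions = [majority] * len(labels)

    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    accuracy = correct / len(labels)

    # Recall on the minority class: minority cases predicted as minority.
    minority_hits = sum(
        1 for p, y in zip(predictions, labels) if y == minority and p == minority
    )
    minority_recall = minority_hits / counts[minority]
    return accuracy, minority_recall


# Hypothetical data: 990 members of group 0 and 10 members of group 1.
labels = [0] * 990 + [1] * 10

ratio = imbalance_ratio(labels)
accuracy, minority_recall = majority_baseline(labels)
print("imbalance ratio:", ratio)
print("accuracy:", accuracy)
print("minority recall:", minority_recall)
```

Accuracy alone therefore says little about how well the underrepresented group is predicted, which is why per-class measures such as recall are usually reported alongside it for imbalanced data.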

Publication Date

1-1-2017

Publication Title

Computational Intelligence Applications in Business and Big Data Analytics

Number of Pages

119-138

Document Type

Article; Book Chapter

Personal Identifier

scopus

DOI Link

https://doi.org/10.1201/9781315180748

Scopus ID

85052478343 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85052478343
