Title
A Scalable And Efficient Outlier Detection Strategy For Categorical Data
Abstract
Outlier detection has received significant attention in many applications, such as detecting credit card fraud or network intrusions. Most existing research focuses on numerical datasets and cannot be applied directly to categorical datasets, where there is little sense in calculating distances among data points. Furthermore, a number of outlier detection methods require quadratic time with respect to the dataset size and usually multiple dataset scans. These characteristics are undesirable for large datasets, potentially scattered over multiple distributed sites. In this paper, we introduce Attribute Value Frequency (AVF), a fast and scalable outlier detection strategy for categorical data. AVF scales linearly with the number of data points and attributes, and relies on a single data scan. AVF is compared with a list of representative outlier detection approaches that have not been contrasted against each other. Our proposed solution is experimentally shown to be significantly faster, and as effective in discovering outliers. © 2007 IEEE.
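The abstract does not spell out how AVF scores a data point, so the following is only a minimal sketch. It assumes the commonly described formulation in which a point's score is the mean frequency of its attribute values across the dataset, and the lowest-scoring points are treated as outlier candidates. The function names `avf_scores` and `lowest_scoring` are hypothetical, and this in-memory version counts frequencies in one loop and scores in a second, which may differ from how the paper organizes its data scan.

```python
from collections import Counter


def avf_scores(rows):
    """Return one score per row: the mean frequency of that row's attribute values.

    Assumption: every row is a sequence of hashable categorical values
    with the same number of attributes.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])

    # Counting loop: one frequency table per attribute (column).
    counts = [Counter() for _ in range(n_attrs)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1

    # Scoring loop: average the frequencies of each row's values.
    return [
        sum(counts[j][value] for j, value in enumerate(row)) / n_attrs
        for row in rows
    ]


def lowest_scoring(rows, k):
    """Return the indices of the k rows with the lowest scores (outlier candidates)."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
    print(avf_scores(data))
    print(lowest_scoring(data, 1))
```

Both loops touch each attribute value once, so the cost grows linearly with the number of data points and attributes, which is consistent with the scaling claim in the abstract. No pairwise distances are computed, which is why the approach suits categorical values.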
Publication Date
12-1-2007
Publication Title
Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume
2
Page Range
210-217
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/ICTAI.2007.125
Copyright Status
Unknown
Scopus ID
48649108236 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/48649108236
STARS Citation
Koufakou, A.; Ortiz, E. G.; Georgiopoulos, M.; Anagnostopoulos, G. C.; and Reynolds, K. M., "A Scalable And Efficient Outlier Detection Strategy For Categorical Data" (2007). Scopus Export 2000s. 6137.
https://stars.library.ucf.edu/scopus2000/6137