Randomized Robust Subspace Recovery For Big Data
Keywords
Big Data; Low Rank Matrix; Outlier Detection; Random Embedding; Randomized Algorithm; Robust PCA; Subspace Learning
Abstract
In this paper, a randomized PCA algorithm that is robust to the presence of outliers and whose complexity is independent of the dimension of the given data matrix is proposed. Using random sampling and random embedding techniques, the given data matrix is turned to a small compressed data. A subspace learning approach is proposed to extract the columns subspace of the low rank matrix from the compressed data. Two ideas for robust subspace learning are proposed to work under two different model assumptions. The first idea is based on the linear dependence between the columns of the low rank matrix, and the second is based on the independence between the columns subspace of the low rank matrix and the subspace spanned by the outlying columns. We derive sufficient conditions to guarantee the performance of the proposed approach with high probability. It is shown that the proposed algorithm can successfully identify the outliers just by using roughly O(r2) random linear data observations, where r is the rank of the low rank matrix, and provably achieve notable speedups in comparison to existing approaches.
Publication Date
11-10-2015
Publication Title
IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Volume
2015-November
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/MLSP.2015.7324346
Copyright Status
Unknown
Socpus ID
84960884413 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84960884413
STARS Citation
Rahmani, Mostafa and Atia, George K., "Randomized Robust Subspace Recovery For Big Data" (2015). Scopus Export 2015-2019. 1498.
https://stars.library.ucf.edu/scopus2015/1498