Randomized Robust Subspace Recovery For Big Data

Keywords

Big Data; Low Rank Matrix; Outlier Detection; Random Embedding; Randomized Algorithm; Robust PCA; Subspace Learning

Abstract

In this paper, a randomized PCA algorithm that is robust to the presence of outliers and whose complexity is independent of the dimension of the given data matrix is proposed. Using random sampling and random embedding techniques, the given data matrix is turned to a small compressed data. A subspace learning approach is proposed to extract the columns subspace of the low rank matrix from the compressed data. Two ideas for robust subspace learning are proposed to work under two different model assumptions. The first idea is based on the linear dependence between the columns of the low rank matrix, and the second is based on the independence between the columns subspace of the low rank matrix and the subspace spanned by the outlying columns. We derive sufficient conditions to guarantee the performance of the proposed approach with high probability. It is shown that the proposed algorithm can successfully identify the outliers just by using roughly O(r2) random linear data observations, where r is the rank of the low rank matrix, and provably achieve notable speedups in comparison to existing approaches.

Publication Date

11-10-2015

Publication Title

IEEE International Workshop on Machine Learning for Signal Processing, MLSP

Volume

2015-November

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/MLSP.2015.7324346

Socpus ID

84960884413 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84960884413

This document is currently not available here.

Share

COinS