EcoSta 2017
A0570
Title: Selecting the number of principal components: Estimation of the true rank of a noisy matrix
Authors: Yunjin Choi - National University of Singapore (Singapore) [presenting]
Jonathan Taylor - Stanford University (United States)
Robert Tibshirani - Stanford University (United States)
Abstract: Principal Component Analysis (PCA) is one of the most popular methods in multivariate data analysis. Despite its popularity, there is no widely adopted standard approach for selecting the number of principal components to retain. To address this issue, we propose a novel method within the hypothesis testing framework that tests whether the currently selected principal components capture all the statistically significant signal in the given data set. Existing hypothesis testing approaches do not provide exact type I error control and lose power in some scenarios; the proposed method provides exact type I error control while retaining reasonable power to detect signals. Central to our work is the post-selection inference framework, which facilitates valid inference after data-driven model selection: the proposed test achieves exact type I error control by conditioning on the selection event that leads to the inference. We also introduce a possible extension of the proposed method to high-dimensional data.
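The abstract does not state the model or the test statistic. As a minimal sketch of the problem setting only, assuming an additive model X = B + sigma * Z with iid standard Gaussian noise Z and a known noise level sigma, the Python code below simulates a low-rank signal and runs a plain Monte Carlo test of the global null B = 0 (the first step of a sequential "is there any remaining signal?" procedure) based on the largest singular value. This is not the conditional, post-selection test proposed here. For later steps (rank k versus more than k), the null distribution of the next singular value depends on the unknown signal, so simulating from pure noise is no longer valid. The function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_low_rank(n, p, rank, signal, sigma, rng):
    """Draw X = B + sigma * Z, where B = signal * U V^T has the given rank.

    Hypothetical setup: U (n x rank) and V (p x rank) have orthonormal columns,
    so every nonzero singular value of B equals `signal`.
    """
    u = np.linalg.qr(rng.standard_normal((n, rank)))[0]
    v = np.linalg.qr(rng.standard_normal((p, rank)))[0]
    b = signal * u @ v.T
    return b + sigma * rng.standard_normal((n, p))


def global_null_pvalue(x, sigma, n_sim, rng):
    """Monte Carlo p-value for H0: B = 0 using the largest singular value.

    Assumes sigma is known; under H0, X is pure Gaussian noise, so the null
    distribution of the top singular value can be simulated directly.
    """
    n, p = x.shape
    observed = np.linalg.svd(x, compute_uv=False)[0]  # singular values are sorted descending
    null = np.array([
        np.linalg.svd(sigma * rng.standard_normal((n, p)), compute_uv=False)[0]
        for _ in range(n_sim)
    ])
    # Counting the observed value as one of the draws keeps the test valid at finite n_sim.
    return (1 + np.sum(null >= observed)) / (1 + n_sim)


rng = np.random.default_rng(0)
x_signal = simulate_low_rank(50, 20, rank=2, signal=30.0, sigma=1.0, rng=rng)
x_noise = 1.0 * rng.standard_normal((50, 20))
print(global_null_pvalue(x_signal, 1.0, 199, rng))
print(global_null_pvalue(x_noise, 1.0, 199, rng))
```

With a strong signal, the observed top singular value exceeds every simulated null value, so the p-value sits at its floor of 1 / (1 + n_sim); under pure noise it is roughly uniform on its grid of possible values.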