View Submission - EcoSta2021
A0251
Title: High-dimensional principal component analysis with heterogeneous missingness
Authors: Ziwei Zhu - University of Michigan, Ann Arbor (China) [presenting]
Tengyao Wang - London School of Economics (United Kingdom)
Richard Samworth - University of Cambridge (United Kingdom)
Abstract: We study the effect of missing data in principal component analysis (PCA). In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence, and we discover a new phase transition phenomenon along the way. For heterogeneous missingness settings, we introduce a new method for high-dimensional PCA, called "primePCA". Starting from the IPW estimator, primePCA iteratively projects the observed entries of the data matrix onto the column space of the current estimate to impute the missing entries, and then updates the estimate by computing the leading right singular space of the imputed data matrix. We prove that, in the noiseless case, the error of primePCA converges to zero at a geometric rate when the signal strength is not too small and the true principal eigenspaces are incoherent. An important feature of our theoretical guarantees is that they depend on average, as opposed to worst-case, properties of the missingness mechanism. Our numerical studies on both simulated and real data show that primePCA performs well across a wide range of scenarios.
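The two steps described in the abstract (an IPW initialiser followed by repeated impute-then-SVD updates) can be sketched roughly as follows. This is a simplified illustration written from the abstract's description only, not the authors' implementation: the function names `ipw_init`, `refine` and `subspace_error` are hypothetical, the IPW step assumes a single observation probability estimated from the mask, the data are assumed to be centred, and any safeguards the actual method may use (for example, skipping poorly conditioned rows) are omitted.

```python
import numpy as np


def ipw_init(Y, mask, K):
    """Inverse-probability weighted estimate of the top-K principal subspace.

    Assumes (for illustration) homogeneous missingness: every entry is
    observed with the same probability, estimated by the mask's mean.
    """
    mask = np.asarray(mask, dtype=bool)
    n, d = Y.shape
    p_hat = mask.mean()
    Y0 = np.where(mask, Y, 0.0)           # zero-fill the missing entries
    G = Y0.T @ Y0 / n                     # covariance of the zero-filled data
    diag = np.diag(G).copy()
    G = G / p_hat**2                      # off-diagonals are shrunk by p^2
    G[np.diag_indices(d)] = diag / p_hat  # diagonals are shrunk by p only
    _, eigvecs = np.linalg.eigh(G)        # eigenvalues in ascending order
    return eigvecs[:, -K:]                # last K columns span the top subspace


def refine(Y, mask, V, n_iter=30):
    """Alternate between imputing missing entries and recomputing the SVD."""
    mask = np.asarray(mask, dtype=bool)
    K = V.shape[1]
    Y0 = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        Y_imp = Y0.copy()
        for i in range(Y.shape[0]):
            obs = mask[i]
            if not obs.any():
                continue                  # nothing observed: leave the row at zero
            # Least-squares coefficients of the observed entries on V's rows.
            u, *_ = np.linalg.lstsq(V[obs], Y0[i, obs], rcond=None)
            Y_imp[i, ~obs] = V[~obs] @ u  # fill the missing entries
        _, _, Vt = np.linalg.svd(Y_imp, full_matrices=False)
        V = Vt[:K].T                      # leading right singular vectors
    return V


def subspace_error(V1, V2):
    """Frobenius distance between the two projection matrices, over sqrt(2)."""
    return np.linalg.norm(V1 @ V1.T - V2 @ V2.T) / np.sqrt(2)
```

A typical call would be `V0 = ipw_init(Y, mask, K)` followed by `V = refine(Y, mask, V0)`, where `mask` is a boolean array marking the observed entries of `Y`. The value at a missing position of `Y` is ignored, so it may be NaN.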