View Submission - EcoSta2024
A0522
Title: Sparse heteroskedastic PCA in high dimensions
Authors: Zhao Ren - University of Pittsburgh (United States) [presenting]
Abstract: Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Although sparse PCA has been well studied in high dimensions, little is known about the case where the noise is heteroskedastic, which turns out to be ubiquitous in many scenarios, such as single-cell RNA sequencing (scRNA-seq) data and information network data. An iterative algorithm is proposed for sparse PCA in the presence of heteroskedastic noise. It alternates between two steps: one updates the estimates of the sparse eigenvectors using orthogonal iteration with adaptive thresholding, and the other imputes the diagonal values of the sample covariance matrix to reduce the estimation bias due to heteroskedasticity. The procedure is computationally fast and provably optimal under the generalized spiked covariance model, assuming the leading eigenvectors are sparse. A comprehensive simulation study shows its robustness and effectiveness in various settings. Additionally, the application of the new method to two high-dimensional genomics datasets, i.e., microarray and scRNA-seq data, demonstrates its ability to preserve inherent cluster structures in downstream analyses.
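
The alternating scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the function name `sparse_hetero_pca_sketch`, the zero-diagonal initialisation, the fixed entrywise hard threshold `tau` (a stand-in for the adaptive thresholding rule, which the abstract does not specify), and the fixed iteration count `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def sparse_hetero_pca_sketch(S, r, tau, n_iter=20):
    """Illustrative sketch only; not the authors' procedure.

    S      : (p, p) symmetric sample covariance matrix
    r      : number of leading eigenvectors to estimate
    tau    : fixed hard-threshold level (assumed; the abstract's rule is adaptive)
    n_iter : number of alternating iterations (assumed)
    """
    M = np.array(S, dtype=float)  # work on a copy of S
    # Heteroskedastic noise biases the diagonal, so discard it at the start.
    np.fill_diagonal(M, 0.0)

    # Initialise with the top-r eigenvectors of the zero-diagonal matrix.
    # eigh returns eigenvalues in ascending order, so take the last r columns.
    _, vecs = np.linalg.eigh(M)
    Q = vecs[:, -r:]

    for _ in range(n_iter):
        # Step 1: one orthogonal-iteration update with hard thresholding.
        T = M @ Q
        T[np.abs(T) < tau] = 0.0       # zero out small loadings to enforce sparsity
        Q, _ = np.linalg.qr(T)         # re-orthonormalise the columns

        # Step 2: impute the diagonal of M from the current rank-r fit.
        L = Q @ (Q.T @ M @ Q) @ Q.T
        np.fill_diagonal(M, np.diag(L))

    return Q
```

The function returns a `p x r` matrix with orthonormal columns that estimates the leading eigenvectors up to sign. If `tau` is set so high that an entire column is thresholded to zero, the QR step no longer yields a meaningful basis, so a practical implementation would need a data-driven threshold.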