CMStatistics 2023
B1193
Title: Integrated principal components analysis
Authors: Tiffany Tang - University of California, Berkeley (United States) [presenting]
Genevera Allen - Rice University (United States)
Abstract: Data integration, or the strategic analysis of multiple data sources simultaneously, can often lead to discoveries that may be hidden in separate analyses of each individual data source. A new statistical data integration method is developed, named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple datasets. The key idea driving iPCA is the matrix-variate normal model, whose Kronecker product covariance structure captures both individual patterns within each dataset and joint patterns shared by multiple datasets. Building upon this model, several penalized (sparse and non-sparse) covariance estimators are developed for iPCA and their theoretical properties are studied. The sparse iPCA estimator is shown to consistently estimate the underlying joint subspace, and, using geodesic convexity, the non-sparse iPCA estimator is proved to converge to the global solution of a non-convex problem. The practical advantages of iPCA are demonstrated through simulations and a case study application to integrative genomics for Alzheimer's disease. In particular, it is shown that the joint patterns extracted via iPCA are highly predictive of a patient's cognition and Alzheimer's diagnosis.
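The Kronecker product covariance structure mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' penalized iPCA estimator. It assumes K datasets measured on the same n samples, each drawn from a mean-zero matrix-variate normal with a shared row covariance (the joint pattern) and a dataset-specific column covariance (the individual pattern), and it recovers the joint direction with a naive pooled estimate. All names (`sample_matrix_normal`, `ar1_cov`, `naive_joint_scores`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def sample_matrix_normal(rng, row_cov, col_cov):
    """Draw a mean-zero n x p matrix X with Cov(vec(X.T)) = kron(row_cov, col_cov)."""
    n, p = row_cov.shape[0], col_cov.shape[0]
    A = np.linalg.cholesky(row_cov)  # A @ A.T == row_cov
    B = np.linalg.cholesky(col_cov)  # B @ B.T == col_cov
    Z = rng.standard_normal((n, p))
    return A @ Z @ B.T

def ar1_cov(p, rho):
    """AR(1) covariance with unit diagonal (an assumed individual structure)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def naive_joint_scores(datasets, n_components=2):
    """Naive plug-in, NOT the penalized iPCA estimators: pool X_k X_k^T / p_k
    over datasets and take the leading eigenvectors as joint sample scores.
    No centering is done because the simulated data have mean zero."""
    n = datasets[0].shape[0]
    pooled = np.zeros((n, n))
    for X in datasets:
        pooled += X @ X.T / X.shape[1]
    pooled /= len(datasets)
    evals, evecs = np.linalg.eigh(pooled)          # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order], evals[order]

rng = np.random.default_rng(0)
n = 30

# Shared row covariance: identity plus one strong joint direction u_true.
u_true = rng.standard_normal(n)
u_true /= np.linalg.norm(u_true)
row_cov = np.eye(n) + 20.0 * np.outer(u_true, u_true)

# Three datasets on the same n samples, each with its own column covariance.
col_covs = [ar1_cov(200, rho) for rho in (0.3, 0.5, 0.7)]
datasets = [sample_matrix_normal(rng, row_cov, C) for C in col_covs]

scores, evals = naive_joint_scores(datasets, n_components=2)

# Absolute cosine between the leading estimated score vector and u_true.
alignment = abs(scores[:, 0] @ u_true)
print("alignment with true joint direction:", round(float(alignment), 3))
```

Because each column covariance here has unit diagonal, the pooled matrix has expectation equal to the shared row covariance, which is why its leading eigenvector tracks the joint direction. The estimators described in the abstract instead estimate the row and column covariances jointly under penalization.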