B1753
Title: Distributed PCA for high-dimensional heterogeneous data
Authors: Yanrong Yang - The Australian National University (Australia) [presenting]
Abstract: Distributed principal component analysis (DPCA) aims to accurately estimate the principal eigenspace of data stored across multiple local machines. A weighted averaging approach is proposed for DPCA in heterogeneous settings, where the data follow different factor models across local machines but share the same principal eigenspace. Each local machine computes its principal eigenvectors together with a scalar "weight" and transmits both to the central server; the central server aggregates this information from all local machines by weighted averaging and conducts PCA on the aggregated information. Theoretically, we establish the rate of convergence of the weighted averaging DPCA estimator, which shows that it is more efficient than the previous equal-weight estimator under heterogeneous scenarios (e.g. different sample sizes, factors and error components across local machines). An extensive simulation study illustrates its improved performance under various forms of heterogeneity. As a by-product, a new test statistic is proposed for testing the equivalence of the principal eigenspaces of multiple sets of high-dimensional data. We derive the asymptotic distribution of this test statistic and apply it in two statistical applications: testing for change points in daily stock returns, and clustering mortality data from multiple countries.
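The two-stage procedure described in the abstract (local eigenvectors plus a weight, then weighted aggregation and a final PCA on the server) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: the abstract does not specify how the weight is computed, so the local sample size is used here as a hypothetical placeholder, and the function names `simulate_machine`, `local_summary` and `aggregate` are invented for this sketch. Aggregation is done on the projection matrices V V^T, a common choice in distributed PCA because it is invariant to the sign and rotation of each machine's eigenvectors.

```python
import numpy as np


def simulate_machine(rng, U, n, noise_sd, factor_scale=5.0):
    """Hypothetical data generator: n observations from a factor model
    whose loadings span the columns of U (the shared eigenspace)."""
    p, k = U.shape
    factors = rng.standard_normal((n, k)) * factor_scale
    noise = rng.standard_normal((n, p)) * noise_sd
    return factors @ U.T + noise


def local_summary(X, k):
    """Runs on one local machine: top-k eigenvectors of the sample
    covariance, plus a scalar weight to send to the server."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]        # keep the k leading eigenvectors
    weight = float(n)                  # ASSUMPTION: placeholder weight only
    return V, weight


def aggregate(summaries, k):
    """Runs on the central server: weighted average of the local
    projection matrices, followed by PCA on the average."""
    p = summaries[0][0].shape[0]
    total = sum(w for _, w in summaries)
    M = np.zeros((p, p))
    for V, w in summaries:
        M += (w / total) * (V @ V.T)
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k = 20, 2
    U, _ = np.linalg.qr(rng.standard_normal((p, k)))  # shared eigenspace
    # Heterogeneous machines: different sample sizes and noise levels.
    settings = [(50, 2.0), (200, 1.0), (1000, 0.5)]
    summaries = [local_summary(simulate_machine(rng, U, n, sd), k)
                 for n, sd in settings]
    U_hat = aggregate(summaries, k)
    # Frobenius distance between estimated and true projection matrices.
    print(np.linalg.norm(U_hat @ U_hat.T - U @ U.T))
```

Setting every weight to the same constant in `local_summary` gives an equal-weight average of the local projections, which is the kind of baseline the abstract compares against.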