A0702
Title: Patchwork PCA: Joint dimension reduction for semi-overlapping data patches
Authors: Lili Zheng - University of Illinois Urbana-Champaign (United States) [presenting]
Abstract: Patchwork learning is a new and challenging data collection paradigm in which both samples and features are observed only in fragmented subsets. Because of technological limits, measurement expense, or multimodal data integration, such patchwork data structures arise frequently in neuroscience, healthcare, and genomics, among other fields. Instead of analyzing each data patch separately, it is highly desirable to extract comprehensive knowledge from the whole data set. A new PCA method designed for patchwork learning is introduced, which extracts principal components for the whole sample and feature space from a collection of semi-overlapping data patches. It is demonstrated how the method addresses key challenges of patchwork data, such as non-random missingness, heterogeneous signal-to-noise ratios (SNRs), and irregular observational patterns. Statistical error bounds are shown for the estimated principal components and sample loadings, and the method's performance is demonstrated on real biomedical data sets.