CFE-CMStatistics 2024
A0806
Title: Multi-source matrix data integration via embedding alignment
Authors: Runbing Zheng - Johns Hopkins University (United States) [presenting]
Minh Tang - North Carolina State University (United States)
Abstract: Motivated by the increasing demand for multi-source data integration in various scientific fields, a matrix completion problem is studied in which the observable part of the matrix to be recovered has a block-wise missing structure. The goal is to recover an underlying large-scale, low-rank matrix of which only several noisy submatrices, each involving a subset of the entities, can be observed. An algorithm is proposed that explicitly integrates all of the information revealed in the noisy submatrices, thereby efficiently estimating the underlying truth. Specifically, the algorithm first estimates entity embeddings from each observed submatrix, then aligns the embeddings across submatrices using their overlapping entities, and finally aggregates the aligned embeddings over all submatrices to recover the whole matrix of interest. The asymptotic analysis shows that the algorithm recovers the underlying truth entrywise; moreover, the entrywise fluctuations of the estimate are proven to be asymptotically normal with mean zero. Simulation and real-data studies show that the algorithm is efficient and effective for this structured matrix completion problem.
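The three steps (embed, align, aggregate) can be illustrated with a minimal NumPy sketch. It rests on assumptions the abstract does not state: the underlying matrix is symmetric with P = X X^T for latent entity positions X of rank d, each block's embedding is a rank-d spectral embedding (top eigenvectors scaled by the square roots of their eigenvalues), alignment is done by orthogonal Procrustes on the overlapping entities, and aggregation is a plain average. The function names (`spectral_embedding`, `procrustes`, `integrate`) and the simulation settings are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np

# Hypothetical sketch of embed -> align -> aggregate; not the authors' method.


def spectral_embedding(A, d):
    """Rank-d spectral embedding of a symmetric matrix: U_d * sqrt(lambda_d)."""
    vals, vecs = np.linalg.eigh(A)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]                # indices of the d largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def procrustes(source, target):
    """Orthogonal W minimizing ||source @ W - target||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def integrate(blocks, index_sets, n, d):
    """Embed each block, align it to the running average on shared entities, then average.

    Assumes every entity appears in some block and that each block after the
    first shares at least d entities with the blocks processed before it.
    """
    total = np.zeros((n, d))
    count = np.zeros(n)
    for A, idx in zip(blocks, index_sets):
        idx = np.asarray(idx)
        X = spectral_embedding(A, d)                # row i embeds entity idx[i]
        if count.any():
            seen = count[idx] > 0                   # entities already in the common frame
            ref = total[idx[seen]] / count[idx[seen]][:, None]
            X = X @ procrustes(X[seen], ref)        # rotate this block into that frame
        total[idx] += X
        count[idx] += 1
    Z = total / count[:, None]                      # averaged aligned embeddings
    return Z @ Z.T                                  # completed low-rank estimate


# Simulated example: two overlapping noisy diagonal blocks of P = X X^T.
rng = np.random.default_rng(0)
n, d, sigma = 300, 3, 0.5
X_true = rng.normal(size=(n, d))
P = X_true @ X_true.T
S1, S2 = np.arange(0, 200), np.arange(100, 300)     # overlap: entities 100..199


def noisy_block(idx):
    E = rng.normal(scale=sigma, size=(len(idx), len(idx)))
    return P[np.ix_(idx, idx)] + (E + E.T) / np.sqrt(2)   # symmetric Gaussian noise


M_hat = integrate([noisy_block(S1), noisy_block(S2)], [S1, S2], n, d)
rows, cols = np.arange(0, 100), np.arange(200, 300)  # pairs never observed together
err = (np.linalg.norm(M_hat[np.ix_(rows, cols)] - P[np.ix_(rows, cols)])
       / np.linalg.norm(P[np.ix_(rows, cols)]))
print(f"relative error on the unobserved block: {err:.3f}")
```

Entities 0 to 99 and 200 to 299 never appear in the same observed submatrix, so the corresponding entries of `M_hat` come entirely from the aligned embeddings. Aligning each new block to the running average (rather than only to the first block) is one simple way to handle chains of more than two submatrices; Procrustes is used here because a spectral embedding of X X^T is identified only up to an orthogonal transformation.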