B0834
Title: Efficient integrative factor models: Applications from nutritional epidemiology to cancer genomics
Authors: Alejandra Avalos Pacheco - Vienna University of Technology (Austria) [presenting]
Roberta De Vito - Brown University (United States)
Blake Hansen - Brown University (United States)
Abstract: Data integration of multiple studies can be key to understanding and gaining knowledge in statistical research. Such complex data present artifactual sources of variation, also known as covariate effects, that, if not corrected, could lead to unreliable inference. Multi-study factor analysis (MSFA) has proven to be key for identifying reproducible signals of interest shared by different studies or populations, which traditional factor analysis may miss. However, Bayesian inference for such models relies on Markov chain Monte Carlo (MCMC) methods, which scale poorly. Furthermore, MSFA does not include relevant covariates in the model, and omitting them could bias the results. Both problems are tackled by (i) introducing variational inference (VI) algorithms to approximate the posterior distribution of Sparse MSFA, and (ii) presenting novel multi-study factor regression (MSFR) models to jointly learn common and study-specific factors while adjusting for covariate effects. The usefulness of the methods is shown in nutritional epidemiology and cancer genomics applications to (i) obtain dietary patterns and their association with cardiometabolic disease risk for Hispanic groups, and (ii) reveal biological pathways for ovarian cancer datasets, using computational resources typically available on a laptop rather than a high-performance computing server.
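Illustrative sketch (not part of the original abstract): the structure described above, with common factors shared by all studies, study-specific factors, and a covariate adjustment, can be made concrete with a small simulation. The abstract does not state the model equations, so the additive form x = beta b + Phi f + Lambda_s l + e used here, along with the function name simulate_msfr, the dimensions, and the Gaussian draws, are assumptions chosen for illustration only. The code generates data; it does not implement the VI or MCMC estimation mentioned in the abstract.

```python
import numpy as np


def simulate_msfr(n_per_study, p=10, k=2, j_s=1, q=2, seed=0):
    """Simulate data with common factors, study-specific factors, and covariates.

    Hypothetical helper: assumes x = beta b + Phi f + Lambda_s l + e per observation.
    """
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(p, k))    # common loadings, shared by every study
    beta = rng.normal(size=(p, q))   # covariate effects (assumed shared across studies)
    studies = []
    for n in n_per_study:
        lam = rng.normal(size=(p, j_s))      # study-specific loadings
        psi = rng.uniform(0.5, 1.0, size=p)  # idiosyncratic noise variances
        b = rng.normal(size=(n, q))          # observed covariates
        f = rng.normal(size=(n, k))          # common factor scores
        l = rng.normal(size=(n, j_s))        # study-specific factor scores
        e = rng.normal(size=(n, p)) * np.sqrt(psi)
        x = b @ beta.T + f @ phi.T + l @ lam.T + e
        studies.append({"x": x, "b": b, "lam": lam, "psi": psi})
    return phi, beta, studies


def implied_covariance(phi, lam, psi):
    """Covariance of x given covariates: Phi Phi' + Lambda_s Lambda_s' + diag(psi)."""
    return phi @ phi.T + lam @ lam.T + np.diag(psi)


# Three studies of different sizes measuring the same 10 variables
phi, beta, studies = simulate_msfr([50, 80, 60])
sigma_0 = implied_covariance(phi, studies[0]["lam"], studies[0]["psi"])
```

Under these assumptions, the term Phi Phi' is the part of the covariance shared across studies (the reproducible signal), while Lambda_s Lambda_s' varies by study; dropping the covariate term would push the covariate-driven variation into the factor estimates.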