EcoSta 2018
Title: Estimating clusters from multivariate binary data via hierarchical Bayesian Boolean matrix factorization
Authors: Zhenke Wu - University of Michigan (United States) [presenting]
Livia Casciola-Rosen - Division of Rheumatology - Department of Medicine - Johns Hopkins University School of Medicine (United States)
Antony Rosen - Division of Rheumatology - Department of Medicine - Johns Hopkins University School of Medicine (United States)
Scott Zeger - Department of Biostatistics - Johns Hopkins Bloomberg School of Public Health (United States)
Abstract: An ongoing challenge in subsetting autoimmune disease patients is to define the autoantibody signatures produced against a library of elemental molecular machines, each composed of multiple component autoantigens. It is of significant value to quantify both the components of these machines and the striking variation in their frequencies among individuals. Based on multivariate binary responses that record the subject-level presence or absence of proteins over a grid of molecular weights, we develop a Bayesian hierarchical model that represents each subject's observations as an aggregation of a few unobserved machines, where the aggregation varies across subjects. We specify the model likelihood via a factorization into two latent binary matrices: machine profiles and individual factors. Given the latent factorization, we account for inherent uncertainty in immunoprecipitation, measurement error, or both through the sensitivities and specificities of protein detection. The posterior distributions of the numbers of patient clusters and machines are estimated from the data and, by design, tend to concentrate on smaller values. The posterior distributions of the model parameters are estimated via Markov chain Monte Carlo, which yields a list of molecular machine profiles with uncertainty quantification as well as each patient's posterior probability of having each machine. We demonstrate the proposed method by analyzing patients' gel electrophoresis autoradiography (GEA) data for patient subsetting.
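The factorization and measurement-error layer described in the abstract can be illustrated with a small sketch. It assumes a Boolean (OR-of-ANDs) link: subject i truly shows protein band l if the subject carries at least one machine whose profile contains band l. Each band has a sensitivity P(observed | present) and a specificity P(not observed | absent). The number of machines is fixed here, and the independent Bernoulli prior on each individual factor is an illustrative assumption, not the hierarchical prior used in the work. All function names (`boolean_product`, `row_log_lik`, `log_likelihood`, `prob_has_machine`) are hypothetical. The last function computes the full-conditional probability that a Gibbs-type MCMC step could use to update one entry of the individual-factor matrix. It is a building block for illustration, not the authors' sampler.

```python
import math


def boolean_product(Z, Q):
    """Boolean (OR-of-ANDs) product of binary matrices Z (N x M) and Q (M x L).

    eta[i][l] = 1 if subject i carries at least one machine k whose
    profile Q[k] contains protein band l; otherwise 0.
    """
    n_machines, n_bands = len(Q), len(Q[0])
    return [
        [int(any(z[k] and Q[k][l] for k in range(n_machines))) for l in range(n_bands)]
        for z in Z
    ]


def row_log_lik(y, eta_row, sens, spec):
    """Log-likelihood of one subject's observed bands y given latent presence eta_row.

    P(y_l = 1 | eta_l = 1) = sens[l]        (sensitivity)
    P(y_l = 1 | eta_l = 0) = 1 - spec[l]    (one minus specificity)
    """
    total = 0.0
    for l, obs in enumerate(y):
        p = sens[l] if eta_row[l] else 1.0 - spec[l]
        total += math.log(p) if obs else math.log(1.0 - p)
    return total


def log_likelihood(Y, Z, Q, sens, spec):
    """Total log-likelihood of the N x L binary data matrix Y given Z and Q."""
    eta = boolean_product(Z, Q)
    return sum(row_log_lik(y, e, sens, spec) for y, e in zip(Y, eta))


def prob_has_machine(i, k, Y, Z, Q, sens, spec, prior):
    """Full-conditional P(Z[i][k] = 1 | Y, rest), assuming Z[i][k] ~ Bernoulli(prior[k]).

    A Gibbs sampler would draw Z[i][k] from this probability; averaging the
    draws over MCMC iterations estimates the subject's posterior probability
    of carrying machine k.
    """
    log_w = []
    for value in (0, 1):
        z_row = list(Z[i])
        z_row[k] = value
        eta_row = boolean_product([z_row], Q)[0]
        log_prior = math.log(prior[k]) if value else math.log(1.0 - prior[k])
        log_w.append(log_prior + row_log_lik(Y[i], eta_row, sens, spec))
    # Normalize on the log scale for numerical stability.
    top = max(log_w)
    w0, w1 = (math.exp(v - top) for v in log_w)
    return w1 / (w0 + w1)


if __name__ == "__main__":
    Q = [[1, 1, 0, 0, 0],   # machine 0 contains bands 0 and 1
         [0, 0, 1, 1, 0]]   # machine 1 contains bands 2 and 3
    Z = [[1, 0], [0, 1], [1, 1], [0, 0]]  # 4 subjects x 2 machines
    sens = [0.9] * 5
    spec = [0.95] * 5
    Y = boolean_product(Z, Q)  # noise-free data, for illustration only
    print(prob_has_machine(2, 0, Y, Z, Q, sens, spec, prior=[0.5, 0.5]))  # about 0.997
    print(prob_has_machine(3, 0, Y, Z, Q, sens, spec, prior=[0.5, 0.5]))  # about 0.011
```

Because the Boolean product is an OR over machines, giving subject 2 machine 0 explains bands 0 and 1, which would otherwise have to be false positives, so the conditional probability is close to 1. For subject 3, whose bands are all absent, adding the same machine would imply two missed detections, so the probability is small.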