A0250
Title: Augmented doubly robust post-imputation inference for proteomic data
Authors: Kathryn Roeder - Carnegie Mellon University (United States)
Jing Lei - Carnegie Mellon University (United States)
Haeun Moon - Seoul National University (Korea, South) [presenting]
Abstract: Quantitative measurements produced by mass spectrometry proteomics experiments offer a direct way to explore the role of proteins in molecular mechanisms. However, analysis of such data is challenging because a large proportion of values are missing. A common strategy is to analyze an imputed dataset, but ignoring the imputation errors often introduces systematic bias into downstream analyses. A statistical framework, inspired by doubly robust estimators, is proposed that offers valid and efficient inference for proteomic data. The framework combines powerful machine learning tools, such as variational autoencoders, which draw on high-dimensional peptide data to improve imputation quality, with a parametric model of the propensity score that is used to debias the imputed outcomes. The estimator is compatible with the double machine learning framework and has provable theoretical properties. Simulation studies verify its empirical superiority over existing procedures. In applications to both single-cell proteomic data and bulk-cell Alzheimer's disease data, the method uses the imputed data to make additional meaningful discoveries while maintaining good control of false positives.
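As a rough illustration of the doubly robust idea the abstract builds on, the following is a minimal sketch of a generic augmented inverse-probability-weighted (AIPW) estimator of a population mean with missing outcomes, written in Python with NumPy. It is not the authors' estimator: the simulated data, the helper names `fit_logistic` and `aipw_mean`, and the choice of a logistic propensity model fitted by Newton-Raphson are illustrative assumptions. The imputed values stand in for the machine-learning imputation, and the propensity-weighted residual term corrects their bias.

```python
import numpy as np


def fit_logistic(X, r, n_iter=25):
    """Fit P(R = 1 | X) by logistic regression (Newton-Raphson, intercept added).

    Returns the fitted propensity for every sample. Illustrative helper only.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        W = p * (1.0 - p)                      # Bernoulli variances
        grad = Z.T @ (r - p)                   # score of the log-likelihood
        hess = Z.T @ (Z * W[:, None])          # observed information
        beta += np.linalg.solve(hess, grad)    # Newton step
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def aipw_mean(y, r, y_imputed, pi_hat):
    """AIPW estimate of E[Y] and its influence-function standard error.

    y         : outcomes; entries with r == 0 are ignored (may be NaN)
    r         : 1 if y is observed, 0 if missing
    y_imputed : imputed (predicted) outcome for every sample
    pi_hat    : estimated propensity P(R = 1 | X)
    """
    y_obs = np.where(r == 1, y, 0.0)           # NaNs replaced; they are multiplied by r = 0
    # Imputed value plus a propensity-weighted correction on observed residuals.
    psi = y_imputed + r / pi_hat * (y_obs - y_imputed)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return est, se


# Simulated example (hypothetical data): missingness depends on X (MAR).
rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)   # true E[Y] = 1.0
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 0])))
r = rng.binomial(1, p_obs)
y_miss = np.where(r == 1, y, np.nan)

cc_mean = np.nanmean(y_miss)        # complete-case mean, biased upward here
y_imp = np.full(n, cc_mean)         # deliberately crude imputation
pi_hat = fit_logistic(X, r)         # parametric propensity model
est, se = aipw_mean(y_miss, r, y_imp, pi_hat)
print(f"complete-case: {cc_mean:.3f}  AIPW: {est:.3f} +/- {1.96 * se:.3f}")
```

Because the propensity model is correctly specified in this simulation, the AIPW estimate should recover the true mean of 1.0 even though the imputation is crude, whereas the complete-case mean is biased upward. A more accurate imputation model would mainly shrink the standard error rather than change what the estimator targets.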