B1932
Title: A two-part Tweedie model for differential analysis of omics data
Authors: Arinjita Bhattacharyya - Merck (United States) [presenting]
Abstract: One common objective in single-cell RNA-sequencing studies is to detect differentially expressed genes across experimental conditions. Because these data typically contain a large number of zero counts, most published methods employ two-part models to identify the effects of biological variation. While these methods can detect differences in expression prevalence and in the average expression level, they do not provide an unconditional interpretation of covariate effects on the average gene expression, which reduces their flexibility in practical applications. A two-part Tweedie regression model (TPCPLM) is proposed for testing the association between overall gene expression and clinical covariates in both individual-level and cell-level differential expression analyses. The model includes a logistic regression component for the binarized representation of the data and a Tweedie regression component for the overall gene expression; each component may include a random effect to account for repeated measurements. Simulation studies show that TPCPLM outperforms published methods in false discovery rate control while maintaining power. In real data, TPCPLM detects genes that published methods do not readily identify. TPCPLM is available as part of an open-source R package.
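The two quantities the model targets, expression prevalence and overall (unconditional) mean expression, can be illustrated with a minimal simulation sketch. This is not the TPCPLM implementation: it only draws zero-inflated values from a Tweedie distribution in its compound Poisson-gamma form (power parameter between 1 and 2) and summarizes them. The function names `tweedie_draw` and `summarize`, the two groups, and all parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def tweedie_draw(mu, phi, p, rng):
    """Draw one Tweedie value (1 < p < 2) as a Poisson sum of gamma jumps."""
    lam = mu ** (2 - p) / (phi * (2 - p))    # Poisson rate for the number of jumps
    shape = (2 - p) / (p - 1)                # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale of each jump
    # Knuth's multiplication method for a Poisson count (adequate for small rates).
    n, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:
        n += 1
        prod *= rng.random()
    # Zero jumps gives an exact zero, which produces the excess of zeros.
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


def summarize(values):
    """Return (prevalence of nonzero values, overall mean including zeros)."""
    prevalence = sum(v > 0 for v in values) / len(values)
    overall_mean = sum(values) / len(values)
    return prevalence, overall_mean


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical groups: same dispersion and power, different mean expression.
control = [tweedie_draw(1.0, 2.0, 1.5, rng) for _ in range(20000)]
treated = [tweedie_draw(2.0, 2.0, 1.5, rng) for _ in range(20000)]

prev_control, mean_control = summarize(control)
prev_treated, mean_treated = summarize(treated)
print(f"control: prevalence={prev_control:.3f}, overall mean={mean_control:.3f}")
print(f"treated: prevalence={prev_treated:.3f}, overall mean={mean_treated:.3f}")
```

With this parameterization the overall mean equals `mu` and the probability of a zero equals `exp(-lam)`, so a shift in `mu` changes both the prevalence and the unconditional mean. A logistic component would describe the first quantity and a Tweedie regression component the second.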