A1665
Title: ZINBGT: Exploratory data analysis of transcriptomic expression using mixture models
Authors: Toby Kettlewell - University of Glasgow (United Kingdom) [presenting]
Yiyi Cheng - University of Glasgow (United Kingdom)
Thomas Otto - University of Glasgow (United Kingdom)
Vincent Macaulay - University of Glasgow (United Kingdom)
Mayetri Gupta - University of Glasgow (United Kingdom)
Abstract: Single-cell RNA sequencing (scRNA-seq) provides data on the signals associated with protein production within individual cells. This allows for the discovery of novel cell types, inference of cell trajectories, and fine-grained comparisons of different tissues. The analysis of scRNA-seq data uses a pipeline of methods, but benchmarking is currently unable to establish which methods are best for a given dataset. Because the conclusions drawn from an analysis depend on the choice of method, novel forms of exploratory data analysis are needed to investigate how datasets differ and the circumstances in which a given method is likely to perform best. Any such method needs to run quickly and provide easily interpretable visualisations. A family of mixture distributions on count data will be introduced; these capture the salient aspects of gene expression, with the associated parameters acting as summary statistics for each gene. A novel variant of a 2D histogram will be proposed, allowing efficient exploration and comparison of large, high-dimensional datasets, while problematic genes are highlighted by combining a distance between model and data with bootstrapping. Human immune cells will be explored in terms of gene expression, and comparison with simulations will reveal differences that could compromise benchmarking.