A0562
Title: Using community-wide data to address (some) challenges in single-cell data
Authors: Kim-Anh Le Cao - University of Melbourne (Australia) [presenting]
Abstract: Cell identity classification is an ongoing challenge in the analysis of single-cell RNA-seq (scRNA-seq) data. Numerous tools exist for predicting cell identity using single-cell reference atlases. However, many challenges remain, including correcting for inherent batch effects between reference and query data and insufficient phenotype information in the reference. The proposed method builds bulk transcriptome atlases as references against which single-cell identity can be queried. The advantage is that bulk data often contain detailed phenotype information and that numerous high-quality bulk datasets can be reused. A new computational and statistical framework, Sincast (SINgle-cell data CASTing onto reference), will be introduced to project and query scRNA-seq data against bulk RNA-seq data using principal component analysis and diffusion maps. Structural discrepancies between bulk and single-cell data are addressed by either aggregating or imputing single cells, and the most beneficial approach, depending on the data context, is discussed. Sincast can also be used to reveal intermediate single-cell states when projected against bulk data. The approach is illustrated in several case studies.
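Illustrative sketch: the general idea of aggregating single cells and projecting them onto a bulk reference can be shown with a minimal NumPy example. This is not the Sincast implementation or its API; the function names (log_cpm, fit_reference_pca, aggregate_cells, project), the log-CPM normalisation, the cluster labels, and the simulated counts are all assumptions made for illustration. Only the aggregation route and the principal component step are shown; imputation and the diffusion map are omitted.

```python
import numpy as np

def log_cpm(counts):
    """Scale each sample to counts per million, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # guard against empty samples
    return np.log1p(counts / lib * 1e6)

def fit_reference_pca(bulk_counts, n_components=2):
    """Fit PCA on the bulk reference only; return gene means and loadings."""
    x = log_cpm(bulk_counts)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: genes x components

def aggregate_cells(sc_counts, labels):
    """Sum raw counts of cells sharing a label into pseudo-bulk profiles."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    pseudo = np.vstack([sc_counts[labels == g].sum(axis=0) for g in groups])
    return groups, pseudo

def project(query_counts, mean, loadings):
    """Place query samples in the reference space (reference mean and axes)."""
    return (log_cpm(query_counts) - mean) @ loadings

# Simulated toy data (hypothetical): two cell types, 50 genes.
rng = np.random.default_rng(0)
n_genes = 50
profile_a = rng.gamma(2.0, 50.0, n_genes)
profile_b = rng.gamma(2.0, 50.0, n_genes)

# Bulk reference: 5 samples per cell type, deeply sequenced.
bulk = rng.poisson(np.vstack([np.tile(profile_a, (5, 1)),
                              np.tile(profile_b, (5, 1))]))

# Single-cell query: 100 sparse cells per type, at 2% of the bulk depth.
sc = rng.poisson(np.vstack([np.tile(profile_a * 0.02, (100, 1)),
                            np.tile(profile_b * 0.02, (100, 1))]))
labels = np.repeat(["cluster_1", "cluster_2"], 100)

mean, loadings = fit_reference_pca(bulk)
groups, pseudo = aggregate_cells(sc, labels)
ref_scores = project(bulk, mean, loadings)      # bulk samples in PC space
query_scores = project(pseudo, mean, loadings)  # pseudo-bulk in the same space
```

In this sketch the principal axes are learned from the bulk samples alone, so the query data never influence the reference space; each pseudo-bulk profile can then be compared with nearby bulk samples, for example by distance to the reference group centroids.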