Title: Accounting for asymmetry and batch effects in meta-transcriptomics
Authors: Gregory Gloor - University of Western Ontario (Canada) [presenting]
Abstract: The ALDEx2 tool generates a probabilistic model of high throughput sequencing count data and manipulates that model using the tools and rules of compositional data analysis. Many biological experiments are conducted using data gathered via high throughput sequencing. These data are widely regarded as delivering counts per feature in each sample. However, the counts are more properly regarded as relative abundances (compositions) because the instrument imposes an upper limit on the total number of counts. Thus, an increase in the counts for one sample or feature must be offset by a decrease in the counts for another sample or feature. ALDEx2 has recently incorporated general linear models (GLMs) and the ability to analyze asymmetric high throughput sequencing datasets. We use three vaginal meta-RNA-seq datasets from different labs and show that the combination of the GLM and the asymmetry correction allows a principled meta-analysis of the joint datasets. We find that the most effective approach is to hold 'housekeeping' functions as constant as possible and to determine the change in expression relative to those functions. Confounders such as read length, sequencing platform and read depth are not observed to have a significant effect.
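Illustration: the idea of measuring change relative to 'housekeeping' functions can be sketched as a log-ratio transform whose denominator is the geometric mean of a chosen reference set. The Python sketch below is only an illustration under stated assumptions. ALDEx2 is an R package and this is not its API; the function name `log_ratio_to_reference`, the `pseudocount` parameter, and the toy counts are all hypothetical, and the probabilistic modelling step described in the abstract is omitted.

```python
import math

def log_ratio_to_reference(counts, reference_idx, pseudocount=0.5):
    """Express each feature as a log2 ratio to a reference feature set.

    counts: list of rows (one per feature), each a list of per-sample counts.
    reference_idx: indices of features assumed to be roughly constant
                   (a hypothetical 'housekeeping' set).
    pseudocount: added to every count so that zeros can be logged
                 (an assumption of this sketch, not taken from the abstract).
    """
    n_samples = len(counts[0])
    logs = [[math.log2(c + pseudocount) for c in row] for row in counts]
    result = [[0.0] * n_samples for _ in counts]
    for j in range(n_samples):
        # Mean of the logs = log of the geometric mean of the reference set.
        ref = sum(logs[i][j] for i in reference_idx) / len(reference_idx)
        for i in range(len(counts)):
            result[i][j] = logs[i][j] - ref
    return result

# Toy data: 4 features x 2 samples. Sample 2 has twice the read depth,
# and feature 3 is additionally 4-fold more abundant in sample 2.
toy_counts = [
    [100, 200],  # feature 0 (reference)
    [50, 100],   # feature 1 (reference)
    [10, 20],    # feature 2
    [20, 160],   # feature 3
]
lr = log_ratio_to_reference(toy_counts, reference_idx=[0, 1], pseudocount=0)

# Per-feature difference between samples on the log2 scale.
diff = [row[1] - row[0] for row in lr]
print(diff)
```

In this toy case the doubling of read depth cancels out, so features 0 to 2 show a difference of about 0 while feature 3 shows a log2 difference of about 2, i.e. the 4-fold change relative to the reference set. Choosing a different reference set changes what "no change" means, which is why the choice of denominator matters for asymmetric datasets.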