View Submission - COMPSTAT2022
A0440
Title: Modelling three-way RNA sequencing data using matrix-variate Gaussian mixture models
Authors: Theresa Scharl - Boku Vienna (Austria) [presenting]
Bettina Gruen - Wirtschaftsuniversität Wien (Austria)
Abstract: RNA sequencing of time-course experiments leads to three-way count data where the dimensions are the genes, the time points and the biological units. Clustering of RNA-seq data allows the detection of groups of co-expressed genes over time. After standardisation, the normalised counts of individual genes across time points and biological units constitute compositional data. We propose the following procedure to suitably cluster the standardised three-way RNA-seq data: (1) transform the data using the isometric log-ratio transform to map each composition in the D-part Aitchison simplex to a (D-1)-dimensional Euclidean vector and (2) analyse the transformed RNA-seq data using Gaussian mixture models. To account for the three-way structure, we suggest using matrix-variate Gaussian mixture models to find groups of genes with similar expression patterns over time while simultaneously taking into account the different process conditions. This enables the specification of more parsimonious models that assume suitable time or biological effects within and across clusters. Such models also allow for an easier interpretation of the fitted model and the clusters obtained. The proposed three-way clustering approach will be applied to RNA-seq data from E. coli bioproduction processes and compared to the two-way approach obtained after flattening out the biological units.
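Step (1) of the procedure can be illustrated with a minimal sketch of an isometric log-ratio (ILR) transform. The abstract does not state which ILR basis is used, so the Helmert-type basis below is an assumption, as are the function names `ilr` and `clr` and the example profile. Strictly positive parts are assumed; zero counts would need separate handling.

```python
import math


def ilr(parts):
    """Map a D-part composition (strictly positive values) to D-1 real coordinates.

    Assumption: a Helmert-type orthonormal basis. Other ILR bases differ
    from this one only by a rotation of the coordinates.
    """
    if len(parts) < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(p) for p in parts]
    coords = []
    running_sum = 0.0
    for i in range(1, len(logs)):
        running_sum += logs[i - 1]   # sum of the first i log-parts
        mean_log = running_sum / i   # log of their geometric mean
        coords.append(math.sqrt(i / (i + 1)) * (mean_log - logs[i]))
    return coords


def clr(parts):
    """Centred log-ratio transform, used here only to check the isometry."""
    logs = [math.log(p) for p in parts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


# Hypothetical standardised profile of one gene over D = 4 time points.
profile = [0.1, 0.2, 0.3, 0.4]
z = ilr(profile)  # D - 1 = 3 unconstrained coordinates
```

The coordinates in `z` are unconstrained real values, so they could then be passed to a Gaussian mixture model as in step (2). Because the transform depends only on ratios of parts, rescaling the profile leaves `z` unchanged.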