CMStatistics 2022
B0716
Title: Clustering longitudinal matrix-variate count data
Authors: Sanjeena Dang - Carleton University (Canada) [presenting]
Abstract: Three-way data structures, or matrix-variate data, are commonly generated in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes at r conditions over p time points. Matrix-variate distributions offer a natural way to model three-way data, and mixtures of matrix-variate distributions can be used to cluster three-way data. Clustering gene expression data is a means of discovering gene co-expression networks. A family of mixtures of matrix-variate Poisson-log-normal distributions is introduced for clustering longitudinal read counts from RNA sequencing. Exploiting the matrix-variate structure reduces the number of covariance parameters to be estimated, and the components of the resulting covariance matrices (one for the conditions and one for the time points) have a meaningful interpretation. To account for the longitudinal nature of the data, a modified Cholesky decomposition is used in the covariance structure. Furthermore, a parsimonious family of models is developed by imposing constraints on the elements of these decompositions. The models are applied to both real and simulated data.
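The abstract does not spell out the model equations, so the following is only a minimal NumPy sketch of the generative structure of a single mixture component, under stated assumptions. It assumes the usual matrix-variate normal convention for the latent log-scale matrix X (r conditions by p time points), vec(X) ~ N(vec(M), Omega (x) Phi), with Phi the r x r condition covariance and Omega the p x p time covariance, and counts Y_ij ~ Poisson(exp(X_ij)). It also assumes the modified Cholesky decomposition T Omega T' = D is applied to the time covariance, with T unit lower triangular (holding the negated autoregressive coefficients) and D diagonal (innovation variances). All function names are hypothetical, and parameter estimation and clustering are not shown. The parameter counts illustrate the reduction the abstract describes: for r = 3 and p = 4, an unstructured covariance of the 12-dimensional vec(X) has 78 free parameters, whereas the Kronecker form has 6 + 10 - 1 = 15.

```python
import numpy as np


def n_cov_params_full(r: int, p: int) -> int:
    """Free parameters in an unstructured covariance of vec(X), X being r x p."""
    q = r * p
    return q * (q + 1) // 2


def n_cov_params_kronecker(r: int, p: int) -> int:
    """Free parameters in Omega (x) Phi. One is subtracted because
    (c * Omega) (x) (Phi / c) is the same matrix for every c > 0."""
    return r * (r + 1) // 2 + p * (p + 1) // 2 - 1


def modified_cholesky_cov(phi: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Return the p x p covariance Sigma satisfying T Sigma T' = D.

    phi: p x p array; only its strictly lower triangle is used. phi[t, s]
         (s < t) is the coefficient of time point s when time point t is
         regressed on all earlier time points.
    d:   length-p vector of positive innovation variances, D = diag(d).
    """
    p = len(d)
    T = np.eye(p) - np.tril(phi, k=-1)  # unit lower triangular
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T


def sample_mpln(n, M, Phi, Omega, rng):
    """Draw n count matrices (r x p) from a matrix-variate Poisson-log-normal:
    vec(X) ~ N(vec(M), Omega (x) Phi), then Y_ij ~ Poisson(exp(X_ij))."""
    r, p = M.shape
    cov = np.kron(Omega, Phi)  # covariance of the column-stacked vec(X)
    vec_x = rng.multivariate_normal(M.flatten(order="F"), cov, size=n)
    X = vec_x.reshape(n, p, r).transpose(0, 2, 1)  # undo column stacking
    return rng.poisson(np.exp(X))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, p = 3, 4  # illustrative: 3 conditions, 4 time points
    print(n_cov_params_full(r, p), n_cov_params_kronecker(r, p))  # 78 15

    # AR(1)-like time structure: each time point depends only on the previous one
    phi = np.zeros((p, p))
    phi[np.arange(1, p), np.arange(p - 1)] = 0.6
    Omega = modified_cholesky_cov(phi, d=np.full(p, 0.2))
    Phi = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.3], [0.1, 0.3, 1.0]])
    M = np.full((r, p), np.log(50.0))  # latent log-means (hypothetical values)

    Y = sample_mpln(100, M, Phi, Omega, rng)
    print(Y.shape)  # (100, 3, 4)
```

In this parameterization, constraints of the kind the abstract mentions would act on phi and d, for example forcing the innovation variances d to be equal across time points. Which constraints the presented family actually uses is not stated in the abstract.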