View Submission - CFE-CMStatistics 2025
A0893
Title: Bi-clustering RNA-sequencing data: A model-based approach using a MPLN distribution
Authors:
Caitlin Kral - Carleton University (Canada) [presenting]
Sanjeena Dang - Carleton University (Canada)
Ryan Browne - University of Waterloo (Canada)
Evan Chance - Carleton University (Canada)
Abstract: Bi-clustering is a technique that simultaneously clusters the observations and the features (i.e., variables) of a dataset. In bioinformatics, it is used to simultaneously identify clusters of diseased and non-diseased patients and networks of genes with distinct correlation patterns, based on gene expression values. While several Gaussian mixture model-based bi-clustering approaches exist in the literature for continuous data, approaches for discrete data have not been well researched. Extending bi-clustering to discrete data is imperative, as such data are common in real-world applications such as bioinformatics. Recently, multivariate Poisson-lognormal (MPLN) models have emerged as an efficient way to model multivariate count data. The MPLN distribution arises from a hierarchical Poisson structure, which allows for over-dispersion and for both positive and negative correlation. An MPLN model-based bi-clustering approach that utilizes a block-diagonal covariance structure is proposed. Its performance in clustering both observations and features is demonstrated on simulated and real-world data.
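To make the hierarchical structure concrete, the sketch below simulates counts from a single MPLN component: a latent Gaussian vector with block-diagonal covariance is drawn first, and each count is then Poisson with rate equal to the exponential of its latent value. This is an illustration only and not the authors' implementation; it performs no model fitting or clustering, and every name in it (`block_diagonal`, `sample_mpln`, `mpln_moments`) and every parameter value is hypothetical. The moment formulas are the standard Poisson-lognormal results.

```python
import math
import random


def block_diagonal(blocks):
    """Assemble a block-diagonal covariance matrix from a list of square blocks."""
    p = sum(len(b) for b in blocks)
    sigma = [[0.0] * p for _ in range(p)]
    offset = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, value in enumerate(row):
                sigma[offset + r][offset + c] = value
        offset += len(b)
    return sigma


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_mpln(n, mu, sigma, seed=0):
    """Draw n count vectors: theta ~ N(mu, sigma), then Y_j | theta ~ Poisson(exp(theta_j))."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    p = len(mu)
    counts = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        theta = [mu[j] + sum(low[j][k] * z[k] for k in range(j + 1)) for j in range(p)]
        counts.append([sample_poisson(math.exp(t), rng) for t in theta])
    return counts


def mpln_moments(mu, sigma, j, k):
    """Theoretical E[Y_j], Var(Y_j) and Cov(Y_j, Y_k) for j != k."""
    mean_j = math.exp(mu[j] + sigma[j][j] / 2.0)
    mean_k = math.exp(mu[k] + sigma[k][k] / 2.0)
    var_j = mean_j + mean_j ** 2 * (math.exp(sigma[j][j]) - 1.0)
    cov_jk = mean_j * mean_k * (math.exp(sigma[j][k]) - 1.0)
    return mean_j, var_j, cov_jk


# Hypothetical example: four features in two groups, one positively and one
# negatively correlated; features in different groups are uncorrelated.
blocks = [[[0.5, 0.3], [0.3, 0.5]], [[0.4, -0.2], [-0.2, 0.4]]]
mu = [1.0, 1.0, 1.5, 1.5]
sigma = block_diagonal(blocks)
counts = sample_mpln(500, mu, sigma, seed=1)
```

Under these assumptions, `mpln_moments` shows the two properties named in the abstract: the variance exceeds the mean whenever the latent variance is positive (over-dispersion), and the sign of the covariance between two counts follows the sign of the corresponding entry of the latent covariance matrix. The zero off-diagonal blocks are what would group the features.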