View Submission - COMPSTAT2023
A0256
Title: Rank-based Bayesian joint variable selection and clustering of genome-wide transcriptomic data
Authors: Valeria Vitelli - University of Oslo (Norway) [presenting]
Abstract: The use of ranks in genomics is naturally linked to the underlying biological question, since one is often interested in the genes that are over-expressed in a given pathology. For cancer subtype discovery from transcriptomic patient data, we have previously proposed, with success, a mixture-based clustering approach rooted in Bayesian Mallows models (BMM). BMM can handle heterogeneous patient data, produce estimates of the consensus ranking of the genes shared among samples in the same cluster, and fill in missing data via data augmentation. However, BMM is computationally intensive and therefore relies on pre-selecting around 1000 genes for the analysis. A lower-dimensional version of BMM (lowBMM) that scales to genome-wide transcriptomic data has also been proposed and used in cancer genomics; however, lowBMM does not perform clustering. We now propose to perform genome-wide cancer subtyping of transcriptomic patient data via a Bayesian mixture of Mallows models that combines BMM and lowBMM. The model jointly performs clustering and variable selection, thereby selecting the genes that best represent the structural patterns of expression characterising each subtype. We study the performance of the method via simulations and show the results of a pan-cancer analysis.
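As an illustration of the rank-based building block mentioned in the abstract, the following is a minimal sketch of how expression values can be turned into ranks and scored under a single Mallows model. It is not the authors' implementation: the footrule distance, the scaling of the distance by alpha/n, the brute-force normalising constant, and all function names (`to_ranks`, `footrule`, `log_partition`, `mallows_loglik`) are assumptions made for this example. The abstract specifies neither a distance nor an inference scheme, and the mixture, data augmentation, and variable selection steps are not shown.

```python
from itertools import permutations
from math import exp, log


def to_ranks(values):
    """Convert expression values to ranks; rank 1 = most expressed gene.

    Ties are broken by position (an assumption made for simplicity).
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def footrule(r, rho):
    """Footrule (L1) distance between two rank vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def log_partition(n, alpha):
    """Log normalising constant, by brute force over all n! rankings.

    Only feasible for very small n. The footrule distance is
    right-invariant, so the constant does not depend on the consensus
    and can be computed against the identity ranking.
    """
    identity = range(1, n + 1)
    total = sum(
        exp(-alpha / n * footrule(p, identity)) for p in permutations(identity)
    )
    return log(total)


def mallows_loglik(r, rho, alpha):
    """Log-probability of ranking r given consensus rho and scale alpha."""
    n = len(r)
    return -alpha / n * footrule(r, rho) - log_partition(n, alpha)


# Hypothetical expression values for four genes in one sample.
sample_ranks = to_ranks([2.5, 9.1, 0.3, 4.0])
consensus = [1, 2, 3, 4]
score = mallows_loglik(sample_ranks, consensus, alpha=2.0)
```

Under these assumptions, rankings closer to the consensus receive higher log-probability, and a larger alpha concentrates the distribution more tightly around the consensus. The brute-force normalising constant is what makes this sketch unusable beyond a handful of genes, which gives some intuition for why scaling to genome-wide data needs a dedicated approach.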