EcoSta 2023
A1131
Title: Bayesian profile regression for high-dimensional data: An application to osteoarthritis proteomic data
Authors: Brian Tom - MRC Biostatistics Unit (United Kingdom)
Sylvia Richardson - MRC Biostatistics - Cambridge (United Kingdom)
Laura Bondi - University of Cambridge and Bocconi University (United Kingdom) [presenting]
Abstract: There is a large unmet need in osteoarthritis (OA), which affects an estimated 8.5 million people in the UK. OA is regarded as a highly heterogeneous disease and is purported to exist in different forms. As part of the STEpUP OA collaboration, an academic-industry partnership, this work explores the molecular pathways of OA and aims to identify subpopulations of patients with homogeneous protein marker profiles, such that each cluster has a clinical meaning (outcome-guided clustering). Bayesian profile regression, a model-based outcome-guided clustering approach, is used to identify clusters of protein marker profiles that are associated with clinically relevant outcomes, such as radiographic disease grade (low vs advanced). This methodology can handle possibly inter-related explanatory variables. It uses the information in both these explanatory variables (here, 6000 synovial protein markers) and the outcome to produce model-based clustering structures that reflect the uncertainty in both the cluster allocations and the number of clusters. Given the high dimensionality of the protein space, scaling profile regression to this setting poses computational challenges. The focus is on strategies for dimensionality reduction and variable selection that take biological knowledge into account. The extent to which the clinical outcome drives the clustering structure is also investigated.
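To make the idea of outcome-guided clustering concrete, here is a minimal, hypothetical sketch of the allocation step at the heart of profile regression: the probability that patient i belongs to cluster c is proportional to w_c · p(x_i | φ_c) · p(y_i | θ_c), so both the protein profile x_i and the outcome y_i inform the assignment. This is not the authors' implementation. The function name `allocation_probs`, the independent Gaussian model for the markers, and the Bernoulli outcome with a cluster-specific probability of advanced grade are illustrative assumptions. A full analysis would also sample the mixture weights (e.g. under a Dirichlet process prior), the cluster parameters and, optionally, variable-selection indicators, as done by established software such as the R package PReMiuM.

```python
import numpy as np


def allocation_probs(x, y, log_weights, means, sds, outcome_probs, use_outcome=True):
    """Posterior probabilities that one patient belongs to each cluster c.

    Hypothetical sketch of the profile regression allocation step:
        p(z = c | x, y) is proportional to w_c * p(x | phi_c) * p(y | theta_c).

    Assumptions (not from the abstract): markers are continuous and independent
    Gaussian within a cluster; the outcome is binary (0 = low grade,
    1 = advanced grade) with a cluster-specific probability P(y = 1).
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    outcome_probs = np.asarray(outcome_probs, dtype=float)

    # Profile (covariate) term: Gaussian log-density summed over markers, shape (C,).
    log_px = -0.5 * np.sum(
        ((x - means) / sds) ** 2 + np.log(2.0 * np.pi * sds**2), axis=1
    )
    log_post = np.asarray(log_weights, dtype=float) + log_px

    if use_outcome:
        # Outcome term: Bernoulli log-likelihood of y under each cluster.
        log_post = log_post + (
            y * np.log(outcome_probs) + (1 - y) * np.log1p(-outcome_probs)
        )

    # Normalise on the log scale for numerical stability.
    log_post -= log_post.max()
    probs = np.exp(log_post)
    return probs / probs.sum()


if __name__ == "__main__":
    # Two clusters with identical protein profiles but different outcome risk.
    x = np.zeros(3)
    log_w = np.log([0.5, 0.5])
    means = np.zeros((2, 3))
    sds = np.ones((2, 3))
    risk = [0.2, 0.8]
    print(allocation_probs(x, 1, log_w, means, sds, risk))
    print(allocation_probs(x, 1, log_w, means, sds, risk, use_outcome=False))
```

In this toy case the two clusters cannot be told apart from the markers alone, so a patient with advanced grade is allocated with probabilities 0.2 and 0.8 when the outcome is used, and 0.5 and 0.5 when it is ignored. Note that the profile term sums over all markers while the outcome contributes a single term, so with thousands of markers the profile part can easily outweigh the outcome. This is one way to see why dimensionality reduction, variable selection and the influence of the outcome on the clustering deserve attention.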