CMStatistics 2023
B1427
Title: Co-data learning for Bayesian additive regression trees
Authors: Mark van de Wiel - Amsterdam University Medical Centers (Netherlands)
Jeroen Goedhart - Amsterdam UMC (Netherlands) [presenting]
Thomas Klausch - Amsterdam UMC (Netherlands)
Abstract: One of the promises of omics data is to improve cancer diagnosis and to identify relevant biomarkers that may be used for therapy. However, omics data is typically high-dimensional, which poses significant challenges for prediction and feature selection. To address these challenges, it is proposed to incorporate co-data, i.e. external information on the measured covariates, into Bayesian additive regression trees (BART), a sum-of-trees prediction model that uses priors on the tree parameters to prevent overfitting. To incorporate the co-data, an empirical Bayes (EB) framework is developed that estimates prior covariate weights in the BART model with the help of the co-data. The proposed method can handle multiple types and sources of co-data, whereas most existing methods only allow co-data in the form of groups. Furthermore, the proposed EB framework enables the estimation of the other hyperparameters of BART as well. Empirical Bayes avoids the arbitrary grid used in cross-validation and may therefore yield more refined hyperparameter estimates. In simulations, the method is shown to improve both prediction and variable selection compared to default BART. Moreover, it enhances prediction in an application to diffuse large B-cell lymphoma diagnosis based on mutations, translocations, and DNA copy number data. Finally, the method is competitive with state-of-the-art co-data learners.
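To illustrate what co-data-informed prior covariate weights could look like, here is a minimal Python sketch. It is not the authors' estimator: it assumes the simplest case of grouped co-data, and it assumes that per-covariate split counts from an already fitted BART model are available. The function name `codata_split_weights`, the `pseudo_count` smoothing parameter, and the example numbers are all hypothetical.

```python
from collections import defaultdict


def codata_split_weights(split_counts, groups, pseudo_count=1.0):
    """Pool split counts within co-data groups and return normalized weights.

    split_counts: how often each covariate was used as a splitting variable
                  in a fitted BART model (assumed input, not computed here).
    groups:       one co-data group label per covariate.
    pseudo_count: hypothetical smoothing term added to each group total so
                  that no group receives a weight of exactly zero.
    """
    if len(split_counts) != len(groups):
        raise ValueError("split_counts and groups must have the same length")

    totals = defaultdict(float)  # summed split counts per group
    sizes = defaultdict(int)     # number of covariates per group
    for count, group in zip(split_counts, groups):
        totals[group] += count
        sizes[group] += 1

    # Each covariate gets the smoothed average split count of its group.
    raw = [(totals[g] + pseudo_count) / sizes[g] for g in groups]

    # Normalize so the weights form a probability vector over covariates.
    norm = sum(raw)
    return [r / norm for r in raw]


# Illustrative numbers only: two covariates in a "known" group, four in "other".
example_counts = [8, 6, 1, 0, 1, 0]
example_groups = ["known", "known", "other", "other", "other", "other"]
weights = codata_split_weights(example_counts, example_groups)
```

In an empirical Bayes loop, weights of this kind would be fed back as prior splitting probabilities and the model refitted until the estimates stabilize. The grouped pooling used here is a simplification, since the abstract states that the proposed method also handles other types and sources of co-data.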