CMStatistics 2016
B1202
Title: Incorporating pathway information for prediction in omic applications
Authors: Renaud Tissier - Leiden University Medical Centre (Netherlands) [presenting]
Jeanine Houwing-Duistermaat - Leeds University (United Kingdom)
Mar Rodriguez Girondo - Leiden University Medical Center (Netherlands)
Abstract: Omics datasets such as transcriptomics, proteomics, and metabolomics are now widely available for building prediction models. To handle the large number of omics variables and the correlation among them, regularized regression techniques are typically used. A drawback of these methods is that the results are hard to interpret. To obtain more interpretable prediction models while retaining good predictive ability, we propose to incorporate pathway information on the omics variables in the prediction models. We use a three-step approach: 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model. For the first step we use two methods, one based on weighted correlation and one based on Gaussian graphical modeling. Modules (groups) of features are identified by hierarchical clustering. To incorporate the grouping information in a prediction model, we adopt two different strategies: group-based variable selection and group-specific penalization. We compare the performance of our new approaches (each combination of network construction method and grouping strategy) with standard regularized regression via simulations. Finally, our approaches are applied to two omic sources (metabolomics and transcriptomics) in the prediction of body mass index (BMI) using longitudinal data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.
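The three-step approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a weighted correlation network with adjacency |cor|^beta (beta = 6 is an assumed value), average-linkage hierarchical clustering on 1 - adjacency, and a ridge penalty with one penalty weight per module as a simple form of group-specific penalization. The function names, the simulated data, and the penalty values are all hypothetical. The Gaussian graphical model variant and group-based variable selection are not shown.

```python
import numpy as np

def weighted_correlation_network(X, beta=6):
    """Step 1: adjacency a_ij = |cor(x_i, x_j)|**beta (beta is an assumed value)."""
    corr = np.corrcoef(X, rowvar=False)
    return np.abs(corr) ** beta

def hierarchical_modules(adjacency, n_modules):
    """Step 2: average-linkage agglomerative clustering on 1 - adjacency."""
    dist = 1.0 - adjacency
    clusters = [[j] for j in range(dist.shape[0])]
    while len(clusters) > n_modules:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def group_penalized_ridge(X, y, labels, lambdas):
    """Step 3: ridge regression with one penalty per module.

    Minimizes ||y - X b||^2 + sum_j lambdas[labels[j]] * b_j^2 on centered data.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    penalty = np.diag(np.asarray(lambdas, dtype=float)[labels])
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)

# Simulated data (hypothetical): two modules of four features, each driven by
# its own latent factor; only the first factor affects the outcome.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=(2, n))
X = np.column_stack(
    [z1 + 0.3 * rng.normal(size=n) for _ in range(4)]
    + [z2 + 0.3 * rng.normal(size=n) for _ in range(4)]
)
y = z1 + 0.5 * rng.normal(size=n)

adjacency = weighted_correlation_network(X)
labels = hierarchical_modules(adjacency, n_modules=2)

# Hypothetical penalties: light on module 0, heavy on module 1.
# In practice these would be tuned, for example by cross-validation.
lambdas = [1.0, 100.0]
coef = group_penalized_ridge(X, y, labels, lambdas)
print(labels)
print(np.round(coef, 3))
```

In this toy setting the clustering step should recover the two simulated modules, and the heavier penalty shrinks the coefficients of the uninformative module toward zero. The number of modules is fixed in advance here for simplicity; a dendrogram cut height would be an alternative.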