COMPSTAT 2024
A0401
Title: Classical, Bayesian, and machine learning variable selection methods in colorectal cancer microbiome study: A comparison
Authors: Mohammad Fayaz - Allameh Tabatabaei University (Iran) [presenting]
Edda Russo - University of Florence (Italy)
Leandro Di Gloria - University of Siena (Italy)
Sara Bertorello - University of Florence (Italy)
Amedeo Amedei - University of Florence (Italy)
Abstract: Human microbiome research requires new and appropriate statistical methods to analyze the complex structure of microbiome datasets. A previously published microbiome dataset of 46 colorectal cancer (CRC) and 15 adenomatous polyp (AP) patients is used, with microbiome samples from three sampling sites: saliva, tissue, and stool. The microbiome composition was analyzed separately at five taxonomic levels (phylum, class, order, family, and genus). The statistical challenge lies in classifying patients as CRC or AP and in selecting variables among the metagenomic features. First, classical methods such as LASSO, ridge regression, zero-inflated beta (ZIB) regression, generalized additive models for location, scale, and shape (GAMLSS-BEZI), and compositional data analysis (CoDA) were compared using SIAMCAT and other R libraries. Next, Bayesian model averaging (BMA) for generalized linear models (GLMs) with different priors was applied using the BMA and BAS packages. In addition, interpretable machine learning (IML) tools, including variable importance plots (VIP), partial dependence plots (PDP), local interpretable model-agnostic explanations (LIME), and Shapley values, were explored for machine learning (ML) methods such as random forests. Finally, a concordance plot of the metagenomic features selected by the different methods is presented.
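The idea behind comparing selections across methods can be illustrated with a small tally of how many methods picked each feature. The Python sketch below is not the authors' code (the abstract describes an R workflow). The method labels and taxon names are hypothetical placeholders rather than results from the study, and the `concordance` helper is introduced here only for illustration.

```python
from collections import Counter

# Hypothetical selections: placeholder method labels and taxon names,
# NOT results from the study described in the abstract.
selected = {
    "lasso": {"taxon_A", "taxon_B", "taxon_C"},
    "bma": {"taxon_A", "taxon_C"},
    "random_forest_shap": {"taxon_A", "taxon_D"},
}


def concordance(selections):
    """Count, for each feature, how many methods selected it."""
    counts = Counter()
    for features in selections.values():
        counts.update(features)
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


table = concordance(selected)
for feature, n in table:
    print(f"{feature}: selected by {n} of {len(selected)} methods")
```

A table like this is the kind of input a concordance plot could be drawn from: features chosen by every method appear at the top, and features chosen by a single method appear at the bottom.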