View Submission - CFE-CMStatistics 2025
A1201
Title: Tree-guided variable selection methods for the regularized Dirichlet-tree multinomial regression
Authors: Alysha Cooper - University of Guelph (Canada) [presenting]
Zeny Feng - University of Guelph (Canada)
Ayesha Ali - University of Guelph (Canada)
Abstract: Hispanic and Latino Americans make up one of the largest minority groups in the United States, yet the reasons they face a disproportionately high prevalence of chronic diseases such as asthma, diabetes, and obesity remain underexplored. While socioeconomic disadvantages and limited access to healthcare are known contributors, less attention has been given to how lifestyle changes following relocation to the U.S. may influence gut health and, in turn, overall health. Gut microbiome composition can be measured as bacterial counts organized along a taxonomic tree. The Dirichlet-tree multinomial (DTM) regression model accommodates the high variability in microbial count data while respecting evolutionary relationships among taxa. Variable selection is critical in DTM regression because the outcome is high-dimensional and many factors may influence gut bacteria. A key issue is that standard selection approaches may ignore the dependencies among outcomes that arise from the tree structure. Two tree-guided penalties for DTM regression are proposed: 1) the tree-guided sparse group lasso and 2) the tree-guided hierarchical lasso. Both penalties introduce sparsity into the model while leveraging known relationships among taxa in the tree. The regularized DTM regression model is demonstrated through simulations and an analysis of gut microbiome data from the Hispanic Community Health Study/Study of Latinos.
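
To make the idea of a tree-guided sparse group lasso penalty more concrete, the following is a minimal sketch and not the authors' formulation. It assumes a hypothetical toy taxonomy (`TREE`), a coefficient vector attached to each branch of the tree, and one plausible grouping: for each internal node and each covariate, the group holds that covariate's coefficients on every branch in the node's subtree. The function name, the weights (square root of group size), and the mixing parameter `alpha` are all illustrative assumptions.

```python
import math

# Hypothetical toy taxonomy: each internal node maps to its children.
TREE = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}


def subtree_branches(tree, node):
    """Return all (parent, child) branches in the subtree rooted at node."""
    branches = []
    for child in tree.get(node, []):
        branches.append((node, child))
        branches.extend(subtree_branches(tree, child))
    return branches


def tree_sparse_group_lasso(beta, tree, lam=1.0, alpha=0.5):
    """Penalty = lam * [alpha * L1 + (1 - alpha) * weighted sum of group L2 norms].

    beta maps each branch (parent, child) to a list of p covariate coefficients.
    Grouping (an assumption for illustration): one group per internal node and
    covariate, holding that covariate's coefficients on every branch in the
    node's subtree. Groups therefore overlap across nested subtrees.
    """
    p = len(next(iter(beta.values())))
    # Elementwise L1 term encourages sparsity in individual coefficients.
    l1 = sum(abs(b) for coefs in beta.values() for b in coefs)
    # Group term encourages dropping a covariate from a whole subtree at once.
    group_term = 0.0
    for node in tree:
        branches = subtree_branches(tree, node)
        for j in range(p):
            sq = sum(beta[br][j] ** 2 for br in branches)
            group_term += math.sqrt(len(branches)) * math.sqrt(sq)
    return lam * (alpha * l1 + (1.0 - alpha) * group_term)


# Usage: one covariate, nonzero only on the branch A -> a1.
beta = {br: [0.0] for br in subtree_branches(TREE, "root")}
beta[("A", "a1")] = [3.0]
penalty = tree_sparse_group_lasso(beta, TREE, lam=1.0, alpha=0.5)
```

In this sketch a coefficient deep in the tree is penalized once for every ancestor subtree that contains it, which is one way a penalty can reflect the known relationships among taxa; the penalties proposed in the abstract may define groups and weights differently.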