View Submission - EcoSta2022
A0811
Title: Scalable and interpretable rare feature aggregation with microbiome data
Authors: Kun Chen - University of Connecticut (United States) [presenting]
Abstract: Statistical learning with a large number of rare features is commonly encountered in modern applications, such as in analyzing the gut-brain axis with high-throughput microbiome features that are often given as compositions or presence/absence indicators. Properly balancing the features' rarity and specificity holds the key to harvesting valuable information from them. Fortunately, an inherited hierarchical tree structure often exists among the features, making it possible to perform interpretable feature aggregation. Two statistical learning approaches are introduced for rare feature aggregation and selection conforming to any given tree structure, one for compositional features and another for binary features. For compositional features, we propose Relative-Shift Regression, in which the compositions are aggregated based on whether shifting relative concentrations between them affects the outcome. For binary features, we propose Convex Logic Regression, in which feature reduction is achieved through both a sparsity pursuit and an aggregation promoter with the logic operator "or". Equi-sparse convex regularization methods and efficient smoothing proximal gradient algorithms are developed with theoretical guarantees. Applications with microbiome data from a preterm infant study are discussed.
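
As a rough illustration of what aggregating rare features along a tree can mean, the sketch below collapses leaf-level features into node-level features. It is not the Relative-Shift Regression or Convex Logic Regression estimator described in the abstract, since it involves no regularization and no data-driven selection of which nodes to merge. The toy tree, the function names, and the choice to sum proportions for compositional features are assumptions made only for this example; the "or" rule for binary features follows the abstract.

```python
import numpy as np

# Hypothetical toy tree: each node maps to the leaf-feature columns beneath it.
tree = {
    "genus_A": [0, 1],
    "genus_B": [2, 3],
    "family_AB": [0, 1, 2, 3],
}

def aggregate_binary(X, leaves):
    """Node is present in a sample if ANY of its leaves is present (logical OR)."""
    return X[:, leaves].any(axis=1).astype(int)

def aggregate_composition(P, leaves):
    """Node proportion is the sum of its leaves' proportions (assumed rule)."""
    return P[:, leaves].sum(axis=1)

# Presence/absence indicators: 3 samples x 4 rare leaf features.
X = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

# Compositions: each row sums to 1.
P = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
])

binary_nodes = {name: aggregate_binary(X, leaves) for name, leaves in tree.items()}
comp_nodes = {name: aggregate_composition(P, leaves) for name, leaves in tree.items()}

for name in tree:
    print(name, binary_nodes[name], comp_nodes[name])
```

Aggregated features are less rare than their leaves but also less specific, which is the trade-off the abstract refers to; the proposed methods choose the aggregation level from the data rather than fixing it in advance as done here.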