CMStatistics 2023
B0215
Title: Innovative unsupervised approach for simultaneous subgroup recovery and group-specific feature identification
Authors: Wen Zhou - New York University (United States) [presenting]
Lyuou Zhang - Colorado State University (United States)
Xiwei Tang - University of Virginia (United States)
Lulu Wang - Gilead Sciences (United States)
Abstract: Identifying heterogeneous subgroups and the features that define them simultaneously in large datasets is a common challenge in areas such as omics studies and clinical research. It has typically been addressed by methods that either focus solely on globally informative features or treat feature selection and group recovery as separate steps. Such approaches often yield suboptimal solutions because they overlook the interplay between feature selection and group recovery. To overcome this, PARSE is presented: an unsupervised learning approach that identifies cluster-specific informative features while performing high-dimensional cluster analysis. Built on a novel non-convex regularization, PARSE avoids selecting excessive features by penalizing those that differ only minimally across clusters. Its optimality for both feature identification and group recovery is demonstrated through its oracle property and the established lower bounds. PARSE is implemented via a backward selection procedure integrated with a variant of the expectation-maximization (EM) algorithm, which demonstrates its computational feasibility. Comprehensive simulation studies and an application to single-cell RNA-seq data show that PARSE outperforms existing methods, highlighting its potential to advance research across diverse fields.
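The abstract does not state PARSE's penalty or its backward-selection schedule, so the sketch below is only a heuristic illustration of the idea of group-specific features. It is not the authors' method. It runs EM for a Gaussian mixture with diagonal, feature-wise shared variances. After each M-step it merges any clusters whose means on a given feature are closer than a threshold `tau`, which is an assumed hard-threshold stand-in for a fusion-type penalty on small between-cluster differences. A feature whose means all collapse into one group separates no clusters. A feature with several groups is informative only for the clusters it separates. All names (`fused_em`, `fuse_feature_means`, `farthest_point_init`, `tau`) are hypothetical.

```python
import math
import random


def fuse_feature_means(means, weights, tau):
    """Merge one feature's cluster means whose sorted gaps are below tau.

    Hypothetical hard-threshold stand-in for a fusion penalty (not PARSE's).
    Returns the fused means in the original cluster order and the number of
    distinct groups (1 means the feature separates no clusters).
    """
    order = sorted(range(len(means)), key=lambda c: means[c])
    groups, current = [], [order[0]]
    for a, b in zip(order, order[1:]):
        if means[b] - means[a] < tau:
            current.append(b)  # close to its sorted neighbor: same group
        else:
            groups.append(current)
            current = [b]
    groups.append(current)
    fused = list(means)
    for g in groups:
        total = sum(weights[c] for c in g)
        if total > 0:
            pooled = sum(weights[c] * means[c] for c in g) / total
        else:
            pooled = sum(means[c] for c in g) / len(g)
        for c in g:
            fused[c] = pooled
    return fused, len(groups)


def log_normal(x, mu, var):
    """Log density of a univariate normal N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def farthest_point_init(data, k, rng):
    """Pick k starting means: one random row, then repeatedly the farthest row."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        far = max(data, key=lambda x: min(
            sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers))
        centers.append(list(far))
    return centers


def fused_em(data, k, tau, n_iter=50, seed=0):
    """EM for a diagonal Gaussian mixture with a per-feature mean-fusion step."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    means = farthest_point_init(data, k, rng)
    variances = [1.0] * p  # one variance per feature, shared by all clusters
    pis = [1.0 / k] * k
    n_groups = [k] * p
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in log space.
        resp = []
        for x in data:
            logs = [math.log(pis[c]) + sum(log_normal(x[j], means[c][j], variances[j])
                                           for j in range(p)) for c in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: mixing weights and unpenalized cluster means.
        nk = [max(sum(r[c] for r in resp), 1e-12) for c in range(k)]
        pis = [v / n for v in nk]
        means = [[sum(r[c] * x[j] for r, x in zip(resp, data)) / nk[c]
                  for j in range(p)] for c in range(k)]
        # Fusion step: feature by feature, merge clusters with near-equal means.
        for j in range(p):
            fused, n_groups[j] = fuse_feature_means(
                [means[c][j] for c in range(k)], nk, tau)
            for c in range(k):
                means[c][j] = fused[c]
        # Shared per-feature variances given the fused means.
        variances = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                             for r, x in zip(resp, data) for c in range(k)) / n, 1e-6)
                     for j in range(p)]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, means, n_groups


if __name__ == "__main__":
    # Toy data: two clusters that differ only in feature 0; features 1, 2 are noise.
    gen = random.Random(1)
    toy = [[gen.gauss(0, 1), gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(100)]
    toy += [[gen.gauss(6, 1), gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(100)]
    labels, means, n_groups = fused_em(toy, k=2, tau=1.0)
    print("distinct mean groups per feature:", n_groups)
```

Merging after the M-step forces weakly separated clusters to share a mean on that feature, loosely mimicking how a penalty on small between-cluster differences removes them. Unlike a genuinely penalized EM, this heuristic carries no guarantee that the likelihood increases monotonically, and it has no backward selection over features.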