View Submission - EcoSta2024
A0167
Title: Innovative unsupervised approach for simultaneous subgroup recovery and group-specific feature identification
Authors: Wen Zhou - New York University (United States) [presenting]
Xiwei Tang - University of Virginia (United States)
Lyuou Zhang - Shanghai University of Finance and Economics (China)
Lulu Wang - Gilead Sciences (United States)
Abstract: Simultaneously identifying heterogeneous subgroups and the informative features that define them, especially when no response is available and the number of features is large, has long been a challenge in domains such as omics studies and clinical research. Existing methods either focus narrowly on globally informative features or treat feature selection and group recovery as separate tasks, overlooking their interaction. Such methods may miss scientifically relevant information and yield suboptimal feature identification and subgroup recovery. To overcome these limitations, a novel unsupervised learning approach, PAirwise REciprocal fuSE (PARSE), is introduced; it concurrently pinpoints cluster-specific informative features and performs high-dimensional clustering. The method employs a new regularization that heavily penalizes features with minor differences across clusters, thereby avoiding the selection of less informative features as cluster-defining. The oracle property of PARSE is obtained, and lower bounds for clustering and cluster-specific feature identification are established, affirming the method's optimality in both respects. For implementation, a computationally efficient enhanced expectation-maximization algorithm is devised. Extensive numerical studies and an analysis identifying gene signatures in human pancreatic cell subtypes from scRNA-seq data showcase PARSE's superiority over existing methods.
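
Illustration (not part of the submitted abstract): the distinction between globally informative and cluster-specific informative features can be made concrete with a small sketch. This is not the PARSE estimator or its penalty, whose form the abstract does not give. It only assumes that cluster centers are already known and uses a hypothetical hard threshold to list, for each pair of clusters, the features on which the two centers differ noticeably. The function name, the threshold, and the toy centers are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_informative_features(centers, threshold):
    """List, for each pair of clusters, the features that separate them.

    centers:   list of K lists, each holding p feature means (assumed known).
    threshold: hypothetical cutoff; gaps at or below it count as "no difference".
    """
    result = {}
    for k, l in combinations(range(len(centers)), 2):
        result[(k, l)] = [
            j
            for j, (a, b) in enumerate(zip(centers[k], centers[l]))
            if abs(a - b) > threshold
        ]
    return result


# Toy centers: 3 clusters, 4 features (made-up numbers).
centers = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.1, 0.0],
    [2.0, 3.0, 0.0, 0.0],
]
features = pairwise_informative_features(centers, threshold=0.5)
for pair, idx in features.items():
    print(pair, idx)
```

In this toy example, feature 0 separates cluster 0 from the other two but not clusters 1 and 2 from each other, so it is informative only for certain pairs. Feature 2 differs only slightly and feature 3 not at all, so neither appears for any pair. A method that selects only globally informative features could not express this pair-specific pattern.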