View Submission - CFE-CMStatistics 2025
A0556
Title: Factor adjusted spectral clustering for mixture models
Authors: Soham Jana - University of Notre Dame (United States) [presenting]
Shange Tang - Princeton University (United States)
Jianqing Fan - Princeton University (United States)
Abstract: A factor-model-based approach is studied for clustering high-dimensional data generated from a mixture of strongly correlated variables. Standard techniques for clustering high-dimensional data, e.g., naive spectral clustering, often fail to yield insightful results because their performance depends heavily on the mixture components having a weakly correlated structure. To address the clustering problem in the presence of a latent factor model, the factor-adjusted spectral clustering (FASC) algorithm is proposed, which adds a denoising step that eliminates the factor component to cope with the dependence in the data. The method is proven to achieve a mislabeling rate that decays exponentially in the signal-to-noise ratio under a general set of assumptions. These assumptions cover many classical factor models in the literature, such as the pervasive factor model, the weak factor model, and the sparse factor model. The FASC algorithm is also computationally efficient, requiring only near-linear sample complexity with respect to the data dimension. The applicability of the FASC algorithm is demonstrated through numerical studies and real-data experiments, which show that FASC provides significant results in many cases where traditional spectral clustering fails.
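
The two-step idea described in the abstract (remove an estimated factor component, then apply spectral clustering to the residual) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The PCA-based factor estimate, the assumption that the number of factors is known, the k-dimensional embedding, the simple k-means routine, the synthetic data model, and all function names are assumptions made for illustration only.

```python
import numpy as np


def factor_adjust(X, n_factors):
    """Subtract a rank-n_factors PCA estimate of the common (factor) component.

    Assumption: the factor component dominates the leading principal components.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    return Xc - common


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(X, n_clusters):
    """Project centered data onto its top n_clusters principal directions, then k-means."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    embedding = U[:, :n_clusters] * s[:n_clusters]
    return kmeans(embedding, n_clusters)


def simulate(n=400, d=50, n_factors=2, seed=0):
    """Hypothetical two-component mixture: cluster mean + strong factors + noise."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    mu = rng.normal(size=d)
    mu *= 3.0 / np.linalg.norm(mu)                  # cluster means are +mu and -mu
    means = np.where(labels[:, None] == 1, mu, -mu)
    B = 3.0 * rng.normal(size=(d, n_factors))       # strong factor loadings
    F = rng.normal(size=(n, n_factors))             # latent factors
    noise = rng.normal(size=(n, d))
    return means + F @ B.T + noise, labels


def accuracy(pred, truth):
    """Two-cluster accuracy, invariant to swapping the two labels."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    X, truth = simulate()
    naive = spectral_cluster(X, 2)
    adjusted = spectral_cluster(factor_adjust(X, 2), 2)
    print("naive spectral clustering accuracy:", accuracy(naive, truth))
    print("factor-adjusted accuracy:", accuracy(adjusted, truth))
```

Running the script prints the clustering accuracy with and without the factor-removal step on one synthetic data set, in which the factor loadings are deliberately much stronger than the cluster separation. In practice the number of factors would have to be estimated, which this sketch does not attempt.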