View Submission - COMPSTAT2022
A0267
Title: Selective inference for k-means clustering
Authors: Yiqun Chen - University of Washington, Seattle (United States) [presenting]
Daniela Witten - University of Washington (United States)
Abstract: We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we take a selective inference approach. We propose a finite-sample p-value that controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering, and show that it can be computed efficiently. We apply our proposal in simulations and to handwritten digits data and single-cell RNA-sequencing data.
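The inflation of the Type I error mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' selective p-value; it only shows the problem that motivates it. It assumes one-dimensional Gaussian data with known variance and no true clusters, a hand-written two-cluster Lloyd's algorithm, and a classical two-sided z-test. All function names and parameter values are hypothetical, chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def two_means_1d(x, max_iter=100):
    """Lloyd's algorithm with k=2 on 1-D data; returns a list of 0/1 labels."""
    lo, hi = min(x), max(x)  # initial centers: the two extreme points
    labels = None
    for _ in range(max_iter):
        mid = (lo + hi) / 2  # in 1-D, nearest-center assignment is a threshold
        new_labels = [0 if v <= mid else 1 for v in x]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        lo = fmean([v for v, l in zip(x, labels) if l == 0])
        hi = fmean([v for v, l in zip(x, labels) if l == 1])
    return labels


def fixed_split(x):
    """Control labelling that ignores the data: first half vs. second half."""
    half = len(x) // 2
    return [0] * half + [1] * (len(x) - half)


def naive_z_pvalue(x, labels, sigma=1.0):
    """Classical two-sided z-test for equal group means, sigma assumed known."""
    a = [v for v, l in zip(x, labels) if l == 0]
    b = [v for v, l in zip(x, labels) if l == 1]
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(labeler, n=30, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # a single population
        if naive_z_pvalue(x, labeler(x)) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("groups chosen by 2-means:", rejection_rate(two_means_1d))
    print("groups fixed in advance: ", rejection_rate(fixed_split))
```

Under these assumptions, the rejection rate for groups chosen by clustering should come out far above the nominal level of 0.05, while the rate for groups fixed before seeing the data should stay close to 0.05. The only difference between the two runs is that in the first the same data both define the groups and test them.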