CFE-CMStatistics 2024
A1029
Title: Selective inference after community detection on a single network
Authors: Daniel Kessler - University of North Carolina at Chapel Hill (United States) [presenting]
Ethan Ancell - University of Washington (United States)
Daniela Witten - University of Washington (United States)
Abstract: Networks arise in numerous applications in the social and biological sciences. Many network modeling tasks involve learning how to partition nodes into communities. While much work has focused on the statistical properties of community detection for edge-independent models, inference on the connectivity properties within and among communities has received relatively less attention. One particular challenge is that in many applications only a single network is observed, so community detection and inference must both be conducted on that one network. Failing to account for this "double dipping" can yield invalid inference, yet sample splitting is nontrivial when only one network is available. By characterizing community detection as a form of model selection, the approach leverages recent developments in selective inference to develop procedures for valid inference on the statistical properties of a single network after community detection. Because the communities are learned from the data, the model may be misspecified; this is addressed using "sandwich estimators" of the variance. In general, the approach affords control of the so-called "selective Type I error rate" and is applicable to edge-independent networks with binary edges (using data fission) as well as many classes of weighted edges (using data thinning). Central limit theorems are established for the estimators, and the utility of the methods is demonstrated in numerical simulations.
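The abstract gives no implementation details, so the following is only a minimal sketch of the general split-then-infer idea it describes, not the authors' procedure. It assumes Poisson edge weights (thinned binomially), uses a toy two-community spectral split as a stand-in for "community detection", and estimates between-community connectivity with an HC0 sandwich standard error for an intercept-only fit. For binary edges it shows a standard Bernoulli data-fission construction (flip each edge with probability `gamma`) and the resulting conditional probability, without the full selective-inference step. All function names (`bernoulli_fission`, `fission_conditional_prob`, `poisson_thin`, `spectral_two_communities`, `block_mean_ci`) and the simulated parameters are hypothetical.

```python
import numpy as np


def bernoulli_fission(A, gamma, rng):
    """Binary edges (assumed construction): f(A) = A XOR Z, Z ~ Bernoulli(gamma)
    i.i.d. per edge. f(A) would be used for community detection; inference would
    use g(A) = A through its conditional law given f(A) (fission_conditional_prob)."""
    iu = np.triu_indices(A.shape[0], k=1)
    Z = rng.binomial(1, gamma, size=iu[0].size)
    F = np.zeros_like(A)
    F[iu] = A[iu] ^ Z            # flip each edge indicator with probability gamma
    return F + F.T               # symmetric, zero diagonal


def fission_conditional_prob(theta, f, gamma):
    """P(A_ij = 1 | f(A)_ij = f) when A_ij ~ Bernoulli(theta)."""
    keep, flip = 1.0 - gamma, gamma
    if f == 1:                   # A=1 needs no flip; A=0 needs a flip
        return theta * keep / (theta * keep + (1.0 - theta) * flip)
    return theta * flip / (theta * flip + (1.0 - theta) * keep)


def poisson_thin(A, eps, rng):
    """Weighted edges (assumed Poisson): A_train ~ Poisson(eps * Lambda) and
    A_test ~ Poisson((1 - eps) * Lambda), independent, with A_train + A_test = A."""
    iu = np.triu_indices(A.shape[0], k=1)
    A_train = np.zeros_like(A)
    A_train[iu] = rng.binomial(A[iu], eps)   # binomial thinning of each edge weight
    A_train = A_train + A_train.T            # restore symmetry
    return A_train, A - A_train


def spectral_two_communities(A):
    """Toy community detection: split nodes by the sign of the eigenvector
    for the second-largest eigenvalue of A."""
    _, vecs = np.linalg.eigh(A.astype(float))  # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)


def block_mean_ci(A_test, labels, k, l, eps, z=1.96):
    """Mean edge weight between estimated communities k and l, rescaled to the
    original scale, with an HC0 sandwich standard error that does not assume
    all pairs in the block share a common mean."""
    iu = np.triu_indices(A_test.shape[0], k=1)
    a, b = labels[iu[0]], labels[iu[1]]
    mask = ((a == k) & (b == l)) | ((a == l) & (b == k))
    x = A_test[iu][mask].astype(float) / (1.0 - eps)  # undo the (1 - eps) scaling
    if x.size == 0:
        raise ValueError("no node pairs between the requested communities")
    est = x.mean()
    se = np.sqrt(np.sum((x - est) ** 2)) / x.size     # sqrt(sum of squared residuals) / m
    return est, se, (est - z * se, est + z * se)


# Usage on a simulated two-community Poisson network (hypothetical parameters).
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 30)                        # two planted communities of 30 nodes
Lam = np.where(truth[:, None] == truth[None, :], 5.0, 1.0)
A = np.triu(rng.poisson(Lam), k=1)
A = A + A.T                                          # symmetric, no self-loops
A_train, A_test = poisson_thin(A, eps=0.5, rng=rng)
labels = spectral_two_communities(A_train)           # detect on one piece only
est, se, ci = block_mean_ci(A_test, labels, 0, 1, eps=0.5)  # infer on the other
```

Because `A_train` and `A_test` are independent under Poisson thinning, the estimated labels can be treated as fixed when forming the interval from `A_test`. Reusing `A` for both steps would be the double dipping the abstract warns against. For binary edges the two fission pieces are not independent, which is why inference there would go through a conditional law like `fission_conditional_prob` rather than a plain sample mean.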