A1008
Title: Scalable community detection in massive networks via predictive inference
Authors: Srijan Sengupta - North Carolina State University (United States) [presenting]
Marianna Pensky - University of Central Florida (United States)
Subhankar Bhadra - North Carolina State University (United States)
Abstract: Identifying community structure in networks has been of particular interest in the statistics literature. In recent years, massive network datasets have been generated in many fields. Community detection is challenging for such networks because standard algorithms have high runtime and storage requirements. We propose a novel algorithm based on so-called predictive inference: any statistically sound community detection algorithm is used to cluster a subset of nodes, and the estimated communities are then used to classify the remaining nodes. Decomposing the clustering problem into a small clustering sub-problem and a classification problem yields substantial savings in runtime and memory with little loss of accuracy. We establish the theoretical properties of the proposed method and demonstrate its numerical performance on synthetic and real-world networks.
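
The two-stage idea in the abstract (cluster a small subset of nodes, then classify the rest) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names (`simulate_sbm`, `spectral_cluster`, `predictive_assignment`, `accuracy`) are hypothetical, the clustering step is a basic adjacency spectral method standing in for "any statistically sound" algorithm, the classification rule (assign each remaining node to the estimated community it connects to most densely) presumes assortative communities, and the data come from a simulated stochastic block model.

```python
from itertools import permutations

import numpy as np


def simulate_sbm(n, K, p_in, p_out, rng):
    """Draw a symmetric adjacency matrix from a roughly balanced stochastic block model."""
    labels = rng.integers(0, K, size=n)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    A = (upper | upper.T).astype(float)
    return A, labels


def spectral_cluster(A_sub, K, n_iter=50):
    """Stand-in clustering step: top-K eigenvectors followed by a small k-means."""
    vals, vecs = np.linalg.eigh(A_sub)
    X = vecs[:, np.argsort(np.abs(vals))[-K:]]  # K leading eigenvectors by |eigenvalue|
    # Deterministic farthest-point initialisation of the k-means centres.
    centres = [X[0]]
    for _ in range(1, K):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        z = dist.argmin(axis=1)
        for k in range(K):
            if np.any(z == k):
                centres[k] = X[z == k].mean(axis=0)
    return z


def predictive_assignment(A, K, m, rng):
    """Cluster m sampled nodes, then classify the remaining nodes from their edges."""
    n = A.shape[0]
    sub = rng.choice(n, size=m, replace=False)
    rest = np.setdiff1d(np.arange(n), sub)
    # Stage 1: clustering on the small m x m subgraph only.
    z_sub = spectral_cluster(A[np.ix_(sub, sub)], K)
    # Stage 2: edge density from each remaining node to each estimated community.
    onehot = np.eye(K)[z_sub]
    sizes = np.maximum(onehot.sum(axis=0), 1.0)
    density = A[np.ix_(rest, sub)] @ onehot / sizes
    z = np.empty(n, dtype=int)
    z[sub] = z_sub
    z[rest] = density.argmax(axis=1)  # assortative assumption: densest community wins
    return z


def accuracy(z_hat, z_true, K):
    """Fraction of correctly labelled nodes, maximised over label permutations."""
    return max(np.mean(np.array(p)[z_hat] == z_true) for p in permutations(range(K)))


rng = np.random.default_rng(0)
A, truth = simulate_sbm(n=600, K=2, p_in=0.30, p_out=0.05, rng=rng)
z_hat = predictive_assignment(A, K=2, m=150, rng=rng)
print("accuracy:", accuracy(z_hat, truth, K=2))
```

In this sketch the eigendecomposition is applied only to the 150 x 150 subgraph, while the remaining 450 nodes are handled with a single matrix product. That is the source of the runtime and memory savings the abstract refers to, under the simplifying assumptions stated above.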