A0245
Title: Graph sub-sampling for divide-and-conquer algorithms in large networks
Authors: Eric Yanchenko - Akita International University (Japan) [presenting]
Abstract: As networks continue to increase in size, handling large numbers of nodes and edges is practically relevant. Instead of working directly with the entire (large) network, analyzing sub-networks has become a popular approach. Due to a network's inherent inter-connectedness, sub-sampling is not always easy. While this problem has gained attention in recent years, it has not received sufficient attention from the statistics community. A thorough comparison of seven graph sub-sampling algorithms is provided by applying them to divide-and-conquer algorithms for community structure and core-periphery (CP) structure. After discussing the various algorithms and sub-sampling routines, theoretical results are derived for the misclassification rate of the divide-and-conquer algorithm for the CP structure under various sub-sampling schemes. Extensive experiments are then performed on both simulated and real-world data to compare the various methods. For the community detection task, it is found that sampling nodes uniformly at random yields the best performance. For the CP structure, on the other hand, there was no single winner, but algorithms that sampled core nodes at a higher rate consistently outperformed other sampling routines, e.g., random edge sampling and random walk sampling. The varying performance of the sampling algorithms on different tasks demonstrates the importance of carefully selecting a sub-sampling routine for the specific application.