View Submission - EcoSta2024
A0854
Title: A distance metric based space filling subsampling method for nonparametric models
Authors: Dianpeng Wang - Beijing Institute of Technology (China) [presenting]
Abstract: Subsampling from the original data set is an efficient and popular strategy for handling massive data that are too large to model directly. It is crucial that the subsampling scheme select observations intelligently so as to optimize inference and prediction accuracy. A proportionate sampling method is proposed that uses distance-metric-based strata to select subsamples from high-volume data sets. To minimize the maximal distance between pairs of samples located in the same stratum, Voronoi cells of the thinnest covering lattices are used to partition the space. With the help of an algorithm that quickly identifies the cell in which an observation is located, the computational cost of the subsampling method is proportional to the number of observations and independent of the number of cells, which makes the method applicable to extremely large data sets. Results from simulation studies and real data analysis show that the new method performs remarkably better than existing approaches when used in conjunction with k-nearest neighbor or Gaussian process models.
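
The idea of lattice-based strata with proportionate sampling can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify the dimension, the lattice construction, or the cell-lookup algorithm. The sketch assumes two-dimensional data and uses the hexagonal lattice, which is the thinnest lattice covering of the plane. The names `hex_cell` and `lattice_subsample`, the `scale` and `frac` parameters, and the rounding rule for per-cell sample sizes are all hypothetical choices made for illustration. The cell lookup splits the hexagonal lattice into two shifted rectangular lattices, rounds to the nearest point in each, and keeps the closer one, so each lookup takes constant time regardless of how many cells exist.

```python
import math
import random
from collections import defaultdict

SQRT3 = math.sqrt(3.0)


def hex_cell(x, y, scale=1.0):
    """Return an integer key (a, b) for the hexagonal Voronoi cell containing (x, y).

    Assumption: 2D data and a hexagonal lattice with spacing `scale`.
    The cell center is (a * scale / 2, b * scale * sqrt(3) / 2).
    """
    u, v = x / scale, y / scale
    best = None
    # The hexagonal lattice is the union of a rectangular lattice with
    # spacings (1, sqrt(3)) and a copy of it shifted by (1/2, sqrt(3)/2).
    for coset in (0, 1):
        off = 0.5 * coset
        i = round(u - off)
        j = round(v / SQRT3 - off)
        cx, cy = i + off, (j + off) * SQRT3
        d2 = (u - cx) ** 2 + (v - cy) ** 2
        if best is None or d2 < best[0]:
            best = (d2, 2 * i + coset, 2 * j + coset)
    return best[1], best[2]


def lattice_subsample(points, frac, scale=1.0, seed=0):
    """Draw roughly `frac` of the points from every occupied cell.

    Returns the indices of the selected points. One pass assigns points
    to cells, so the cost grows with the number of points, not with the
    number of cells.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        strata[hex_cell(x, y, scale)].append(idx)

    chosen = []
    for members in strata.values():
        # Hypothetical rounding rule: nearest integer to frac * cell size.
        n_take = round(frac * len(members))
        chosen.extend(rng.sample(members, n_take))
    return chosen


# Usage: subsample about 5% of 10,000 uniformly scattered points.
gen = random.Random(1)
data = [(gen.uniform(0, 10), gen.uniform(0, 10)) for _ in range(10_000)]
subset = lattice_subsample(data, frac=0.05, scale=1.0)
```

In this sketch `scale` controls the cell size and therefore the number of strata. Because the per-cell sample size is rounded, cells holding very few points may contribute nothing; a randomized rounding rule would be one way to keep the expected allocation exactly proportional.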