A0759

Title: The A-optimal subsampling approach to the analysis of count data of massive size
Authors: Fei Tan - Indiana University-Purdue University Indianapolis (United States) [presenting]
Xiaofeng Zhao - North China University of Water Resources and Electric Power (China)
Hanxiang Peng - IUPUI (United States)
Abstract: Uniform and statistical-leverage-score-based (nonuniform) sampling distributions are frequently used in the analysis of massive data. Neither distribution, however, is effective at extracting the important information in the data. A-optimal subsampling estimators of the parameters in generalized linear models (GLMs) are constructed to approximate the full-data estimators, and the A-optimal distributions are derived from the criterion of minimizing the sum of the component variances of the subsampling estimators. Because computing these distributions takes as long as computing the full-data estimator, the scoring algorithm introduced in a recent study for big-data linear models is generalized to GLMs using iteratively reweighted least squares. The aim is to present a comprehensive numerical evaluation of the approach on simulated and real data, comparing its performance with uniform and leverage-score subsampling. The results show that the approach substantially outperforms uniform and leverage-score subsampling, and that the algorithm significantly reduces the computing time relative to that required for the full-data estimator.
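
The pilot-then-resample idea behind this kind of subsampling can be illustrated with a minimal Python sketch for Poisson regression with a log link. It is not the authors' implementation. The sampling probabilities, taken proportional to |y_i - mu_i| times the norm of M^{-1} x_i, are a form used in the optimal-subsampling literature for GLMs and are assumed here for illustration, since the abstract does not state the exact formula. The function names (`poisson_irls`, `a_optimal_probs`), the simulated data, and the sample sizes are likewise hypothetical. The probabilities are computed with a full pass over the data, so the sketch does not reproduce the fast scoring algorithm.

```python
import numpy as np


def poisson_irls(X, y, w=None, n_iter=50, tol=1e-8):
    """Weighted Poisson regression (log link) via iteratively reweighted least squares."""
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Fisher scoring step: solve (X' diag(w * mu) X) delta = X' (w * (y - mu)).
        H = X.T @ (X * (w * mu)[:, None])
        g = X.T @ (w * (y - mu))
        delta = np.linalg.solve(H, g)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta


def a_optimal_probs(X, y, beta_pilot):
    """Assumed A-optimal-style probabilities: pi_i proportional to |y_i - mu_i| * ||M^{-1} x_i||."""
    mu = np.exp(X @ beta_pilot)
    M = X.T @ (X * mu[:, None]) / len(y)      # estimated Fisher information
    Minv_x = np.linalg.solve(M, X.T).T        # row i holds M^{-1} x_i
    score = np.abs(y - mu) * np.linalg.norm(Minv_x, axis=1)
    return score / score.sum()


# Simulated count data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, r0, r = 50_000, 500, 2_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.3, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Step 1: pilot estimate from a small uniform subsample.
idx0 = rng.choice(n, size=r0, replace=True)
beta_pilot = poisson_irls(X[idx0], y[idx0])

# Step 2: draw the main subsample with the assumed probabilities and
# fit with inverse-probability weights to correct for the nonuniform sampling.
pi = a_optimal_probs(X, y, beta_pilot)
idx = rng.choice(n, size=r, replace=True, p=pi)
beta_sub = poisson_irls(X[idx], y[idx], w=1.0 / (n * pi[idx]))

# Full-data estimator, for comparison only.
beta_full = poisson_irls(X, y)
print("subsample estimate:", beta_sub)
print("full-data estimate:", beta_full)
```

The inverse-probability weights in the second fit are what keep the subsample estimator targeting the full-data estimator; without them, oversampling observations with large residuals would bias the fit.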