View Submission - EcoSta2022
A0855
Title: Optimal subsampling in a massive data linear regression
Authors: Fei Tan - Indiana University-Purdue University Indianapolis (United States) [presenting]
Hanxiang Peng - IUPUI (United States)
Abstract: To approximate the least-squares estimate quickly in a massive data linear regression, we use a subsampling estimate and give numerous optimal sampling distributions based on the criteria of minimum bias and maximum information. We show that the distribution based on statistical leverage scores minimizes the bias, the A-optimal distribution minimizes the trace norm of the covariance matrix, and a distribution whose likelihood ratio to the uniform distribution is bounded away from zero attains the optimal convergence rate. We demonstrate the necessity of truncating the sampling distributions, provide relative error bounds, and discuss subsample size determination. We construct an algorithm with a running time of $o(n\times p^2)$, and report a large simulation study and a massive real data application.
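As an illustration only, the general idea of a leverage-score-based subsampling estimate can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `leverage_probabilities` and `subsample_ols`, the `floor` parameter, and the particular way of bounding the probabilities away from zero are all hypothetical choices made for this example. It also computes exact leverage scores through a QR decomposition, which costs on the order of $n\times p^2$ operations and therefore does not achieve the $o(n\times p^2)$ running time mentioned in the abstract.

```python
import numpy as np


def leverage_probabilities(X, floor=0.0):
    """Sampling probabilities proportional to statistical leverage scores.

    `floor` is a hypothetical parameter for this sketch: if positive, each
    probability is raised to at least floor / n before renormalizing, which
    keeps the ratio to the uniform distribution bounded away from zero.
    """
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)            # reduced QR: Q has shape (n, p)
    lev = np.sum(Q ** 2, axis=1)      # leverage scores (diagonal of the hat matrix)
    pi = lev / lev.sum()
    if floor > 0:
        pi = np.maximum(pi, floor / n)
        pi = pi / pi.sum()            # renormalize so the probabilities sum to one
    return pi


def subsample_ols(X, y, r, pi, rng):
    """Weighted least squares on r rows drawn with replacement according to pi."""
    idx = rng.choice(X.shape[0], size=r, replace=True, p=pi)
    w = 1.0 / np.sqrt(r * pi[idx])    # inverse-probability rescaling of each sampled row
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta


# Small synthetic demonstration (simulated data, assumed for illustration).
rng = np.random.default_rng(0)
n, p, r = 20_000, 5, 2_000
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

pi = leverage_probabilities(X, floor=0.1)
beta_sub = subsample_ols(X, y, r, pi, rng)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.max(np.abs(beta_sub - beta_full)))
```

The rescaling by `1 / sqrt(r * pi[idx])` is the usual inverse-probability weighting for sampling with replacement, so the subsampled normal equations estimate the full-data ones. Flooring the probabilities is one simple way to keep the weights from becoming very large for rows with tiny sampling probability; the truncation studied in the abstract may differ.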