View Submission - EcoSta2024
A0763
Title: The A-optimal subsampling for big data penalized spline single index models
Authors: Hanxiang Peng - IUPUI (United States) [presenting]
Fang Li - Indiana University-Purdue University Indianapolis (United States)
Haixia Smithson - US Food and Drug Administration (United States)
Abstract: Motivated by the computational burden of fitting single index models, which is caused by high parameter dimensionality and possibly compounded by massive data size, A-optimal subsampling estimators are constructed to approximate the full data estimators. The A-optimal sampling distribution is derived by minimizing the sum of the component variances of the subsampling estimator. For an arbitrary distribution \((\pi_i)\) on the \(n\) data points whose minimum \(\pi_{\min}\) satisfies \(n\pi_{\min}\geq l_0>0\) for some constant \(l_0\), asymptotic normality of the subsampling estimator is proven under suitable conditions as the subsample size \(r\) tends to infinity. The result holds whether the sum \(p+d\) of the number \(p\) of index parameters and the number \(d\) of basis functions is fixed or grows slowly at the rate \(p+d=o(r^{1/5})\). An unweighted subsampling estimator is also constructed; its asymptotic normality is proven for growing dimension without the foregoing assumption on \((\pi_i)\), and it is shown to be more efficient than the weighted estimator. Analytic formulas for the first-order bias are provided for both estimators, and the effects of the penalty \(\lambda\), \(p+d\), \((\pi_i)\) and \(r\) on the estimators and their biases are explored. A fast algorithm with running time \(O(r^2(p+d))\) is constructed, with \(r\) far less than \(n\), and the numerical behavior of the subsampling approach is studied using both simulated and real data.
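The idea of A-optimal subsampling can be illustrated in a much simpler setting than the one treated in the abstract. The sketch below is not the authors' algorithm for penalized spline single index models; it swaps in ordinary linear regression, where minimizing the trace of the approximate variance of the inverse-probability-weighted estimator gives sampling probabilities proportional to \(|e_i|\,\|M^{-1}x_i\|\) with \(M=X^\top X\). The function names, the pilot fit used to obtain residuals, and the mixing weight `alpha` are assumptions made for illustration. Mixing with the uniform distribution is one simple way to guarantee a bound of the form \(n\pi_{\min}\geq l_0\), here with \(l_0=\) `alpha`.

```python
import numpy as np


def a_optimal_probs(X, resid, alpha=0.2):
    """Illustrative A-optimal-style probabilities for linear regression.

    pi_i is proportional to |e_i| * ||M^{-1} x_i|| with M = X'X, then mixed
    with the uniform distribution so that n * min(pi) >= alpha.
    Note: forming M from all n rows is a simplification for this sketch.
    """
    n = X.shape[0]
    M_inv = np.linalg.inv(X.T @ X)
    # Row i of X @ M_inv is x_i' M^{-1}; M_inv is symmetric, so the row norm
    # equals ||M^{-1} x_i||.
    score = np.abs(resid) * np.linalg.norm(X @ M_inv, axis=1)
    pi_opt = score / score.sum()
    return (1.0 - alpha) * pi_opt + alpha / n


def weighted_subsample_fit(X, y, pi, r, rng):
    """Draw r rows with replacement according to pi and solve the
    inverse-probability-weighted least squares problem on the subsample."""
    idx = rng.choice(X.shape[0], size=r, replace=True, p=pi)
    w = 1.0 / pi[idx]          # weights; a common 1/r factor would cancel
    Xs, ys = X[idx], y[idx]
    XtW = Xs.T * w             # scales column j of Xs.T by w[j]
    return np.linalg.solve(XtW @ Xs, XtW @ ys)


# Simulated data (hypothetical sizes and coefficients, for illustration only).
rng = np.random.default_rng(0)
n, r = 5000, 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Pilot fit on a small uniform subsample supplies approximate residuals.
pilot = rng.choice(n, size=200, replace=False)
beta_pilot = np.linalg.lstsq(X[pilot], y[pilot], rcond=None)[0]

pi = a_optimal_probs(X, y - X @ beta_pilot)
beta_hat = weighted_subsample_fit(X, y, pi, r, rng)
print(beta_hat)
```

The weights \(1/\pi_i\) correct for the unequal sampling probabilities, so the weighted subsample estimator targets the same quantity as the full data fit while solving a problem with only \(r\) rows.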