A0969
Title: Subdata selection for prediction under model misspecification
Authors: Alvaro Cia-Mina - University of Navarra (Spain) [presenting]
Laura Deldossi - Università Cattolica del Sacro Cuore di Milano (Italy)
Jesus Lopez-Fidalgo - University of Navarra (Spain)
Chiara Tommasi - University of Milan (Italy)
Abstract: Subsampling is commonly used to improve computational efficiency in regression models. However, existing methods focus primarily on minimizing parameter estimation error, whereas the main practical goal of a statistical model is often to minimize prediction error. A novel approach to selecting subdata for linear models under model misspecification is introduced. The method takes the distribution of the covariates into account and specifically addresses large-sample settings in which obtaining labels for the response variable is costly. A strategy is proposed to minimize the bias term of the random-X prediction error. As anticipated from theoretical considerations, the method reduces the bias term of the prediction mean squared error compared with existing methods. Simulations provide empirical evidence of the performance of the approach and of its potential to improve prediction accuracy under model misspecification.
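
The contrast between estimation-oriented and prediction-oriented subsampling can be illustrated with a minimal sketch. It is not the method proposed in the abstract. It assumes a hypothetical quadratic true mean, standard normal covariates, and a straight-line working model, and it compares two generic selection rules: a uniform random subsample and a rule that keeps only the most extreme covariate values, used here as a stand-in for estimation-oriented selection. All names, sample sizes, and coefficients are illustrative assumptions. Labels are generated only for the selected points, mirroring the costly-label setting.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for the working model y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def true_mean(x):
    # Assumed (hypothetical) true regression function; the quadratic term
    # is the part the straight-line working model leaves out.
    return 1.0 + x + 0.5 * x * x


def prediction_mse(a, b, xs):
    """Mean squared gap between the fitted line and the true mean,
    averaged over the full covariate sample (a random-X style criterion)."""
    return sum((a + b * x - true_mean(x)) ** 2 for x in xs) / len(xs)


rng = random.Random(0)   # fixed seed so the run is reproducible
n, k = 20000, 200        # full-sample size and subdata size (illustrative)
x_full = [rng.gauss(0.0, 1.0) for _ in range(n)]


def label(x):
    # Responses are "purchased" only for the selected covariate values.
    return true_mean(x) + rng.gauss(0.0, 0.5)


# Rule 1: uniform random subsample, which follows the covariate distribution.
idx_uniform = rng.sample(range(n), k)

# Rule 2: keep the k/2 smallest and k/2 largest covariate values.
order = sorted(range(n), key=lambda i: x_full[i])
idx_extreme = order[: k // 2] + order[-(k // 2):]

results = {}
for name, idx in (("uniform", idx_uniform), ("extreme", idx_extreme)):
    xs = [x_full[i] for i in idx]
    ys = [label(x) for x in xs]
    a, b = fit_line(xs, ys)
    results[name] = prediction_mse(a, b, x_full)
    print(f"{name:8s} intercept={a:.3f} slope={b:.3f} "
          f"prediction MSE={results[name]:.3f}")
```

In this toy setting the extreme-value rule fits the line where the omitted quadratic term is largest, so its intercept is pulled far from the best linear approximation over the bulk of the covariates and its prediction error is dominated by bias. A subsample that follows the covariate distribution avoids most of that bias.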