View Submission - CFECMStatistics2024
A0272
Title: Variable shrinkage and subdata selection in big data
Authors: Vasilis Chasiotis - Athens University of Economics and Business (Greece) [presenting]
Lin Wang - Purdue University (United States)
Dimitris Karlis - RC Athens University of Economics and Business (Greece)
Abstract: In big data analytics, efficient subdata selection methods that enable robust statistical inference with minimal computational resources are of high importance. Because only a subset of a large number of variables is typically active, variable selection can be performed before subdata selection. An approach is proposed for situations where both the size of the full dataset and the number of variables are large. The approach first identifies the active variables by applying a procedure inspired by the random Lasso and then selects subdata based on leverage scores in order to build a predictive model. The proposed approach outperforms existing methods in both variable selection and prediction, while also exhibiting significant improvements in computing time. The use of the full dataset is considered as well. Simulation experiments and a real data application are provided.
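The two-stage idea described in the abstract (screen variables, then pick rows by leverage) can be illustrated with a minimal sketch. It is not the authors' procedure: the screening step below is a simplified stand-in that only borrows the bootstrap-rows and random-candidate-variables idea from the random Lasso, and every function name, tuning value (`n_draws`, `m`, `q`, `lam`, `keep`, `k`), and the simulated data are assumptions made for illustration. Rows are selected deterministically as the `k` largest leverage scores, which is one of several possible leverage-based rules.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def screen_variables(X, y, rng, n_draws=40, m=300, q=10, lam=0.5, keep=0.6):
    """Simplified screening (assumption): selection frequency over
    bootstrap row samples and random candidate-variable subsets."""
    n, p = X.shape
    hits = np.zeros(p)
    tries = np.zeros(p)
    for _ in range(n_draws):
        rows = rng.integers(0, n, size=m)             # bootstrap rows
        cols = rng.choice(p, size=q, replace=False)   # candidate variables
        Xb = X[np.ix_(rows, cols)]
        Xb = Xb - Xb.mean(axis=0)
        yb = y[rows] - y[rows].mean()
        beta = lasso_cd(Xb, yb, lam)
        tries[cols] += 1
        hits[cols] += (beta != 0)
    freq = hits / np.maximum(tries, 1)
    return np.flatnonzero(freq >= keep)

def leverage_subdata(X_active, k):
    """Indices of the k rows with the largest leverage scores."""
    Z = np.column_stack([np.ones(len(X_active)), X_active])
    Q, _ = np.linalg.qr(Z)                  # thin QR; leverage = row norms of Q
    lev = (Q ** 2).sum(axis=1)
    return np.argsort(lev)[-k:]

def fit_on_subdata(X, y, rng, k=500):
    """Screen variables, select rows by leverage, fit OLS on the subdata."""
    active = screen_variables(X, y, rng)
    rows = leverage_subdata(X[:, active], k)
    Z = np.column_stack([np.ones(k), X[np.ix_(rows, active)]])
    coef, *_ = np.linalg.lstsq(Z, y[rows], rcond=None)
    return active, rows, coef

# Simulated example (assumption): 3 active variables out of 30.
rng = np.random.default_rng(0)
n, p = 5000, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.0]
y = X @ beta_true + rng.standard_normal(n)

active, rows, coef = fit_on_subdata(X, y, rng)
```

Here `active` holds the indices of the screened variables, `rows` the indices of the selected subdata, and `coef` the intercept followed by one OLS coefficient per screened variable. Only the screening step touches random subsamples, and the leverage computation runs on the reduced design matrix, which is where the computational saving in a scheme of this kind would come from.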