CMStatistics 2022
B0957
Title: Online data selection and sparse estimation for multivariate streaming data
Authors: Rui Xie - University of Central Florida (United States) [presenting]
Shuyang Bai - University of Georgia (United States)
Yongkai Chen - University of Georgia (United States)
Ping Ma - University of Georgia (United States)
Abstract: Real-time analysis of large-scale multivariate streaming data often faces a trade-off between statistical estimation efficiency and computational efficiency. For multivariate data streams, this trade-off must be balanced carefully, especially for sparse and possibly under-determined regression problems, which demand more computational effort. Data selection makes it possible to process large-scale streaming data in real time, so that the sparse model can be fit and updated in seconds instead of hours. We study online, real-time joint data-dependent sample selection and continuous variable selection for a multi-dimensional sparse regression problem on streaming data. We propose a class of online data selection methods that simultaneously achieve sampling and sparse estimation, improving the computational efficiency of the online analysis. The online sparse model estimation uses coordinate descent algorithms for nonconvex penalized regression, and the real-time data selection adapts optimal design-based sequential online sampling. The performance of the sampling-assisted online sparse estimation method is assessed via simulation studies and real data examples.
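The abstract gives no algorithmic details, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a single response rather than the multivariate case, substitutes a simple greedy D-optimality rule for "optimal design-based sequential online sampling" (from each incoming block, keep the rows that most increase log det of the information matrix X^T X), and uses the MCP penalty as one example of a nonconvex penalty fitted by coordinate descent, following the closed-form update of Breheny and Huang (2011). The function names `greedy_d_optimal_select`, `mcp_coordinate_descent`, and `online_select_and_fit`, the ridge initialization, and all tuning values are hypothetical choices for illustration.

```python
import numpy as np


def greedy_d_optimal_select(block, M_inv, k):
    """Greedily pick k rows of `block` that most increase log det(M).

    By the matrix determinant lemma, det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
    so the best single row maximizes the quadratic form x^T M^{-1} x.
    M_inv is updated in place with the Sherman-Morrison formula.
    """
    chosen = []
    available = np.ones(len(block), dtype=bool)
    for _ in range(min(k, len(block))):
        scores = np.einsum("ij,jk,ik->i", block, M_inv, block)
        scores[~available] = -np.inf
        i = int(np.argmax(scores))
        Mx = M_inv @ block[i]
        M_inv -= np.outer(Mx, Mx) / (1.0 + block[i] @ Mx)  # rank-one inverse update
        available[i] = False
        chosen.append(i)
    return chosen


def mcp_coordinate_descent(X, y, lam, gamma=3.0, max_iter=200, tol=1e-6):
    """Coordinate descent for least squares with the MCP penalty (gamma > 1).

    Columns are centered and scaled so that x_j^T x_j / n = 1, which gives the
    closed-form univariate MCP update; coefficients are mapped back at the end.
    """
    n, p = X.shape
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    Z = (X - x_mean) / x_std
    y_mean = y.mean()
    r = y - y_mean  # residual when all coefficients are zero
    b = np.zeros(p)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            z = Z[:, j] @ r / n + b[j]  # partial-residual correlation
            if abs(z) <= gamma * lam:  # penalized region: scaled soft threshold
                new = np.sign(z) * max(abs(z) - lam, 0.0) / (1.0 - 1.0 / gamma)
            else:  # beyond gamma * lam, MCP applies no shrinkage
                new = z
            if new != b[j]:
                r -= Z[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    coef = b / x_std
    return coef, y_mean - x_mean @ coef


def online_select_and_fit(stream, p, k_per_block, lam, ridge=1e-3):
    """Keep k_per_block rows from each incoming block, then fit MCP on the kept rows."""
    M_inv = np.eye(p) / ridge  # inverse of the ridge-initialized information matrix
    X_kept, y_kept = [], []
    for X_block, y_block in stream:
        idx = greedy_d_optimal_select(X_block, M_inv, k_per_block)
        X_kept.append(X_block[idx])
        y_kept.append(y_block[idx])
    X_all, y_all = np.vstack(X_kept), np.concatenate(y_kept)
    return mcp_coordinate_descent(X_all, y_all, lam), len(y_all)


# Synthetic demo: 100 blocks of 200 rows, 3 active predictors out of 50.
rng = np.random.default_rng(0)
p = 50
beta = np.zeros(p)
beta[[0, 5, 10]] = [3.0, -2.0, 1.5]


def make_stream(n_blocks=100, block_size=200):
    for _ in range(n_blocks):
        X = rng.standard_normal((block_size, p))
        yield X, X @ beta + 0.5 * rng.standard_normal(block_size)


(coef, intercept), n_kept = online_select_and_fit(make_stream(), p, k_per_block=3, lam=0.1)
print(n_kept, np.round(coef[[0, 5, 10]], 2))
```

In this sketch only 300 of the 20,000 streamed rows are retained, and the sparse fit runs on that subset alone. The selection rule depends only on the predictors, never on the response, so the retained rows still follow the original regression model. A genuinely real-time version would refit after each block, warm-starting coordinate descent from the previous coefficients, rather than fitting once at the end as done here.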