Title: Distributed sparse regression for high dimensional financial big data based on gradient hard thresholding pursuit
Authors: Shanshan Wang - Beihang University (China) [presenting]
Abstract: With the rapid development of internet technology, the global volume of data has grown explosively. In finance, many datasets combine large sample sizes with high-dimensional features, and their variables are often very highly correlated, which poses significant challenges for effective analysis. The focus is on a distributed sparse regression framework tailored to such high-dimensional big data. First, a distributed SVD is used to orthogonalize the columns of the data matrix, which removes the correlations between variables. Second, the gradient hard thresholding pursuit (GraHTP) algorithm for $l_{0}$-regularized regression is embedded in a divide-and-conquer framework to solve the high-dimensional sparse regression problem in a distributed manner, yielding fast variable selection and parameter estimation. Theoretical properties of the proposed method, including unbiasedness and sparse recovery, are also presented. The algorithm is applied to simulated data with high correlation, high dimensionality, and large sample sizes to check that its empirical performance agrees with the theory. Finally, it is applied to predicting the annualized returns of 2,588 Chinese A-share stocks from 2019 to 2022, which demonstrates its practical utility in finance.
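The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All function names are hypothetical, the loss is assumed to be least squares, and the aggregation rule (majority vote on the support, then averaging) is an assumption, since the abstract does not say how local estimates are combined. The abstract also does not specify how the orthogonalized design is passed to GraHTP, so the orthogonalization helper is shown separately.

```python
import numpy as np


def distributed_right_singular_vectors(chunks):
    """Right singular vectors of the row-stacked data from per-machine Gram matrices.

    Assumption: each machine sends only its p x p matrix X_j^T X_j to a coordinator.
    """
    gram = sum(Xj.T @ Xj for Xj in chunks)
    eigvals, V = np.linalg.eigh(gram)          # eigenvalues in ascending order
    sing = np.sqrt(np.clip(eigvals[::-1], 0.0, None))
    return V[:, ::-1], sing                    # the columns of X @ V are orthogonal


def grahtp_least_squares(X, y, k, step=None, n_iter=100, tol=1e-10):
    """GraHTP-style iteration for min (1/2n)||y - X b||^2 subject to ||b||_0 <= k."""
    n, p = X.shape
    if step is None:
        # Assumed step size: inverse of the largest eigenvalue of X^T X / n.
        step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad                      # gradient step
        support = np.argsort(np.abs(z))[-k:]        # hard threshold: keep top-k entries
        new_beta = np.zeros(p)
        # Debiasing step: refit by least squares restricted to the support.
        new_beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        converged = np.linalg.norm(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta


def distributed_grahtp(X, y, k, n_machines):
    """Divide and conquer: run GraHTP on row blocks, then combine the local fits."""
    blocks = zip(np.array_split(X, n_machines), np.array_split(y, n_machines))
    local = np.array([grahtp_least_squares(Xj, yj, k) for Xj, yj in blocks])
    # Assumed aggregation: keep the k coordinates chosen by the most machines,
    # then average the local estimates on those coordinates.
    votes = (local != 0).sum(axis=0)
    support = np.argsort(votes)[-k:]
    beta = np.zeros(X.shape[1])
    beta[support] = local[:, support].mean(axis=0)
    return beta
```

A call such as `distributed_grahtp(X, y, k=3, n_machines=4)` returns a length-`p` coefficient vector with at most `k` nonzero entries. Summing Gram matrices keeps communication at `p x p` per machine, which is only practical when `p` is moderate; this is a design choice of the sketch, not a claim about the proposed algorithm.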