CFE-CMStatistics 2024
A0356
Title: Bayesian high-dimensional linear regression with sparse projection-posterior
Authors: Subhashis Ghosal - North Carolina State University (United States)
Samhita Pal - North Carolina State University (United States) [presenting]
Abstract: A novel Bayesian approach is considered for estimation, uncertainty quantification, and variable selection in a high-dimensional linear regression model under sparsity. The number of predictors can be nearly exponentially large relative to the sample size. A conjugate normal prior is placed on the regression coefficients, initially disregarding sparsity. For inference, however, the original multivariate normal posterior is replaced by the posterior distribution induced by a map that sends the vector of regression coefficients to a sparse vector, obtained by minimizing the sum of squared deviations plus a suitably scaled $\ell_1$-penalty on the vector. The resulting sparse projection-posterior distribution is shown to contract around the true value of the parameter at the optimal rate, adapting to the sparsity of the vector. The true sparsity structure receives large sparse projection-posterior probability. An appropriately recentered credible ball is further shown to have the correct asymptotic frequentist coverage. Finally, it is described how the computational burden can be distributed across many machines, each handling only a small fraction of the whole dataset. A comprehensive simulation study is conducted under a variety of settings, and the proposed method is found to perform well for finite sample sizes. The method is implemented in an R package named sparseProj, and it is applied to the ADNI data, where the ADAS score is predicted from selected gene-expression data.
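
The construction described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy and is not the sparseProj package or its interface. Several details are assumptions made only for illustration: the prior is taken as N(0, sigma2 * a * I) with known error variance sigma2, the map is taken to minimize 0.5 * ||X theta - X u||^2 + lam * ||u||_1 over u, the minimization uses plain coordinate descent with a fixed number of sweeps, and the names `soft_threshold`, `lasso_projection`, `sparse_projection_posterior`, `a`, `lam` and `n_iter` are hypothetical. The abstract does not specify the exact form of the deviations or the scaling of the penalty.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_projection(X, theta, lam, n_iter=100):
    """Map a dense theta to a sparse u by (approximately) minimizing
    0.5 * ||X theta - X u||^2 + lam * ||u||_1 with coordinate descent.
    The objective is an assumed form of the map, chosen for illustration."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0)      # squared norm of each column
    u = np.zeros(p)
    resid = X @ theta                  # equals X theta - X u, with u = 0
    for _ in range(n_iter):            # fixed number of sweeps
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid + col_sq[j] * u[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (u[j] - new)
            u[j] = new
    return u


def sparse_projection_posterior(X, y, lam, a=100.0, sigma2=1.0,
                                n_draws=100, n_iter=100, seed=0):
    """Draw from the conjugate normal posterior (sparsity ignored),
    then push every draw through the sparsifying map."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed prior: theta ~ N(0, sigma2 * a * I), sigma2 known
    cov = np.linalg.inv(X.T @ X + np.eye(p) / a)
    mean = cov @ (X.T @ y)
    chol = np.linalg.cholesky(sigma2 * cov)
    dense = mean + rng.standard_normal((n_draws, p)) @ chol.T
    return np.array([lasso_projection(X, th, lam, n_iter) for th in dense])


if __name__ == "__main__":
    # Small synthetic example with more predictors than observations
    rng = np.random.default_rng(42)
    n, p = 40, 60
    X = rng.standard_normal((n, p))
    theta_true = np.zeros(p)
    theta_true[:3] = 3.0
    y = X @ theta_true + rng.standard_normal(n)
    lam = np.sqrt(2.0 * n * np.log(p))   # illustrative penalty scale
    samples = sparse_projection_posterior(X, y, lam, n_draws=50, n_iter=30)
    # Fraction of projected draws in which each predictor is selected
    print((samples != 0).mean(axis=0).round(2))
```

Each row of the returned array is one draw from the induced distribution on sparse vectors, so the fraction of draws with a nonzero entry in a given coordinate serves as an inclusion probability for that predictor. The draws are mapped independently of one another, so this step parallelizes naturally; the data-splitting scheme mentioned in the abstract is not reproduced here.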