EcoSta 2022 - Submission A0883
Title: Best subset selection is robust against design dependence
Authors: Yongyi Guo - Princeton University (United States) [presenting]
Ziwei Zhu - University of Michigan, Ann Arbor (United States)
Jianqing Fan - Princeton University (United States)
Abstract: Best subset selection (BSS) is among the most classical variable selection methods for high-dimensional linear regression. Nevertheless, BSS has recently received far less attention than its convex relaxations, largely because of its NP-hardness. The aim is to invoke a renaissance in BSS by providing a computationally efficient implementation with superior variable selection performance. We first analyze the variable selection properties of BSS with known sparsity. We show that an identifiability margin condition is sufficient and nearly necessary for BSS to recover the true model. This condition is free of the restricted eigenvalues of the design, suggesting that BSS is robust against design dependence. A relaxed version of this condition is sufficient for BSS to achieve the sure screening property when the true sparsity is overestimated. Next, we show that the established properties of BSS carry over to any near-best subset. In particular, an approximate BSS algorithm based on two-stage iterative hard thresholding (IHT) can find a sparse sure screening subset within a logarithmic number of steps, from which the true model can then be easily recovered. Simulations and real data examples show that this algorithm yields lower false discovery rates and higher true positive rates than competing approaches, especially under highly correlated designs.
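
The abstract refers to an approximate BSS algorithm built on iterative hard thresholding. The following is a minimal sketch of plain (single-stage) IHT for sparse linear regression, included only to illustrate the hard-thresholding iteration behind such methods; it is not the authors' two-stage procedure, and the step size eta, sparsity level k, iteration count, and the toy data are illustrative assumptions.

    import numpy as np

    def hard_threshold(v, k):
        """Keep the k largest-magnitude entries of v and zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def iht(X, y, k, eta=None, n_iter=200):
        """Approximately minimize ||y - X beta||^2 subject to ||beta||_0 <= k."""
        n, p = X.shape
        if eta is None:
            # Conservative step size based on the largest singular value of X.
            eta = 1.0 / (np.linalg.norm(X, 2) ** 2)
        beta = np.zeros(p)
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y)          # gradient of the least-squares loss
            beta = hard_threshold(beta - eta * grad, k)  # gradient step, then project onto k-sparse vectors
        return beta

    # Toy usage: recover a 5-sparse signal from a Gaussian design (illustrative only).
    rng = np.random.default_rng(0)
    n, p, k = 200, 500, 5
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:k] = 3.0
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    beta_hat = iht(X, y, k)
    print("selected support:", np.flatnonzero(beta_hat))

The key design choice shown here is that sparsity is enforced by hard thresholding after each gradient step rather than by a convex penalty, which is what ties such iterations to (approximate) best subset selection.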