A0200
Title: High-dimensional variable selection in the presence of missing data
Authors: Lixing Liang - The University of Hong Kong (Hong Kong)
Yipeng Zhuang - The Education University of Hong Kong (Hong Kong)
Philip Yu - The Education University of Hong Kong (Hong Kong) [presenting]
Abstract: Regression analysis is often affected by high dimensionality, severe multicollinearity, and a large proportion of missing data. These problems may mask important relationships and even lead to biased conclusions. A novel, computationally efficient method is proposed that integrates data imputation and variable selection to address these issues. More specifically, the proposed method combines three components: a new multiple imputation algorithm based on matrix completion (Multiple Accelerated Inexact Soft-Impute), a new randomized lasso method that is more stable and accurate (Hybrid Random Lasso), and a consistent procedure for combining variable selection with multiple imputation. Compared to existing methodologies, the proposed approach offers greater accuracy and consistency through mechanisms that enhance robustness against different missing data patterns and sampling variations. The method is applied to the Asian American minority subgroup in the 2017 National Youth Risk Behavior Survey to study key risk factors related to suicidal intention among Asian Americans. In simulations and real data analyses across various regression and classification settings, the proposed method shows improved accuracy, consistency, and efficiency in variable selection and prediction.
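As a rough illustration of the general idea of pairing matrix-completion imputation with lasso-based selection, the sketch below uses plain Soft-Impute (iterated SVD soft-thresholding) and an ordinary lasso fitted by coordinate descent. It is not the authors' Multiple Accelerated Inexact Soft-Impute or Hybrid Random Lasso, whose details the abstract does not give. The function names, the noise-based way of drawing several imputations, and the pooling of results as a per-variable selection frequency are all assumptions made for illustration only.

```python
import numpy as np


def soft_impute(X, lam, n_iter=100, tol=1e-5):
    """Plain Soft-Impute: fill NaNs by iterated SVD soft-thresholding.

    Returns the completed matrix and the final low-rank approximation.
    """
    mask = ~np.isnan(X)
    Z = np.where(mask, X, 0.0)  # start with zeros in the missing cells
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U * np.maximum(s - lam, 0.0)) @ Vt  # shrink singular values
        Z_new = np.where(mask, X, low_rank)  # keep observed entries fixed
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z, low_rank


def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]  # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add the updated contribution back
    return beta


def selection_frequency(X_missing, y, lam, alpha, n_imputations=5, seed=0):
    """Fraction of imputed datasets in which each variable is selected.

    Assumption (illustrative only): multiple imputations are drawn by
    adding Gaussian noise to the Soft-Impute fill-in, with the noise
    scale taken from the residuals on the observed entries.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X_missing)
    _, low_rank = soft_impute(X_missing, lam)
    sigma = np.std((X_missing - low_rank)[mask])
    counts = np.zeros(X_missing.shape[1])
    for _ in range(n_imputations):
        noise = rng.normal(0.0, sigma, size=X_missing.shape)
        X_imp = np.where(mask, X_missing, low_rank + noise)
        X_std = (X_imp - X_imp.mean(axis=0)) / X_imp.std(axis=0)
        beta = lasso_cd(X_std, y - y.mean(), alpha)
        counts += beta != 0
    return counts / n_imputations
```

Calling `selection_frequency(X, y, lam=1.0, alpha=0.1)` on a design matrix `X` with `NaN` entries returns one value in [0, 1] per column; variables with a frequency near 1 are selected across nearly all imputed datasets. The tuning values here are placeholders, not recommendations from the abstract.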