B1778
Title: Statistical inference for categorical covariates in high-dimensional logistic regression
Authors: Lea Kaufmann - RWTH Aachen University (Germany) [presenting]
Maria Kateri - RWTH Aachen University (Germany)
Abstract: The prevalence of high-dimensional problems reinforces the need for interpretable sparse models. In penalized logistic regression, model selection and coefficient estimation are performed at once, with a penalty function chosen to suit the application context. In the presence of categorical covariates, the model selection process includes not only factor selection but also the fusion of levels that have an indistinguishable influence on the response. A new method is introduced, called $L_0$-Fused Group Lasso ($L_0$-FGL), which simultaneously performs factor selection through a group lasso type penalty and level fusion through an $L_0$ penalty on the differences of coefficients within a factor. After showing that the $L_0$-FGL estimator satisfies convenient theoretical properties, statistical inference is additionally pursued. This yields a two-stage $L_0$-FGL method that includes both regularization (step 1) and testing (step 2). In particular, the well-known sample splitting approach is adapted to a setting in which step 1 includes both factor selection and level fusion, the latter distinguishing it from existing approaches. With a likelihood ratio test applied in step 2, asymptotic error control procedures are investigated for two-stage $L_0$-FGL, with particular attention to screening properties for fusion. Finally, an extension to the case of multiple sample splitting is provided.
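To make the structure of the penalty concrete, the following minimal sketch evaluates a penalty of the kind described above for a given set of coefficients. It is an illustration under stated assumptions, not the authors' definition: the function name `l0_fgl_penalty`, the tuning parameters `lam_group` and `lam_fuse`, the square-root group weights, the tolerance `tol`, and the treatment of the reference level as a coefficient fixed at 0 are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def l0_fgl_penalty(coefs_by_factor, lam_group, lam_fuse, tol=1e-8):
    """Evaluate an illustrative L0-FGL-style penalty (hypothetical form).

    coefs_by_factor maps each factor name to the list of its level
    coefficients; the reference level is omitted and assumed to be 0.
    """
    group_term = 0.0
    fusion_term = 0
    for coefs in coefs_by_factor.values():
        # Group lasso part: Euclidean norm of the factor's coefficient block,
        # weighted by sqrt(block size) (a common weighting, assumed here).
        block_norm = math.sqrt(sum(b * b for b in coefs))
        group_term += math.sqrt(len(coefs)) * block_norm

        # L0 fusion part: count the pairs of levels within the factor
        # (including the reference level) whose coefficients differ.
        levels = [0.0] + list(coefs)
        fusion_term += sum(
            1 for a, b in combinations(levels, 2) if abs(a - b) > tol
        )
    return lam_group * group_term + lam_fuse * fusion_term


# Hypothetical coefficients: "region" has two fused levels and one level
# fused with the reference; "sex" is deselected entirely.
example = {"region": [0.5, 0.5, 0.0], "sex": [0.0]}
penalty = l0_fgl_penalty(example, lam_group=1.0, lam_fuse=1.0)
```

In this example the deselected factor contributes nothing to either term, while the retained factor is charged once through its block norm and once for each pair of levels that remain distinct, which is the sense in which factor selection and level fusion are handled by one penalty.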