A0660
Title: On variables selection Type-I and type-II error tradeoff for high dimensional logistic regression
Authors: Jing Zhou - KU Leuven (Belgium) [presenting]
Gerda Claeskens - KU Leuven (Belgium)
Abstract: In recent years, controlling the false discovery rate (FDR), referred to here as the type-I error, has attracted growing attention as a way to improve the reproducibility of variable selection. We focus on the variable selection problem for $l_1$-regularized logistic regression with $p$ variables and $n$ samples. We assume that $n$ and $p$ grow at a linear rate relative to each other, covering both the $n>p$ and $n\leq p$ cases. Since the $l_1$-regularizer performs variable selection by nature, we show that the type-I and type-II errors of the resulting selection satisfy a tradeoff. This tradeoff is characterized asymptotically by describing the type-I error rate (FDR) as a function of the power (1 - type-II error rate) through a system of equations with six parameters. Further, we propose two applications of this tradeoff curve: (1) a sample size calculation procedure that achieves a target power at a prespecified FDR level; (2) FDR level calibration for variable selection that takes power into account. A similar asymptotic analysis of the model-X knockoff, which provides FDR-controlled selection, is also investigated. We illustrate the type-I and type-II error tradeoff analysis using simulated and real data.
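For a single selected set, the two quantities in the tradeoff can be made concrete as empirical proportions. The following is a minimal sketch, assuming the true support is known (as it would be in a simulation); the function name `fdp_and_tpp` and its arguments are hypothetical and are not taken from the authors' work. Averaging these proportions over repeated simulations gives empirical estimates of FDR and power.

```python
def fdp_and_tpp(selected, true_support):
    """Return (false discovery proportion, true positive proportion).

    selected      -- indices of variables chosen by some selection procedure
    true_support  -- indices of variables with nonzero true coefficients
                     (assumed known, e.g. in a simulation study)
    """
    selected = set(selected)
    true_support = set(true_support)

    true_discoveries = len(selected & true_support)
    false_discoveries = len(selected - true_support)

    # Guard against division by zero when nothing is selected
    # or when there are no true signals.
    fdp = false_discoveries / max(len(selected), 1)
    tpp = true_discoveries / max(len(true_support), 1)
    return fdp, tpp


if __name__ == "__main__":
    # Illustrative indices only, not data from the abstract.
    fdp, tpp = fdp_and_tpp(selected=[0, 1, 5], true_support=[0, 1, 2, 3])
    print(fdp, tpp)
```

Selecting more variables typically raises the true positive proportion while also admitting more false discoveries, which is the kind of tradeoff the abstract describes asymptotically.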