View Submission - COMPSTAT2024
A0300
Title: On Lasso Poisson regression for categorical variables
Authors: Mariko Yamamura - Hiroshima University (Japan) [presenting]
Mineaki Ohishi - Tohoku University (Japan)
Hirokazu Yanagihara - Hiroshima University (Japan)
Abstract: One of the main advantages of the Lasso is that it yields estimates in which some coefficients are exactly zero. Explanatory variables whose estimated coefficients are zero are thereby excluded from the model. The case considered here is that of a categorical explanatory variable. When Lasso estimation is applied to a categorical variable, the coefficients of some of its categories are estimated as zero. This does not mean that those categories are excluded from the model, but rather that they serve as the baseline for that categorical variable. Since the coefficient estimates for the categories of a categorical variable, and the interpretation of those estimates, depend on the baseline, it is crucial to identify the baseline category. The aim is therefore to identify which categories are estimated to be zero. The model is a Lasso Poisson regression in which all the explanatory variables are categorical. Theoretical clarifications and numerical experiments show that the baseline corresponds to a category whose coefficient equals the weighted median of the coefficients of the categorical variable.
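The weighted-median statement can be illustrated with a small sketch. It is not the authors' derivation, and it rests on assumptions the abstract does not state: that the model has an intercept plus one coefficient per category, so that shifting all category coefficients by a constant c (absorbed into the intercept) leaves the Poisson likelihood unchanged, and that the penalty is a weighted sum of absolute coefficients. Under those assumptions only the penalty depends on c, and the c minimizing sum_j w_j |b_j - c| is a weighted median of the b_j. The function names, coefficients, and weights below are hypothetical.

```python
def weighted_median(values, weights):
    """Return an element m of `values` minimizing sum_j w_j * |v_j - m|.

    This is the lower weighted median: the first sorted value at which the
    cumulative weight reaches half of the total weight.
    """
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must be non-empty")


def weighted_l1(coefs, weights, shift):
    """Weighted L1 penalty of the category coefficients after shifting by `shift`."""
    return sum(w * abs(b - shift) for b, w in zip(coefs, weights))


def recenter(coefs, weights):
    """Shift the coefficients so that the weighted-median category becomes zero."""
    m = weighted_median(coefs, weights)
    return m, [b - m for b in coefs]


# Hypothetical coefficients for a four-category variable, with hypothetical
# penalty weights (for example, these might reflect category frequencies).
coefs = [0.8, -0.3, 0.1, 1.5]
weights = [1.0, 2.0, 4.0, 1.0]

shift, centered = recenter(coefs, weights)
baseline_index = centered.index(0.0)  # the category that ends up at zero
```

Here the third category (coefficient 0.1) carries enough weight to be the weighted median, so after recentering its coefficient is exactly zero and it plays the role of the baseline. Any other choice of shift among the category coefficients gives a larger weighted L1 penalty, which can be checked with `weighted_l1`.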