Title: Regression with high-dimensional categorical data using nonconvex penalties
Authors: Benjamin G Stokell - University of Cambridge (United Kingdom) [presenting]
Ryan J Tibshirani - Carnegie Mellon University (United States)
Rajen D Shah - University of Cambridge (United Kingdom)
Abstract: Categorical data arise in many application areas, often with large numbers of levels. We propose a method for estimation in linear models with such covariates. Our method is called 'SCOPE', standing for Sparse Concave Ordering and Penalisation Estimator. Within each categorical variable, the coefficients are ordered and the differences between adjacent coefficients are penalised by a concave function. The estimator can be computed quickly and exactly using a dynamic programming algorithm that exploits the separable structure of the optimisation objective. We study its theoretical properties and give conditions under which the oracle property holds. The approach can also be used to fit logistic regression models.
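
To make the penalty described in the abstract concrete, the following is a minimal sketch of how it could be evaluated for the coefficients of a single categorical variable. The abstract only says that the penalty function is concave; the minimax concave penalty (MCP) used here, the function names, and the parameters `lam` and `gamma` are illustrative assumptions, and the sketch does not implement the dynamic programming algorithm.

```python
import numpy as np


def mcp(x, lam, gamma):
    """Minimax concave penalty, used here as an ASSUMED example of a
    concave penalty (the abstract does not name a specific function)."""
    x = np.abs(x)
    # Quadratic-minus-linear part up to gamma * lam, constant beyond it.
    return np.where(
        x <= gamma * lam,
        lam * x - x**2 / (2.0 * gamma),
        0.5 * gamma * lam**2,
    )


def ordered_difference_penalty(theta, lam=1.0, gamma=3.0):
    """Hypothetical helper: sort the coefficients of one categorical
    variable and sum the concave penalty over adjacent differences."""
    sorted_theta = np.sort(np.asarray(theta, dtype=float))
    gaps = np.diff(sorted_theta)  # differences between adjacent ordered values
    return float(np.sum(mcp(gaps, lam, gamma)))


# Levels with identical coefficients incur no penalty, so the penalty
# favours merging levels into a small number of groups.
fused = ordered_difference_penalty([0.5, 0.5, 0.5])
# A large gap is penalised by a bounded amount because the penalty flattens.
separated = ordered_difference_penalty([0.0, 10.0])
```

Because the penalty is zero when adjacent ordered coefficients coincide, and bounded when they are far apart, a penalty of this shape encourages levels to fuse while limiting shrinkage of well-separated groups.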