CMStatistics 2016
B1273
Title: Sparse modeling of risk factors for insurance analytics
Authors: Sander Devriendt - KU Leuven (Belgium) [presenting]
Katrien Antonio - University of Amsterdam and KU Leuven (Belgium)
Edward Frees - University of Wisconsin-Madison (United States)
Roel Verbelen - KU Leuven (Belgium)
Abstract: Insurance companies use predictive models for a variety of analytic tasks, including pricing, marketing campaigns, claims handling, fraud detection and reserving. Typically, these predictive models use a selection of categorical, continuous, spatial and multi-level (e.g., car brand and model) covariates to differentiate risks. Insurance companies compete by setting their prices through risk classification or segmentation. The pricing model should not only be competitive, but also interpretable by stakeholders (including the policyholder and the regulator) and easy to implement and maintain in a production environment. This is why the current actuarial pricing literature focuses on generalized linear models (GLMs) in which risk factors are binned (i.e., categorized) up front, using ad hoc techniques or professional expertise. Building on the statistical literature on sparse modeling with penalization techniques, we present a strategy that brings variable selection and the binning or grouping of risk factors into the model estimation process itself. This allows us to select, estimate and group, simultaneously and in a statistically sound way, any combination of categorical, continuous, spatial and multi-level risk factors. We explain the general framework and show how this method incorporates different adjustable penalties to handle categorical, ordinal, nominal and spatial information. We illustrate the approach with a case study on a motor third-party liability dataset.
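The abstract does not spell out the penalties, so the following is only a minimal sketch under stated assumptions: a Poisson claim-frequency GLM with a log(exposure) offset, one ordinal risk factor, and a fused lasso penalty lam * sum_k |beta_k - beta_{k-1}|. That penalty is a common choice for ordinal factors, but it is not necessarily the authors' exact formulation. Penalizing differences between adjacent levels drives some of those differences to exactly zero, which merges the levels into one bin during estimation rather than up front. The function names (`fit_ordinal_fused_lasso`, etc.) and the toy data are hypothetical. Nominal or spatial factors would need a different penalty structure (for example on all pairwise differences, or on differences between neighbouring regions), which this sketch does not cover.

```python
import numpy as np


def soft_threshold(x, thresh):
    """Proximal operator of thresh * |x| (elementwise lasso shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


def poisson_nll(beta, y, exposure):
    """Poisson negative log-likelihood (log link, log(exposure) offset), up to a constant."""
    return np.sum(exposure * np.exp(beta) - y * beta)


def fit_ordinal_fused_lasso(y, exposure, lam, tol=1e-12, max_iter=100_000):
    """Hypothetical sketch: one coefficient per level of an ordinal risk factor,
    penalized by lam * sum_k |beta_k - beta_{k-1}| so adjacent levels can fuse.

    Reparametrize beta = cumsum(delta): delta[0] is the first level's log
    frequency and delta[k] = beta[k] - beta[k-1]. The fused penalty then becomes
    a plain lasso on delta[1:], whose proximal step is soft-thresholding, so
    proximal gradient descent (ISTA with backtracking) applies directly.
    """
    y = np.asarray(y, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    delta = np.zeros(len(y))
    delta[0] = np.log(y.sum() / exposure.sum())  # start at the portfolio-wide frequency
    t = 1.0 / exposure.sum()  # conservative initial step; halved below if too large
    for _ in range(max_iter):
        beta = np.cumsum(delta)
        grad_beta = exposure * np.exp(beta) - y  # fitted minus observed claim counts
        grad = np.cumsum(grad_beta[::-1])[::-1]  # chain rule through the cumsum
        f_old = poisson_nll(beta, y, exposure)
        while True:
            cand = delta - t * grad
            cand[1:] = soft_threshold(cand[1:], t * lam)  # first level is unpenalized
            diff = cand - delta
            if np.max(np.abs(diff)) < tol:  # step is negligible: treat as converged
                return np.cumsum(cand)
            # Sufficient-decrease test for a proximal gradient step of size t
            if poisson_nll(np.cumsum(cand), y, exposure) <= f_old + grad @ diff + diff @ diff / (2 * t):
                break
            t *= 0.5
        delta = cand
    return np.cumsum(delta)


if __name__ == "__main__":
    # Hypothetical hand-made data: 8 ordered levels (e.g. age bands), equal exposure
    y = np.array([48, 52, 50, 50, 148, 152, 150, 150])
    exposure = np.full(8, 1000.0)
    beta = fit_ordinal_fused_lasso(y, exposure, lam=20.0)
    print(np.round(np.exp(beta), 3))  # fitted claim frequency per level
```

On this toy data, lam=20 collapses the eight levels into two bins with fitted frequencies of roughly 0.055 (first four levels) and 0.145 (last four). lam=0 reproduces the unpenalized per-level estimates y / exposure, and a very large lam fuses every level to the overall frequency 0.1. The cumulative-sum trick relies on the levels forming a single ordered chain; it is a convenience of this sketch, not a general recipe for the other penalty types mentioned in the abstract.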