Title: Important variable assessment in modeling transport accident patterns using random forest and bagging
Authors: Camino Gonzalez - Universidad Politécnica de Madrid (Spain) [presenting]
Blanca Arenas - Technical University of Madrid (Spain)
Belen Jimenez - Technical University of Madrid (Spain)
Abstract: The phenomenology of traffic accidents is highly complex. Many factors and variables are involved, and the data recorded from police reports are usually heterogeneous and contain missing values. Moreover, relationships between variables may be strongly nonlinear and involve high-order interactions, and quantitative variables are far from normally distributed. Under these conditions prediction is very difficult, and the commonly used statistical modeling techniques are sometimes not efficient enough to reveal the underlying accident pattern in a meaningful way. Random Forest and Bagging can be used both to explore and to model complex databases, and they play an important role in discovering hidden relationships between variables while also providing high prediction accuracy. The database used includes transport accidents on Spanish interurban roads from 2010 to 2012 (90,000 records, 60 variables each). Predictor variables comprise functional road type, accident scenario characteristics, calendar variables and traffic information. The fitted models are used to define patterns associated with run-off-road and frontal collisions, and also to build parsimonious models with high prediction accuracy. Additionally, several computational experiments have been conducted to perform a sensitivity analysis on the tuning parameters of the algorithms (implemented in R and GUIDE) and to discuss the appropriateness and effectiveness of the importance metrics.
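To make the relationship between Bagging, Random Forest and permutation-based variable importance concrete, here is a minimal, self-contained sketch in plain Python. It is not the authors' R or GUIDE code. It assumes binary (0/1) predictors, no missing values, depth-limited Gini trees, and an unscaled out-of-bag "mean decrease in accuracy" importance in the spirit of Breiman's Random Forest. All function names (`fit_forest`, `permutation_importance`, `make_synthetic`, and so on) and the synthetic data are hypothetical illustrations, not the Spanish accident database. The only difference between the two ensembles is the tuning parameter `mtry`, the number of predictors sampled at each split. Setting `mtry` equal to the number of predictors gives bagging, and a smaller value gives a random forest.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the binary (0/1) feature among `features` with the lowest
    weighted child impurity, or None if no split improves on the parent."""
    best, best_score = None, gini(y)
    for f in features:
        left = [yi for xi, yi in zip(X, y) if xi[f] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best, best_score = f, score
    return best

def grow_tree(X, y, mtry, depth, rng):
    """Grow a depth-limited tree; each node considers `mtry` random features.
    Leaves are class labels, internal nodes are (feature, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    f = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if f is None:
        return majority
    left = [i for i, xi in enumerate(X) if xi[f] == 0]
    right = [i for i, xi in enumerate(X) if xi[f] == 1]
    return (
        f,
        grow_tree([X[i] for i in left], [y[i] for i in left], mtry, depth - 1, rng),
        grow_tree([X[i] for i in right], [y[i] for i in right], mtry, depth - 1, rng),
    )

def predict(tree, x):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = left if x[f] == 0 else right
    return tree

def fit_forest(X, y, n_trees, mtry, depth, seed=0):
    """Bootstrap ensemble: mtry == p gives bagging, mtry < p a random forest.
    Each tree is stored with its out-of-bag (OOB) row indices."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(idx))
        tree = grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, depth, rng)
        forest.append((tree, oob))
    return forest

def permutation_importance(forest, X, y, seed=0):
    """Unscaled mean decrease in OOB accuracy when one predictor is permuted."""
    rng = random.Random(seed)
    p = len(X[0])
    drops, used = [0.0] * p, 0
    for tree, oob in forest:
        if not oob:
            continue
        used += 1
        base = sum(predict(tree, X[i]) == y[i] for i in oob) / len(oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                x = list(X[i])
                x[j] = v  # break the link between predictor j and the outcome
                hits += predict(tree, x) == y[i]
            drops[j] += base - hits / len(oob)
    return [d / used for d in drops]

def make_synthetic(n, seed=1):
    """Hypothetical data: 4 binary predictors; only column 0 drives the
    outcome (10% label noise), columns 1-3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(4)]
        X.append(x)
        y.append(x[0] if rng.random() > 0.1 else 1 - x[0])
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic(300)
    p = len(X[0])
    for name, mtry in [("bagging", p), ("random forest", 2)]:
        forest = fit_forest(X, y, n_trees=25, mtry=mtry, depth=3)
        imp = permutation_importance(forest, X, y)
        print(name, [round(v, 3) for v in imp])
```

In both settings the informative predictor should receive a clearly larger importance than the noise columns. Rerunning the loop over a grid of `mtry`, `n_trees` and `depth` values is one simple way to carry out the kind of sensitivity analysis on tuning parameters described above. Real police-report data would also require handling multi-level categorical predictors and missing values, which this sketch deliberately omits.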