CFE-CMStatistics 2024
A0779
Title: Injury prediction in soccer with conventional statistical approaches and machine learning models
Authors: Alexander Gerharz - TU Dortmund University (Germany)
Andreas Groll - Technical University Dortmund (Germany)
Mathias Kolodziej - Borussia Dortmund (Germany)
Ina-Marie Berendes - TU Dortmund University (Germany) [presenting]
Abstract: Injuries in professional soccer happen frequently and are problematic for players and their clubs. Injury prevention can be approached from a statistical perspective by identifying connections between player measurements and injuries. One possibility is to model the binary injury status of players. The approaches for injury modeling in young professional soccer players compared here are Lasso-regularized logistic regression, naive Bayes, linear discriminant analysis, $k$-nearest neighbors, classification trees, random forests, XGBoost, and support vector machines. They are employed to predict the injury probability and injury status of players. A parallel cross-validated procedure and several quality measures are used to compare the different methods. A post-Lasso logistic regression model with a decreased penalty emerges as the overall best model, with a sensitivity of 0.773, a specificity of 0.529, an AUC of 0.672, an accuracy of 0.625, a predictive likelihood of 0.593, and a Brier score of 0.228. It contains three features relevant for injury prediction: the players' postural control sway under static conditions, the concentric knee extension torque, and the transversal plane moment of the hip during a single-leg drop landing task. An XGBoost model reaches a slightly higher accuracy of 0.661 but does not match the Lasso model's performance on the other measures.
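As an illustration of the quality measures named in the abstract, the following minimal Python sketch computes them from binary injury labels and predicted injury probabilities. It is not the authors' code. The function name `quality_measures`, the classification threshold of 0.5, and the toy data are assumptions made for this example. The abstract does not define the predictive likelihood; the sketch assumes it is the mean probability assigned to the observed outcome.

```python
from statistics import mean


def quality_measures(y_true, p_hat, threshold=0.5):
    """Compute six quality measures for a binary classifier.

    y_true: list of 0/1 labels (1 = injured), must contain both classes.
    p_hat: list of predicted injury probabilities.
    threshold: assumed cut-off for turning probabilities into labels.
    """
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)

    # AUC as the share of (injured, non-injured) pairs in which the
    # injured player gets the higher probability; ties count as one half.
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "auc": wins / (len(pos) * len(neg)),
        # Assumed definition: mean probability given to the true outcome.
        "predictive_likelihood": mean(
            p if y == 1 else 1 - p for y, p in zip(y_true, p_hat)
        ),
        # Brier score: mean squared error of the predicted probabilities.
        "brier_score": mean((p - y) ** 2 for y, p in zip(y_true, p_hat)),
    }


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0]
p_hat = [0.8, 0.6, 0.4, 0.7, 0.3, 0.2]
measures = quality_measures(y_true, p_hat)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In a cross-validated comparison, a function like this would be applied to the held-out predictions of each method so that all models are judged on the same measures. Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC, predictive likelihood, and Brier score use the probabilities directly.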