View Submission - EcoSta2024
A0924
Title: A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data
Authors: Madhav Sankaranarayanan - Harvard University (United States)
Intekhab Hossain - Harvard University (United States)
Tom Chen - Harvard Pilgrim Health Care and Harvard Medical School (United States) [presenting]
Abstract: Recent advances in mixed-integer optimization (MIO) algorithms and hardware have led to significant speedups in solving MIO problems. These methods have been applied to optimal subset selection, specifically to choosing $k$ features out of $p$ in linear regression given $n$ observations. The approach is extended to cluster-aware regression, where the goal is to select $\lambda$ out of $K$ clusters in a linear mixed-effects model (LMM) with $n_k$ observations for each cluster. In tests on a range of synthetic and real datasets, the method solves these problems within minutes. Numerical experiments also show that the MIO approach outperforms both Gaussian- and Laplace-distributed LMMs in generating sparse solutions with high predictive power. Traditional LMMs typically assume that clustering effects are independent of individual features. However, an algorithm is introduced that evaluates cluster effects for new data points, thereby increasing the robustness and precision of the model. The inferential and predictive performance of the approach is further illustrated through applications to student scoring and protein expression.
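The feature-selection problem mentioned in the abstract (choosing $k$ out of $p$ features) is commonly written as a cardinality-constrained least-squares problem. As a sketch only, since the abstract does not state its formulation, a standard big-M form is given below; the response vector $y$, design matrix $X$, coefficient vector $\beta$, binary indicators $z_j$, and bound $M$ are notation assumed here for illustration.

$$
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
\quad \text{s.t.} \quad -M z_j \le \beta_j \le M z_j \ \ (j = 1,\dots,p), \qquad \sum_{j=1}^{p} z_j \le k
$$

Setting $z_j = 0$ forces $\beta_j = 0$, so at most $k$ coefficients can be nonzero. How the selection of $\lambda$ out of $K$ clusters is encoded in the mixed-effects setting is not specified in the abstract.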