View Submission - EcoSta 2025
A0935
Title: FSA: An efficient model selection tool for sparse clinical notes data
Authors: Anna Smith - University of Kentucky (United States) [presenting]
Chris Delcher - University of Kentucky (United States)
Nicholas Anthony - University of Kentucky (United States)
Abstract: High-dimensional, sparse predictors, especially natural language processing (NLP) measures derived from unstructured text, pose challenges for traditional model-building approaches. Common dimension reduction techniques often obscure domain-specific interpretations and overlook higher-order interactions. The feasible solutions algorithm (FSA), a stochastic search method that identifies a set of statistically optimal, interpretable models rather than a single "best" one, is reviewed. FSA preserves standard effect interpretations and can include higher-order interactions in its search. In the sparse predictor setting, however, the algorithm's stochastic exploration can waste time evaluating invalid interactions. To address this, interactions that fall below a predefined detection threshold are identified in advance and recorded. Additionally, diagnostic statistics are recorded for all explored models, enabling a more efficient stepwise implementation of FSA in which the rankings of prior models inform exploration probabilities in subsequent iterations, similar to empirical Bayes methods. The FSA extension is demonstrated in a public health application that uses TF-IDF scores from unstructured clinical notes in an electronic health records (EHR) database to predict fatal opioid overdoses. The models discovered by FSA reveal important interaction terms that co-occur within a patient's clinical notes and would be difficult to detect otherwise.
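The search described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a linear model scored by R^2 (the opioid application would more plausibly use a binary-outcome model), restricts the search to two-way interactions, and treats "detection threshold" as a minimum number of rows in which both predictors are nonzero. The names `r_squared`, `valid_pairs`, `fsa_pairs`, and `min_cooccur` are hypothetical. The cache of explored models is recorded but is not used here to reweight exploration probabilities.

```python
import numpy as np


def r_squared(X, y, i, j):
    """R^2 of the least-squares fit y ~ 1 + x_i + x_j + x_i * x_j."""
    design = np.column_stack(
        [np.ones(len(y)), X[:, i], X[:, j], X[:, i] * X[:, j]]
    )
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def valid_pairs(X, min_cooccur):
    """Pre-screen: keep pairs (i, j) that are both nonzero in enough rows.

    Assumption: this co-occurrence count stands in for the abstract's
    'detection threshold'.
    """
    nonzero = (X != 0).astype(int)
    cooccur = nonzero.T @ nonzero  # cooccur[i, j] = rows where both are nonzero
    p = X.shape[1]
    return {
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if cooccur[i, j] >= min_cooccur
    }


def fsa_pairs(X, y, n_starts=20, min_cooccur=5, seed=0):
    """Random-restart swap search over pre-screened two-way interactions."""
    rng = np.random.default_rng(seed)
    allowed_set = valid_pairs(X, min_cooccur)
    if not allowed_set:
        raise ValueError("no interaction passes the co-occurrence threshold")
    allowed = sorted(allowed_set)
    p = X.shape[1]
    cache = {}  # criterion value for every model explored so far

    def score(pair):
        if pair not in cache:
            cache[pair] = r_squared(X, y, *pair)
        return cache[pair]

    feasible = set()
    for _ in range(n_starts):
        current = allowed[rng.integers(len(allowed))]  # random valid start
        while True:
            i, j = current
            others = [k for k in range(p) if k not in current]
            # Neighbours: swap exactly one of the two variables.
            swaps = [tuple(sorted((k, j))) for k in others]
            swaps += [tuple(sorted((i, k))) for k in others]
            swaps = [s for s in swaps if s in allowed_set]  # skip invalid pairs
            best = max(swaps, key=score, default=current)
            if score(best) <= score(current):
                break  # no single swap improves the criterion
            current = best
        feasible.add(current)
    return sorted(feasible, key=score, reverse=True), cache
```

Each random start ends at a pair that no single-variable swap can improve, and the distinct end points form the returned set of feasible solutions. Because pairs that fail the pre-screen are never fitted, the cache contains only valid interactions.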