CMStatistics 2023
B1755
Title: Using large language models for variable selection in observational healthcare studies
Authors: Raluca-Ioana Cobzaru - Massachusetts Institute of Technology (United States) [presenting]
Roy Welsch - Massachusetts Institute of Technology (United States)
Stan Finkelstein - Massachusetts Institute of Technology (United States)
Zach Shahn - IBM Research (United States)
Abstract: Recent breakthroughs in large language models (LLMs) such as OpenAI's GPT-3.5 have drawn significant attention to the question of using LLMs for causal inference. Through their ability to summarize large corpora of medical data and to provide interpretable responses, LLMs show great promise as a supplemental tool for covariate selection in observational studies, alongside statistical correlation tests and expert knowledge. Still, while LLMs such as GPT-3.5 and GPT-4 have been shown to achieve competitive performance in establishing pairwise causal relationships between variables, this performance is not robust to the language and structure of the prompting scheme. Moreover, the full potential of these LLMs for identifying negative controls and instrumental variables has not been explored in the causal literature. A systematic procedure for prompting LLMs (in particular, GPT-3.5/4) is developed to identify negative control and instrumental variable candidates for a given causal effect estimation task. To this end, multiple prompt engineering techniques are combined with retrieval-augmented generation (RAG) over specialized causal and biomedical literature to increase the models' accuracy in targeting these variables.
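To make the described procedure concrete, the following is a minimal sketch, not the authors' actual pipeline, of how such a structured prompting scheme might be assembled: a role framing, a formal definition of the target causal role, retrieved literature passages (the RAG component), and a step-by-step elicitation request. The prompt wording, the example treatment/outcome pair, and the query_llm placeholder are illustrative assumptions, not taken from the abstract.

```python
# Illustrative sketch of a structured prompting scheme for eliciting
# negative-control and instrumental-variable candidates from an LLM.
# All prompt text and names below are assumptions for illustration.

from typing import Callable, List

# Formal definitions stated in the prompt so the LLM targets the right
# causal role rather than generic "related variables".
NEGATIVE_CONTROL_DEF = (
    "A negative-control outcome is a variable believed NOT to be causally "
    "affected by the treatment, yet subject to the same confounding "
    "mechanisms as the true outcome."
)
INSTRUMENT_DEF = (
    "An instrumental variable affects the outcome only through the "
    "treatment and shares no unmeasured common causes with the outcome."
)

PROMPT_TEMPLATE = """You are an epidemiologist advising on observational study design.

Definition:
{definition}

Relevant literature excerpts (retrieved context):
{context}

Task: For the causal question "Does {treatment} affect {outcome}?", list
candidate variables satisfying the definition above. Reason step by step
about why each required condition plausibly holds, then finish with one
line of the form: CANDIDATES: <comma-separated variable names>.
"""


def build_prompt(treatment: str, outcome: str, definition: str,
                 retrieved: List[str]) -> str:
    """Combine role framing, the formal definition, RAG-retrieved passages,
    and a chain-of-thought elicitation into a single prompt string."""
    context = "\n".join(f"- {p}" for p in retrieved) if retrieved else "- (none)"
    return PROMPT_TEMPLATE.format(definition=definition, context=context,
                                  treatment=treatment, outcome=outcome)


def elicit_candidates(query_llm: Callable[[str], str], treatment: str,
                      outcome: str, definition: str,
                      retrieved: List[str]) -> str:
    """query_llm is any text-in/text-out LLM call, e.g. a GPT-3.5/4 wrapper."""
    return query_llm(build_prompt(treatment, outcome, definition, retrieved))


if __name__ == "__main__":
    # Stub LLM so the sketch runs without an API key; swap in a real client.
    stub = lambda prompt: "CANDIDATES: (stubbed response)"
    print(elicit_candidates(stub, "statin use", "cardiovascular mortality",
                            NEGATIVE_CONTROL_DEF,
                            ["Excerpt on negative-control outcomes ..."]))
```

The fixed CANDIDATES: output line is one plausible way to make the response machine-parseable, which matters when the sensitivity of LLM answers to prompt language and structure calls for comparing outputs across many prompt variants.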