A0875
Title: Can large language models boost the power of randomized experiments?
Authors: Waverly Wei - University of Southern California (United States) [presenting]
Abstract: Randomized experiments are the gold standard for estimating treatment effects in the social sciences. However, modern experiments increasingly generate large-scale, high-dimensional, and unstructured data, such as experiment notes, participant narratives, and audio or video transcripts, creating unique methodological challenges for efficient causal effect estimation and inference. Existing causal inference methods, developed primarily for structured, low-dimensional covariates, may not fully exploit the nuanced information contained in these rich, complex data sources. To address this methodological challenge, the integration of large language models (LLMs) into the causal inference framework is explored. A novel framework is proposed that combines LLM-generated counterfactual predictions with semiparametric estimators for treatment effect estimation, achieving efficiency gains under large-scale multimodal data. Through theoretical analyses and case studies, the approach is shown to enhance the statistical power of treatment effect estimation.
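The general idea of plugging external counterfactual predictions into a semiparametric estimator can be illustrated with a minimal sketch. The code below is not the authors' method; it is a standard AIPW-style augmented estimator in a simulated randomized experiment with known assignment probability, where a noisy oracle (`mu1_hat`, `mu0_hat`) stands in for hypothetical LLM-generated predictions of each unit's potential outcomes. Because the propensity is known by design, the estimator remains unbiased for any predictions, and more accurate predictions reduce its variance relative to the unadjusted difference in means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated randomized experiment with known assignment probability p.
n = 5000
p = 0.5
T = rng.binomial(1, p, n)                # randomized treatment indicator
X = rng.normal(size=n)                   # stand-in for rich covariates
Y0 = X + rng.normal(size=n)              # potential outcome under control
Y1 = Y0 + 2.0                            # true average treatment effect = 2
Y = np.where(T == 1, Y1, Y0)             # observed outcome

# Hypothetical "LLM counterfactual predictions": any external predictor
# of Y(1) and Y(0); here a noisy oracle plays that role.
mu1_hat = Y1 + rng.normal(scale=0.5, size=n)
mu0_hat = Y0 + rng.normal(scale=0.5, size=n)

# Unadjusted difference-in-means estimator.
dim = Y[T == 1].mean() - Y[T == 0].mean()

# AIPW / augmented estimator: prediction difference plus inverse-propensity
# weighted residual corrections. Unbiased because p is known by design.
aipw = np.mean(
    mu1_hat - mu0_hat
    + T * (Y - mu1_hat) / p
    - (1 - T) * (Y - mu0_hat) / (1 - p)
)
print(f"difference in means: {dim:.3f}, augmented estimator: {aipw:.3f}")
```

In repeated samples, the augmented estimator has smaller variance than the difference in means whenever the predictions explain outcome variation, which is the sense in which better counterfactual predictions translate into statistical power gains.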