CFE-CMStatistics 2024
A1583
Title: Experimental design for modern settings: Stories about text
Authors: Alexander Volfovsky - Duke University (United States) [presenting]
Abstract: Given two texts, one can ask which of them is more persuasive. Such a comparison, however, informs only about those two texts; it says nothing about which elements of the text drive the causal mechanism. Since the mechanism is of interest, a tempting design is to show many texts, measure their effects, and use natural language processing to learn which features of the texts should be treated as components of a causal analysis. However, such a black-box approach (e.g. a large language model) provides insufficient control over the causal model and may lead to spurious or nonsensical results. The talk first addresses the question of what the treatment is when the aim is to experiment with text. Specifically, it outlines the necessary (usually unstated) assumptions under which text is a plausible treatment. A novel experimental design is then developed that allows the researcher to control which elements of the text are being studied. For example, text is generated to study the effect of intellectually humble language on the persuasiveness of the underlying text. Two major issues are also identified with machine learning methods for inferring the causal effects of text. Transformer models that use learned representations of text as confounders overfit the data, inducing positivity violations. Other estimators, which try to correct for text indirectly, underfit the data and behave like estimators that never looked at the text confounders.
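To make the last two claims concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not the author's analysis. A single numeric confounder `x` stands in for a learned text representation. The data-generating process, the Hajek inverse-propensity-weighted (IPW) estimator, the [0.01, 0.99] overlap cut-off, and the three propensity "models" (`underfit`, `correct`, `overfit`) are all hypothetical illustrative choices. In particular, `overfit` is a lookup table that mimics a learner that has memorized each unit's observed treatment.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, seed=0):
    """Toy observational data (hypothetical). x stands in for a learned text
    feature that confounds treatment T and outcome Y; the true effect of T is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < sigmoid(x) else 0      # true propensity e(x) = sigmoid(x)
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Difference in means that ignores the confounder x entirely."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def hajek_ipw(rows, propensity):
    """Normalized (Hajek) inverse-propensity-weighted difference in means."""
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in rows:
        p = propensity(x)
        if t == 1:
            num1 += y / p
            den1 += 1.0 / p
        else:
            num0 += y / (1.0 - p)
            den0 += 1.0 / (1.0 - p)
    return num1 / den1 - num0 / den0


def share_outside(rows, propensity, lo=0.01, hi=0.99):
    """Fraction of units whose estimated propensity falls outside [lo, hi]:
    a crude diagnostic for estimated positivity (overlap) violations."""
    return sum(1 for x, _, _ in rows if not lo <= propensity(x) <= hi) / len(rows)


rows = simulate()
p_treated = sum(t for _, t, _ in rows) / len(rows)
memorized = {x: t for x, t, _ in rows}   # lookup table that "remembers" each unit's T


def underfit(x):
    return p_treated                      # ignores x: the same score for every unit


def correct(x):
    return sigmoid(x)                     # true propensity, known only in simulation


def overfit(x):
    return 0.999 if memorized[x] == 1 else 0.001   # reproduces the observed T


print("naive difference in means:", round(naive_difference(rows), 2))
print("IPW, underfit propensity:", round(hajek_ipw(rows, underfit), 2))
print("IPW, correct propensity:", round(hajek_ipw(rows, correct), 2))
print("share outside [0.01, 0.99], correct:", share_outside(rows, correct))
print("share outside [0.01, 0.99], overfit:", share_outside(rows, overfit))
```

In this toy setup, weighting by the constant `underfit` score reproduces the naive difference in means (up to floating-point rounding), which is biased upward because `x` raises both treatment uptake and the outcome. The `overfit` lookup table places every unit outside [0.01, 0.99], so no estimated overlap remains. The `correct` propensity should give an estimate close to the true effect of 2.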