CMStatistics 2023
B0408
Title: ChatGPT is people: Comparing synthetic and human-made text across social dimensions
Authors: AJ Alvero - University of Florida (United States) [presenting]
Abstract: The current surge in large language models (LLMs) has been driven by advancements in machine learning and the accessibility of digitized text data. These developments are explored through an investigation of the stylistic characteristics of synthetic text compared to human-written text. Specifically, past results are considered from a 2021 study of college admissions essays, which found that essay content (modelled using correlated topic modelling) and style (modelled using the linguistic inquiry and word count, or LIWC, approach) were strongly correlated with an applicant's household income and test score. LLMs are used to generate essays from the same essay prompts used in the 2021 study, and the author-content relationships embedded in the AI-generated essays are compared to those detected in the human-made essays. Beyond practical implications for social domains like college admissions, the work contributes to the understanding of AI in two significant ways. First, it sheds light on potential demographic and social patterns underlying digitized text production and their downstream impact on LLMs. Second, if LLMs become widely adopted, the study could provide foresight into whose text will become more prevalent, in demographic terms, and inform future studies of AI-generated text.