A0704
Title: Latent class analysis in large observational datasets: What makes a good model?
Authors: Eva Ryan - University of Limerick (Ireland) [presenting]
Bethany Bray - University of Illinois at Chicago (United States)
John Dziak - Pennsylvania State University (United States)
Ailish Hannigan - University of Limerick (Ireland)
Helen Purtill - University of Limerick (Ireland)
Abstract: Latent class analysis (LCA) is a model-based clustering method for categorical data that aims to identify classes (clusters) of an underlying categorical variable. The number of classes is typically chosen using a data-driven approach. However, with large samples, traditional methods can suggest an impractically large number of classes. To aid selection of the number of classes, new LCA fit indices based on existing structural equation modelling indices are proposed. How the LCA indices behave for different sample sizes and LCA class structures is investigated using a simulation study. Plots of the calculated fit indices reveal elbows identifying the correct number of classes for all large sample size simulations and some smaller sample simulations. While the proposed fit indices show potential as heuristics for model selection, the approach cannot be entirely data-driven when LCA is applied to large observational datasets to investigate causation. A causal inference framework to highlight and mitigate sources of bias when applying LCA to a large observational dataset is also explored. A directed acyclic graph (DAG) is used to examine the implied causal structure in an LCA study of pain development in older Irish adults. Future work will continue to investigate causal relationships in older adult health.
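
The elbow reading described in the abstract can be illustrated with a minimal sketch. The abstract does not define the proposed indices or say how elbows are located beyond inspecting plots, so everything below is an assumption made for illustration: the function name `elbow_by_second_difference` is hypothetical, the index is assumed to be a misfit-type measure where lower is better, the candidate class counts are assumed to be equally spaced, and the elbow is taken as the point of largest discrete second difference. The input values are made up and are not results from the study.

```python
def elbow_by_second_difference(n_classes, index_values):
    """Return the class count at which a decreasing fit index bends most sharply.

    Assumptions (not taken from the abstract): lower index values mean better
    fit, class counts are equally spaced, and the elbow is the interior point
    with the largest discrete second difference.
    """
    if len(n_classes) != len(index_values):
        raise ValueError("n_classes and index_values must have the same length")
    if len(index_values) < 3:
        raise ValueError("at least three models are needed to locate an elbow")

    best_k = None
    best_curvature = float("-inf")
    # Only interior points have both a left and a right neighbour.
    for i in range(1, len(index_values) - 1):
        curvature = index_values[i - 1] - 2 * index_values[i] + index_values[i + 1]
        if curvature > best_curvature:
            best_k = n_classes[i]
            best_curvature = curvature
    return best_k


# Hypothetical index values for 1- to 6-class models (illustrative only).
candidate_classes = [1, 2, 3, 4, 5, 6]
hypothetical_index = [0.20, 0.12, 0.05, 0.045, 0.042, 0.040]
print(elbow_by_second_difference(candidate_classes, hypothetical_index))
```

A rule like this only flags a candidate for inspection alongside the plot. It does not replace the substantive judgement that the abstract argues is needed when the aim is causal.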