View Submission - EcoSta 2025
A0670
Title: Predictive modeling with latent variables: A two-stage approach
Authors: Po-Hsien Huang - National Chengchi University (Taiwan) [presenting]
Abstract: In structural equation modeling (SEM) with latent variables, researchers often seek to examine causal relationships among unobserved constructs. With the growing emphasis on predictive modeling, the meaning of prediction is explored in the presence of latent variables. Let $y$ be a $Q$-dimensional observed response vector and $x$ a $P$-dimensional observed feature vector, both considered noisy measurements of latent variables $\eta$ and $\xi$, respectively. Assuming the data-generating process for $\eta$ depends solely on $\xi$, the conditional expectation $\text{E}[y|x]$ provides a meaningful quantity for prediction. In the linear case, the expression for $\text{E}[y|x]$ implies that predicting $y$ via $\text{E}[\xi|x]$ (the regression factor score of $\xi$) is sufficient, thereby justifying the use of factor score regression even in the presence of interaction effects. This insight motivates a two-stage estimation procedure: the first stage fits measurement models (e.g., factor analysis, latent class analysis, or item response theory) to reduce the dimensionality of the feature vector, and the second stage estimates regression coefficients by solving unbiased estimating equations. The proposed framework generalizes traditional factor score regression to a broad class of measurement models for both features and outcomes.
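As a hedged illustration of the linear case, the following is a sketch under assumptions that the abstract does not state, and it is not the author's derivation. The loading matrices $\Lambda_x$ and $\Lambda_y$, the coefficient matrix $B$, and the error terms $\delta$, $\varepsilon$, $\zeta$ are hypothetical notation. All variables are taken to be centered, and the errors are assumed to have mean zero and to be independent of $\xi$ and of one another:

$$
x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon, \qquad \eta = B \xi + \zeta .
$$

Because $\varepsilon$ and $\zeta$ are independent of $(\xi, \delta)$, and hence of $x$, they drop out of the conditional expectation:

$$
\text{E}[y|x] = \Lambda_y \, \text{E}[\eta|x] = \Lambda_y B \, \text{E}[\xi|x] .
$$

If, in addition, $\xi$ and $\delta$ are assumed jointly normal with $\text{Cov}(\xi) = \Phi$ and $\text{Cov}(\delta) = \Theta_\delta$, the regression factor score has the familiar closed form

$$
\text{E}[\xi|x] = \Phi \Lambda_x^\top \left( \Lambda_x \Phi \Lambda_x^\top + \Theta_\delta \right)^{-1} x .
$$

Without normality, the right-hand side is only the best linear predictor of $\xi$ from $x$. Under the normality assumption, $\text{Cov}(\xi|x)$ does not depend on $x$, so a product term satisfies $\text{E}[\xi_1 \xi_2|x] = \text{E}[\xi_1|x]\,\text{E}[\xi_2|x] + c$ for a constant $c$. This is one possible reading of why factor scores could remain sufficient when interaction effects are present.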