EcoSta 2024
A0681
Title: Using synthetic data to regularize maximum likelihood estimation
Authors: Weihao Li - National University of Singapore (China) [presenting]
Dongming Huang - National University of Singapore (China)
Abstract: To overcome challenges in fitting complex models with small samples, catalytic priors have recently been proposed to stabilize inference by supplementing observed data with synthetic data generated from simpler models. Based on a catalytic prior, the maximum a posteriori (MAP) estimator is a regularized estimator that maximizes the weighted likelihood of the combined data. This estimator is straightforward to compute, and its numerical performance is superior or comparable to that of other likelihood-based estimators. Several theoretical aspects of the MAP estimator are studied in generalized linear models, with a particular focus on logistic regression. It is first proven that, under mild conditions, the MAP estimator exists and is stable against the randomness in the synthetic data. The consistency of the MAP estimator is then established when the dimension of the covariates grows more slowly than the sample size. Furthermore, the convex Gaussian min-max theorem is used to characterize the asymptotic behavior of the MAP estimator when the dimension grows linearly with the sample size. These theoretical results clarify the role of the tuning parameters in a catalytic prior and provide insights into practical applications. Numerical studies confirm that the asymptotic theory approximates finite-sample behavior well and illustrate how inferences can be adjusted based on the theory.
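Illustration: the weighted-likelihood idea in the abstract can be sketched for logistic regression as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the objective is the observed log-likelihood plus (tau / M) times the log-likelihood of M synthetic points, that synthetic covariates are obtained by resampling each observed column independently, and that synthetic responses come from an intercept-only model. The names `generate_synthetic`, `catalytic_map_logistic`, `tau`, and `M` are hypothetical and do not appear in the abstract.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def generate_synthetic(X, y, M, rng):
    """Assumed synthetic-data scheme: resample each covariate column
    independently, and draw responses from an intercept-only model."""
    n, p = X.shape
    X_syn = np.column_stack(
        [rng.choice(X[:, j], size=M, replace=True) for j in range(p)]
    )
    # Smoothed estimate of P(y = 1) under the intercept-only model.
    p_hat = (y.sum() + 0.5) / (n + 1.0)
    y_syn = rng.binomial(1, p_hat, size=M).astype(float)
    return X_syn, y_syn


def catalytic_map_logistic(X, y, X_syn, y_syn, tau, n_iter=100, tol=1e-8):
    """Maximize the weighted log-likelihood of the combined data:
    observed points get weight 1, each synthetic point gets tau / M."""
    M = X_syn.shape[0]
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([y, y_syn])
    w = np.concatenate([np.ones(len(y)), np.full(M, tau / M)])

    def loglik(beta):
        z = X_all @ beta
        return np.sum(w * (y_all * z - np.logaddexp(0.0, z)))

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X_all @ beta)
        grad = X_all.T @ (w * (y_all - mu))
        hess = (X_all * (w * mu * (1.0 - mu))[:, None]).T @ X_all
        step = np.linalg.solve(hess, grad)  # Newton direction
        if np.max(np.abs(step)) < tol:
            break
        # Step halving: shrink the step until the objective does not decrease.
        obj = loglik(beta)
        t = 1.0
        while loglik(beta + t * step) < obj - 1e-10 * abs(obj) and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
    return beta
```

In this sketch, a larger `tau` pulls the estimate more strongly toward the simpler model, while `tau = 0` would recover the ordinary maximum likelihood problem, which may have no finite solution when the observed data are linearly separable.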