A1008
Title: A semiparametric model approach to data integration under missing not at random
Authors: Danhyang Lee - Southern Methodist University (United States) [presenting]
Jae Kwang Kim - Iowa State University (United States)
Abstract: While probability sampling has long been the foundation of valid population inference, practical challenges such as rising costs and declining response rates have increased the use of non-probability samples. However, these alternative data sources are susceptible to significant selection bias. Data integration, which combines a non-probability sample with a reference probability sample, offers a potential solution. Most existing methods rely on the strong assumption of ignorable selection, known as missing at random. Although recent approaches have been developed for non-ignorable selection mechanisms, they typically depend on restrictive parametric models for the selection process. A unified data integration framework that accommodates both ignorable and non-ignorable selection mechanisms is proposed. An estimator of finite-population means is developed within an empirical-likelihood framework. To mitigate the risk of model misspecification, the approach models the non-probability sample selection mechanism using a flexible semiparametric propensity-score specification. The resulting estimators combine calibration weighting for enhanced efficiency with the semiparametric propensity scores to reduce selection bias. Furthermore, closed-form variance expressions are derived, eliminating the need for computationally intensive replication techniques.
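Illustration (not the authors' estimator): the abstract describes combining propensity-score weights with calibration weighting. The sketch below shows only the generic calibration step, using textbook linear (GREG-type) calibration on an intercept and one covariate. The function name, the propensity values, the covariate values, and the population totals are all hypothetical, and the propensities are simply assumed known rather than estimated semiparametrically or by empirical likelihood.

```python
def calibrate_weights(base_weights, x, pop_size, pop_total_x):
    """Linear calibration of base weights on the auxiliary vector (1, x_i).

    Returns weights w_i = d_i * (1 + lam0 + lam1 * x_i) chosen so that
    sum(w_i) = pop_size and sum(w_i * x_i) = pop_total_x.
    """
    # Base-weighted moments of the auxiliary vector (1, x_i).
    s00 = sum(base_weights)
    s01 = sum(d * xi for d, xi in zip(base_weights, x))
    s11 = sum(d * xi * xi for d, xi in zip(base_weights, x))
    # Gap between the known population totals and the base-weighted estimates.
    g0 = pop_size - s00
    g1 = pop_total_x - s01
    # Solve the 2x2 linear system [[s00, s01], [s01, s11]] @ lam = (g0, g1).
    det = s00 * s11 - s01 * s01
    lam0 = (s11 * g0 - s01 * g1) / det
    lam1 = (s00 * g1 - s01 * g0) / det
    return [d * (1.0 + lam0 + lam1 * xi) for d, xi in zip(base_weights, x)]


# Hypothetical non-probability sample: covariate x, outcome y, and
# ASSUMED selection propensities (in practice these must be estimated).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
propensity = [0.1, 0.2, 0.25, 0.4, 0.5]

# Inverse-propensity base weights.
base = [1.0 / p for p in propensity]

# Hypothetical known population size and covariate total
# (in data integration these could come from the reference probability sample).
N = 20.0
T_x = 50.0

w = calibrate_weights(base, x, N, T_x)

# Calibrated estimate of the finite-population mean of y.
mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

Because the toy outcome is exactly linear in x, the calibrated mean reproduces 2 + 3 * (T_x / N) regardless of the assumed propensities. This illustrates why calibration can improve efficiency, but it says nothing about the non-ignorable case that the abstract addresses.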