Title: Estimating distribution functions using double sampling designs
Authors: Ori Davidov - University of Haifa (Israel) [presenting]
Abstract: In some situations it is either infeasible or too expensive to collect data on the true outcome of interest from all sampling units. If an easy-to-collect proxy for the true outcome exists, then a double sampling design may be beneficial. In such designs there is typically a large primary sample in which the proxy is available for all sampling units. The true outcome is ascertained only on a subsample of units from the primary sample, commonly referred to as the validation sample. The full data, that is, the proxy values from the primary sample together with the true outcomes from the validation sample, are then used to estimate the parameter of interest. We propose two methods for empirically estimating a distribution function, and consequently its functionals, under double sampling. The first method is completely nonparametric; the second is semiparametric and copula-based. Theoretical properties of the proposed estimators are investigated and simulation experiments are presented. Extensions to multivariate distributions, conditional distributions, censored data, and high-dimensional data are discussed. The methodology is illustrated using a real data example.
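
To make the double sampling setup concrete, the following is a minimal illustrative sketch and not the estimators proposed in the talk. It assumes a simple post-stratification approach: units are grouped into rank-based bins of the proxy using the whole primary sample, the empirical distribution of the true outcome is computed within each bin from the validation units only, and the bin-level estimates are combined with weights taken from the primary sample. The function name `double_sampling_cdf` and the parameters `n_bins` and `t` are hypothetical and chosen only for this example.

```python
def double_sampling_cdf(proxy, outcome, validated, t, n_bins=5):
    """Illustrative post-stratified estimate of P(Y <= t) under double sampling.

    proxy     -- proxy values, observed for every unit in the primary sample
    outcome   -- true outcomes; only entries with validated[i] True are read
    validated -- booleans marking the validation subsample
    t         -- point at which the distribution function is evaluated
    n_bins    -- number of proxy strata (an assumption of this sketch)
    """
    n = len(proxy)
    if len(outcome) != n or len(validated) != n:
        raise ValueError("proxy, outcome and validated must have equal length")

    # Rank-based strata of the proxy, built from the full primary sample.
    # Ties are broken by original order because sorted() is stable.
    order = sorted(range(n), key=lambda i: proxy[i])
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // n

    estimate = 0.0
    total_weight = 0.0
    for k in range(n_bins):
        members = [i for i in range(n) if bin_of[i] == k]
        val = [i for i in members if validated[i]]
        if not val:
            continue  # no validated units in this stratum; renormalised below
        weight = len(members) / n  # stratum share in the primary sample
        frac = sum(outcome[i] <= t for i in val) / len(val)  # within-stratum ECDF
        estimate += weight * frac
        total_weight += weight

    if total_weight == 0:
        raise ValueError("no validated units available")
    # Renormalise in case some strata contained no validated units.
    return estimate / total_weight
```

In this sketch the proxy contributes only through the stratum weights, so a more informative proxy yields more homogeneous strata. The renormalisation over strata with validated units is a convenience of the example and can introduce bias when empty strata differ systematically from the rest.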