B0622
Title: Inference for big data assisted by small area methods
Authors: Francesco Schirripa Spagnolo - Università di Pisa - Dipartimento di Economia e Management (Italy)
Stefano Marchetti - Dipartimento di Economia e Management, Università di Pisa (Italy)
Nicola Salvati - University of Pisa (Italy)
Monica Pratesi - University of Pisa (Italy)
Gaia Bertarelli - Sant'Anna School of Advanced Studies (Italy) [presenting]
Abstract: The amount of data produced by a wide range of new technologies, so-called big data, is increasing rapidly. Their availability at unprecedented spatial detail represents an opportunity in the context of Small Area Estimation (SAE) to infer characteristics of very small domains. However, data from big data sources are often the result of a non-probability sampling process, and adjusting for the resulting selection bias is an important practical problem. We propose a novel method of reducing the selection bias associated with the big data source in SAE. The approach is based on data integration, combining a big data sample with a probability sample. We are interested in estimating the population mean of a target variable in each small area of interest. We assume that the target variable is available from the big data source, while auxiliary variables are also available from survey samples. Because of the selection bias, the sample mean of the target variable calculated from the big data is biased; by incorporating auxiliary information from an external source, we can reduce this bias. We develop doubly robust estimators, together with their mean squared errors (MSEs), using SAE models with area-specific effects. These models are used to obtain both the area estimator from the sample data and the parameters of the propensity score for the big data sample.
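
The abstract gives no formulas, so the following is only a minimal sketch of a generic doubly robust area mean of the kind used in the data-integration literature. It is not the authors' estimator. It assumes that the propensity scores of the big data units (`pi_B`) and the outcome-model predictions (`m_B`, `m_A`) have already been estimated, whereas in the approach described above they would come from SAE models with area-specific effects. The function name `dr_area_means`, all argument names, and the toy numbers are hypothetical.

```python
import numpy as np

def dr_area_means(y_B, pi_B, m_B, area_B, m_A, d_A, area_A):
    """Generic doubly robust mean per area (ratio-normalised form).

    B = big data (non-probability) sample: outcome y_B, estimated
        propensity pi_B, outcome-model prediction m_B.
    A = probability sample: design weight d_A, prediction m_A.
    Assumption: pi_B, m_B and m_A are supplied by previously fitted models.
    Areas that appear only in B (not in A) are skipped.
    """
    y_B, pi_B, m_B, area_B = map(np.asarray, (y_B, pi_B, m_B, area_B))
    m_A, d_A, area_A = map(np.asarray, (m_A, d_A, area_A))
    estimates = {}
    for a in np.unique(area_A):
        in_A = area_A == a
        in_B = area_B == a
        # Model part: design-weighted mean of predictions in the probability sample.
        model_part = np.sum(d_A[in_A] * m_A[in_A]) / np.sum(d_A[in_A])
        # Correction: inverse-propensity-weighted mean residual in the big data sample.
        if in_B.any():
            w = 1.0 / pi_B[in_B]
            correction = np.sum(w * (y_B[in_B] - m_B[in_B])) / np.sum(w)
        else:
            correction = 0.0  # no big data units in this area: prediction only
        estimates[a.item()] = model_part + correction
    return estimates

# Toy illustration with made-up numbers: two areas, area 1 has no big data units.
est = dr_area_means(
    y_B=[1.0, 3.0], pi_B=[0.5, 0.25], m_B=[1.0, 2.0], area_B=[0, 0],
    m_A=[2.0, 4.0, 5.0], d_A=[1.0, 3.0, 2.0], area_A=[0, 0, 1],
)
print(est)
```

In this form, if the outcome predictions are accurate the residual correction is close to zero, and if the predictions are set to zero the estimate reduces to an inverse-propensity-weighted mean of the big data outcomes. This is the intuition behind the term "doubly robust".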