A0993
Title: Statistical inference for large-scale multi-source heterogeneous data
Authors: Li Cai - Zhejiang Gongshang University (China) [presenting]
Abstract: In the era of digital information, data are often not only large-scale but also heterogeneous. The aim is to study statistical inference for the overall population mean function of large-scale multi-source heterogeneous datasets. Borrowing from hierarchical sampling methods and divide-and-conquer techniques, a weighted local linear estimator is proposed for the overall population mean function of multi-source heterogeneous data. By studying the pointwise convergence properties and the extreme value distribution of the estimator, asymptotically accurate simultaneous confidence bands and pointwise confidence intervals are constructed for large-scale multi-source heterogeneous data. The proposed methods apply not only to heterogeneous data but also to homogeneous data processed by divide-and-conquer methods. Numerical simulation studies show that the proposed methods perform well on both large-scale multi-source heterogeneous data and homogeneous data. As an illustration, the proposed methods are applied to hypothesis testing problems on Beijing multi-site air-quality data and U.S. census data.
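Illustrative sketch (not the author's implementation): the abstract does not give the estimator's exact form, so the Python code below only illustrates the general idea of a divide-and-conquer weighted local linear estimator. Each source is smoothed separately by a local linear fit, and the per-source curves are then combined by a weighted average. The function names, the Epanechnikov kernel, the common bandwidth h, and the default weights proportional to source sample size are all assumptions made for illustration.

```python
import numpy as np


def local_linear(x, y, grid, h):
    """Local linear estimate of E[y | x] at each grid point.

    Uses an Epanechnikov kernel with bandwidth h (an assumed choice).
    """
    fitted = np.empty(len(grid))
    for i, t in enumerate(grid):
        d = x - t                      # centred design points
        u = d / h
        w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
        # Weighted moments of the local design and response
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d**2).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        # Closed-form intercept of the weighted least-squares line
        fitted[i] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)
    return fitted


def weighted_dc_estimate(sources, grid, h, weights=None):
    """Combine per-source local linear fits by a weighted average.

    sources: list of (x, y) array pairs, one pair per data source.
    weights: hypothetical source weights; defaults to sample-size shares.
    """
    sizes = np.array([len(x) for x, _ in sources], dtype=float)
    if weights is None:
        weights = sizes / sizes.sum()
    fits = np.stack([local_linear(x, y, grid, h) for x, y in sources])
    return np.asarray(weights) @ fits


# Usage on synthetic data: two sources with different mean functions.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(0, 1, 400), rng.uniform(0, 1, 600)
y1 = np.sin(2 * np.pi * x1) + rng.normal(0, 0.2, x1.size)
y2 = np.sin(2 * np.pi * x2) + 0.5 + rng.normal(0, 0.4, x2.size)
grid = np.linspace(0.1, 0.9, 9)
mean_hat = weighted_dc_estimate([(x1, y1), (x2, y2)], grid, h=0.15)
```

Because each source is processed independently, the per-source fits can be computed in parallel or on separate machines, which is the usual motivation for divide-and-conquer. The confidence bands described in the abstract would additionally require variance estimates and extreme value quantiles, which this sketch does not attempt.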