A0671

Title: Scalable and efficient statistical inference for big longitudinal data
Authors: Ling Zhou - University of Michigan (United States) [presenting]
Peter Song - University of Michigan (United States)
Abstract: The theory of statistical inference, together with the divide-and-conquer strategy for large-scale data analysis, has recently attracted considerable interest due to the great popularity of the MapReduce programming paradigm in the Apache Hadoop software framework. The central analytic task in developing statistical inference in the MapReduce paradigm is the method of combining the results obtained from separately mapped data batches. One seminal solution based on the confidence distribution has recently been established in the literature in the setting of maximum likelihood estimation. The focus is on a more general inferential methodology based on estimating functions, termed the Rao-type confidence distribution, of which maximum likelihood is a special case. This generalization provides a unified framework of statistical inference that allows regression analyses of important types of massive data sets to be carried out in a parallel and scalable fashion via a distributed file system, including longitudinal data analysis, which cannot be handled using the maximum likelihood method. Four important properties of the proposed method are investigated: computational scalability, statistical optimality, methodological generality, and operational robustness. All these properties are illustrated via numerical examples in both simulation studies and real-world data analyses.
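The combining step described in the abstract can be illustrated with a minimal sketch. This is not the Rao-type confidence distribution method itself: it assumes the simplest special case, ordinary least squares on independent observations, where each batch is summarized by its estimate and its information matrix X'X and the combined estimate is the information-weighted average. The function names `map_batch` and `combine`, the simulated data, and the batch count are hypothetical choices made for illustration only.

```python
import numpy as np

def map_batch(X, y):
    """Map step (illustrative): per-batch OLS estimate and information matrix X'X."""
    info = X.T @ X
    beta = np.linalg.solve(info, X.T @ y)
    return beta, info

def combine(summaries):
    """Reduce step (illustrative): information-weighted average of batch estimates."""
    total_info = sum(info for _, info in summaries)
    weighted = sum(info @ beta for beta, info in summaries)
    return np.linalg.solve(total_info, weighted)

# Simulated data (assumed for illustration, not from the abstract).
rng = np.random.default_rng(0)
n, p, k = 6000, 3, 6
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Split into k batches, summarize each one separately, then combine.
batches = zip(np.array_split(X, k), np.array_split(y, k))
summaries = [map_batch(Xb, yb) for Xb, yb in batches]
beta_combined = combine(summaries)

# Full-data estimate, computed only to compare against the combined one.
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
```

In this linear special case the combined estimate coincides with the full-data estimate up to floating-point error, because the batch summaries add up to the full-data normal equations. For estimating-function methods applied to longitudinal data, the per-batch summaries and weights would differ, and the agreement would generally hold only asymptotically.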