View Submission - EcoSta2024
A0542
Title: Distributed logistic regression for massive data with rare events
Authors: Xuetong Li - Central University of Finance and Economics (China) [presenting]
Xuening Zhu - Peking University (China)
Hansheng Wang - Peking University (China)
Abstract: Large-scale rare-event data are commonly encountered in practice. To handle massive rare-event data, a novel distributed estimation method is proposed for logistic regression in a distributed system. A distributed framework raises two challenges. The first is how to distribute the data. In this regard, two different distribution strategies (i.e., the random strategy and the copy strategy) are investigated. The second is how to select an appropriate type of log-likelihood function so that the best asymptotic efficiency can be achieved. To this end, the under-sampled (US) and inverse probability weighted (IPW) types of objective functions are considered. The results suggest that the copy strategy, together with the IPW objective function, is the best solution for distributed logistic regression with rare events. The finite sample performance of the distributed methods is demonstrated by simulation studies and a real-world Swedish Traffic Sign dataset.
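The abstract does not define the strategies or the objective functions in detail, so the following is only a minimal sketch of one possible reading, not the authors' procedure. It assumes that the copy strategy sends every rare positive case to each worker while the negative cases are split at random, that each worker under-samples its negatives with probability `pi`, that the IPW objective reweights each retained negative by the inverse of its inclusion probability, and that the local estimates are combined by simple averaging. The function names, the simulated data, and the values of `n_workers` and `pi` are all hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-8):
    """Newton's method for weighted logistic regression.
    The first column of X is assumed to be the intercept."""
    ybar = np.sum(w * y) / np.sum(w)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(ybar / (1.0 - ybar))  # start at the weighted base rate
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def copy_strategy_ipw(X, y, n_workers, pi, rng):
    """Assumed reading of 'copy strategy + IPW objective':
    positives go to every worker, negatives are split at random,
    each worker keeps a negative with probability pi, and the
    local estimates are averaged."""
    pos = np.flatnonzero(y == 1)
    neg = rng.permutation(np.flatnonzero(y == 0))
    estimates = []
    for shard in np.array_split(neg, n_workers):
        kept = shard[rng.random(shard.size) < pi]
        idx = np.concatenate([pos, kept])
        # A negative reaches this worker's sample with probability
        # pi / n_workers, so its inverse probability weight is n_workers / pi.
        w = np.where(y[idx] == 1, 1.0, n_workers / pi)
        estimates.append(fit_weighted_logistic(X[idx], y[idx], w))
    return np.mean(estimates, axis=0)

# Simulated rare-event data (illustrative values only).
rng = np.random.default_rng(0)
n = 200_000
beta_true = np.array([-6.0, 1.0, -1.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

beta_hat = copy_strategy_ipw(X, y, n_workers=4, pi=0.1, rng=rng)
print("positives:", int(y.sum()), "estimate:", np.round(beta_hat, 3))
```

Under these assumptions, each worker fits its model on all positives plus a small share of the negatives, and the weights make each local objective an approximation of the full-data log-likelihood. A random-strategy variant would split the positives across workers as well, leaving each worker with even fewer rare events.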