EcoSta 2023
A1120
Title: Generalized linear models for massive data via doubly-sketching
Authors: Jason Hou-Liu - University of Waterloo (Canada) [presenting]
Ryan Browne - University of Waterloo (Canada)
Abstract: Generalized linear models are a popular analytics tool with interpretable results and broad applicability, but they require iterative estimation procedures whose data transfer and computational costs can be problematic under some infrastructure constraints. A doubly-sketched approximation of the iteratively re-weighted least squares (IRLS) algorithm is proposed to estimate generalized linear model parameters using a sequence of surrogate datasets. The procedure sketches repeatedly to reduce both data transfer costs and computation costs, yielding wall-clock time savings when approximating the regression coefficients and their standard errors. Asymptotic properties of the proposed procedure are established, and empirical results are presented for simulated and real-world datasets. The efficacy of the method is investigated across a variety of commodity computing infrastructure configurations accessible to practitioners. A highlight is a Poisson-log generalized linear model fitted to almost 1.7 billion observations on a personal computer in 25 minutes.
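To make the idea of running IRLS on "a sequence of surrogate datasets" concrete, here is a minimal sketch for a Poisson-log GLM with an intercept and one covariate. It is not the authors' doubly-sketched procedure. It assumes the simplest possible sketch, uniform row sampling without replacement, redrawn at every IRLS iteration, so that each weighted least squares step touches only `sketch_size` rows. Standard errors come from the sketched Fisher information rescaled by n/m. The function names (`sketched_irls_poisson`, `_weighted_normal_eqs`, `_solve_sym_2x2`) and the `sketch_size`, `n_iter`, and `seed` parameters are hypothetical illustrations.

```python
import math
import random


def _solve_sym_2x2(a11, a12, a22, b1, b2):
    # Solve [[a11, a12], [a12, a22]] @ beta = [b1, b2] in closed form.
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def _weighted_normal_eqs(x, y, idx, beta0, beta1):
    # Accumulate X^T W X and X^T W z over the sampled rows only, for a
    # Poisson GLM with log link: mu = exp(eta), IRLS weight w = mu,
    # working response z = eta + (y - mu) / mu.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for i in idx:
        eta = beta0 + beta1 * x[i]
        mu = math.exp(eta)
        z = eta + (y[i] - mu) / mu
        a11 += mu
        a12 += mu * x[i]
        a22 += mu * x[i] * x[i]
        b1 += mu * z
        b2 += mu * x[i] * z
    return a11, a12, a22, b1, b2


def sketched_irls_poisson(x, y, sketch_size, n_iter=25, seed=0):
    """Approximate (intercept, slope) and their standard errors for a
    Poisson-log GLM, running every IRLS step on a fresh row sample."""
    rng = random.Random(seed)
    n = len(x)
    beta0 = beta1 = 0.0
    for _ in range(n_iter):
        # Assumed sketch: uniform sampling of rows without replacement.
        idx = rng.sample(range(n), sketch_size)
        a11, a12, a22, b1, b2 = _weighted_normal_eqs(x, y, idx, beta0, beta1)
        beta0, beta1 = _solve_sym_2x2(a11, a12, a22, b1, b2)
    # Standard errors: sketched Fisher information at the final estimate,
    # rescaled by n / m to approximate the full-data information, inverted.
    idx = rng.sample(range(n), sketch_size)
    a11, a12, a22, _, _ = _weighted_normal_eqs(x, y, idx, beta0, beta1)
    scale = n / sketch_size
    det = (scale * a11) * (scale * a22) - (scale * a12) ** 2
    se0 = math.sqrt(scale * a22 / det)
    se1 = math.sqrt(scale * a11 / det)
    return (beta0, beta1), (se0, se1)
```

With `x` and `y` as equal-length lists, `sketched_irls_poisson(x, y, sketch_size=5000)` returns `((intercept, slope), (se_intercept, se_slope))`. The rescaling by n/m cancels in the coefficient update, because it multiplies both sides of the weighted normal equations. It matters only for the standard errors. A real implementation would replace uniform sampling with whatever sketch the method prescribes and would work on matrices rather than a single covariate.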