CMStatistics 2023: Start Registration
View Submission - CMStatistics
B1279
Title: Sketching for logistic regression Authors:  Alexander Munteanu - TU Dortmund (Germany) [presenting]
Simon Omlor - TU Dortmund (Germany)
David Woodruff - Carnegie Mellon University (United States)
Abstract: Recent advances in oblivious linear sketching are reviewed. The toolbox of sketching enables efficient approximation algorithms for linear regression, allowing to compress data streams or distributed data while preserving various symmetric loss functions such as $l_p$-norms, or M-estimators up to little distortions. Going one step further to highly imbalanced and asymmetric functions encountered in generalized linear models, we face impossibility results against compression below linear size. We focus on new sketching techniques to cope with these issues. In particular, for the important logistic regression problem, natural assumptions on the data distribution are captured by a balance parameter m. Our first sketches for logistic regression provably reduce n data points to $poly(m,d,\log(n))$ while preserving a constant factor approximation. Our new and further optimized sketches compress to almost linear size in $m$ and $d$, while improving the approximation factor to a 2-approximation. The bound on m and d roughly matches the best results obtained via subsampling and is close to optimal. For certain parameterizations, our new sketches allow for the first $(1+\epsilon)$-approximation of the problem in a turnstile data stream.