EcoSta 2024
A0855
Title: Asynchronous and distributed data augmentation for massive data settings
Authors: Kshitij Khare - University of Florida (United States) [presenting]
Abstract: Data augmentation (DA) algorithms are slow in massive data settings because they require repeated passes through the entire data. This problem is addressed by developing a DA extension that exploits asynchronous and distributed computing. The extended algorithm is called asynchronous and distributed DA (ADDA), and the original DA algorithm is called its parent. Every ADDA is indexed by a parameter r in (0,1). It starts by dividing the entire data into k disjoint subsets and storing them on k processes. Every iteration of ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the augmented data of the remaining (1-r)-fraction unchanged. The parameter is then drawn using the newly augmented data from the r-fraction and the old augmented data from the (1-r)-fraction. It is shown that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. ADDA is demonstrated to be significantly faster than its parent for many (k, r) choices in three representative models. Geometric ergodicity of the ADDA Markov chain is also established for all three models; this yields asymptotically valid standard errors for estimates of the posterior quantities of interest.
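The abstract describes the ADDA update only in words. The sketch below shows the mechanics under several assumptions. It uses a hypothetical Student-t location model, y_i ~ t_nu(mu, 1) with a flat prior on mu. This model is not one of the three models in the talk. The function names (`adda_chain`, `augment`, `draw_mu`, `batch_means_se`), the round-robin split into k subsets, and the rule of refreshing ceil(r*k) subsets chosen uniformly at random are all illustrative choices. True asynchrony across k processes is only simulated serially here. Batch means is used as one common way to obtain the standard errors mentioned above; it is not necessarily the authors' method.

```python
import math
import random
from statistics import fmean

# Hypothetical illustration: serial simulation of an ADDA-style sampler for a
# Student-t location model (flat prior on mu); not the authors' implementation.


def split_data(y, k):
    """Divide the data into k disjoint subsets (round-robin), one per simulated process."""
    return [y[j::k] for j in range(k)]


def augment(subset, mu, nu, rng):
    """DA step for one subset: draw latent precisions w_i | mu, y_i.

    Writing y_i | w_i ~ N(mu, 1/w_i) with w_i ~ Gamma(nu/2, rate=nu/2) gives
    w_i | mu, y_i ~ Gamma((nu+1)/2, rate=(nu + (y_i - mu)^2)/2).
    random.gammavariate expects a scale parameter, i.e. 1/rate.
    """
    shape = (nu + 1) / 2
    return [rng.gammavariate(shape, 2.0 / (nu + (yi - mu) ** 2)) for yi in subset]


def draw_mu(subsets, latents, rng):
    """Parameter step: mu | w, y ~ N(sum w_i y_i / sum w_i, 1 / sum w_i)."""
    sw = sum(sum(w) for w in latents)
    swy = sum(wi * yi for ys, ws in zip(subsets, latents) for yi, wi in zip(ys, ws))
    return rng.gauss(swy / sw, 1.0 / math.sqrt(sw))


def adda_chain(y, k, r, n_iter, nu=4.0, seed=0):
    """Run the simulated ADDA chain and return the mu draws.

    Each iteration re-augments ceil(r*k) randomly chosen subsets (standing in
    for the processes that respond first) and keeps the stale latents of the
    rest. r = 1 refreshes every subset, which is the parent DA sampler.
    """
    if not 0 < r <= 1:
        raise ValueError("r must lie in (0, 1]")
    rng = random.Random(seed)
    subsets = split_data(y, k)
    mu = fmean(y)
    latents = [augment(s, mu, nu, rng) for s in subsets]  # initial full augmentation
    m = max(1, math.ceil(r * k))
    chain = []
    for _ in range(n_iter):
        for j in rng.sample(range(k), m):  # only an r-fraction is re-augmented
            latents[j] = augment(subsets[j], mu, nu, rng)
        mu = draw_mu(subsets, latents, rng)  # uses new and old augmented data
        chain.append(mu)
    return chain


def batch_means_se(chain, n_batches=30):
    """Batch-means Monte Carlo standard error of the chain's sample mean."""
    b = len(chain) // n_batches
    means = [fmean(chain[i * b:(i + 1) * b]) for i in range(n_batches)]
    grand = fmean(means)
    var_between = sum((x - grand) ** 2 for x in means) / (n_batches - 1)
    return math.sqrt(var_between / n_batches)


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(3.0, 1.0) for _ in range(600)]
    draws = adda_chain(y, k=6, r=0.5, n_iter=3000)[500:]  # drop burn-in
    print(fmean(draws), batch_means_se(draws))
```

In this sketch, each update of a block of latents given mu, and of mu given all latents, is a draw from a full conditional. This is why the simulated chain keeps the joint posterior invariant even though some latents are stale. Smaller r does less augmentation work per iteration, at the cost of using older latent variables in the parameter draw.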