Title: Parallelising Bayesian biostatistical analyses using Yin-Yang sampling
Authors: Alexandra Posekany - University of Technology Vienna (Austria) [presenting]
Sylvia Fruhwirth-Schnatter - WU Vienna (Austria)
Abstract: For decades, Bayesian methods have been applied ever more widely in biostatistical and medical analyses. Part of that success is due to their ability to integrate new observations with previous knowledge from other studies or experts. However, the flip side of handling small samples well is that Bayesian models for large studies, analysing hundreds of thousands of patients from medical registries, are computationally demanding with respect to memory usage and require parallelisation. To get a step closer to performing such large calculations, we developed a methodology which splits the data into smaller subsets, performs Bayesian inference independently on each, and finally merges the separate results into a joint result which is comparable to the inference on the complete data set. The method was named Yin-Yang sampling because it focusses on merging information from two different sources, the yin and the yang sample, by correcting for the multiple use of the prior information. By applying Yin-Yang sampling steps, we recover the full sample's posterior from any given number of subsample posteriors, with some restrictions. For demonstration, an inference with logistic regression on a data set from the Austrian Stroke registry containing over 100000 patients is shown. This provides a scenario where the full-sample inference is impossible on a desktop computer due to lack of memory, while subsample computation plus the Yin-Yang merging algorithm requires only minutes of computation.
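
The prior correction mentioned in the abstract can be illustrated with a toy case. The following is a minimal sketch, not the authors' sampling algorithm: it assumes a conjugate normal model with known noise variance, where every posterior is available in closed form, and it relies only on the general identity that the full posterior is proportional to the product of the two subsample posteriors divided by the prior once. The function names (normal_posterior, merge_two), the simulated data and all numerical values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def normal_posterior(y, sigma2, prior_mean, prior_var):
    """Posterior (mean, variance) of a normal mean with known noise variance sigma2."""
    precision = 1.0 / prior_var + len(y) / sigma2
    mean = (prior_mean / prior_var + np.sum(y) / sigma2) / precision
    return mean, 1.0 / precision


def merge_two(post_a, post_b, prior):
    """Combine two subsample posteriors and divide out the prior that was used twice."""
    (m_a, v_a), (m_b, v_b), (m_0, v_0) = post_a, post_b, prior
    # Precisions add when Gaussian densities are multiplied; subtracting the
    # prior precision once removes the duplicated prior information.
    precision = 1.0 / v_a + 1.0 / v_b - 1.0 / v_0
    mean = (m_a / v_a + m_b / v_b - m_0 / v_0) / precision
    return mean, 1.0 / precision


# Simulated data (assumption: illustrative values only).
rng = np.random.default_rng(0)
sigma2 = 4.0
prior = (0.0, 10.0)  # prior mean and prior variance
y = rng.normal(1.5, np.sqrt(sigma2), size=1000)
yin, yang = y[:400], y[400:]  # the two subsamples

# Reference: inference on the complete data set.
full = normal_posterior(y, sigma2, *prior)

# Independent inference on each subsample, then merging with prior correction.
merged = merge_two(
    normal_posterior(yin, sigma2, *prior),
    normal_posterior(yang, sigma2, *prior),
    prior,
)

print("full  :", full)
print("merged:", merged)
```

In this conjugate setting the merged and full posteriors agree up to floating-point error. For models without closed-form posteriors, such as the logistic regression named in the abstract, the subsample posteriors are only available as samples, so the merging step has to be carried out differently; this sketch does not attempt that.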