CFE-CMStatistics 2024
A0268
Title: On data analysis pipelines and modular Bayesian modeling
Authors: Abel Rodriguez - University of Washington (United States) [presenting]
Abstract: The most common approach to implementing data analysis pipelines is to obtain point estimates from the upstream modules and then treat them as known quantities when fitting the downstream ones. This approach is straightforward, but it is likely to underestimate the overall uncertainty associated with any final estimates. An alternative is to estimate the parameters of all modules jointly using a Bayesian hierarchical model, which has the advantage of propagating upstream uncertainty into the downstream estimates. However, when modules are misspecified, such a joint model can behave in unexpected ways. Furthermore, hierarchical models require ad hoc computational implementations that can be laborious to develop and computationally expensive to run. Cut inference, which modifies the posterior distribution to prevent information flow between certain parameters, provides a third alternative for statistical inference in data analysis pipelines. A unified framework is presented that encompasses two-step, cut, and joint inference in the context of data analysis pipelines with two modules. Two examples are used to illustrate the tradeoffs associated with these approaches. It is shown that cut inference provides both some level of robustness and ease of implementation for data analysis pipelines at a lower cost in terms of statistical inference.
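
As a rough sketch, the three approaches can be written side by side for a two-module pipeline. The notation is assumed here and does not appear in the abstract: an upstream module with data $Z$ and parameter $\varphi$, and a downstream module with data $Y$ and parameter $\theta$ whose model depends on $\varphi$. These are the forms commonly used in the cut-inference literature, and the framework presented in the talk may differ.

```latex
% Assumed notation (not from the abstract):
%   Z, \varphi : data and parameter of the upstream module
%   Y, \theta  : data and parameter of the downstream module
\begin{align*}
\text{Two-step (plug-in):} \quad
  & \hat{\varphi} = \hat{\varphi}(Z), \qquad
    p(\theta \mid Y, \hat{\varphi}) \propto
    p(\theta \mid \hat{\varphi})\, p(Y \mid \theta, \hat{\varphi}) \\
\text{Joint:} \quad
  & p(\varphi, \theta \mid Z, Y) \propto
    p(\varphi)\, p(Z \mid \varphi)\, p(\theta \mid \varphi)\, p(Y \mid \theta, \varphi) \\
\text{Cut:} \quad
  & p_{\mathrm{cut}}(\varphi, \theta \mid Z, Y) =
    p(\varphi \mid Z)\, p(\theta \mid Y, \varphi)
\end{align*}
```

Under these assumptions, the marginal of $\varphi$ in the cut posterior depends on $Z$ only, so a misspecified downstream module cannot feed back into it, while the uncertainty in $\varphi$ still propagates to $\theta$. The two-step approach instead conditions on a single value $\hat{\varphi}$, which is why it tends to understate the final uncertainty.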