B1240
Title: Bayesian learning from synthetic data
Authors: Jack Jewson - Universitat Pompeu Fabra and Barcelona Graduate School of Economics (Spain) [presenting]
Abstract: Interest in synthetic data as an enabler for machine learning has grown significantly, particularly in environments where the release of real data is restricted by privacy or availability constraints. However, the mechanisms used to preserve privacy introduce artefacts into the resulting synthetic data. We use a Bayesian paradigm to characterise how model parameters are updated when learning in these settings. We demonstrate that such downstream tasks can be significantly biased, and that careful consideration should be given both to the synthetic data generating process and to the learning task at hand. Recent results from general Bayesian updating allow us to propose several bias mitigation strategies, inspired by decision theory, robust statistics and privatised likelihood ratios, that apply generally to differentially private synthetic data generative models. Finally, we highlight that, even after bias correction, significant challenges remain for the usefulness of private synthetic data generators in tasks such as prediction and inference.
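The following is a minimal sketch of how a general Bayesian update can down-weight synthetic data. It assumes a normal model with known variance and a conjugate normal prior on the mean. The log-likelihood of each synthetic observation is scaled by a hypothetical weight `eta` in [0, 1], while real observations keep full weight. The function name, its parameters and the model are illustrative assumptions. This is not the specific set of mitigation strategies proposed in the talk.

```python
import math
from typing import Sequence, Tuple


def tempered_normal_posterior(
    real: Sequence[float],
    synthetic: Sequence[float],
    eta: float,
    sigma: float = 1.0,
    prior_mean: float = 0.0,
    prior_sd: float = 10.0,
) -> Tuple[float, float]:
    """Posterior mean and sd of a normal mean (known sigma) under a
    conjugate normal prior, where every synthetic observation's
    log-likelihood is multiplied by eta (an assumed, user-chosen weight)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    prior_prec = 1.0 / prior_sd ** 2
    lik_prec = 1.0 / sigma ** 2
    # Real points each add full likelihood precision; synthetic points add eta times that.
    post_prec = prior_prec + lik_prec * (len(real) + eta * len(synthetic))
    # Precision-weighted combination of prior mean, real data and down-weighted synthetic data.
    weighted_sum = prior_prec * prior_mean + lik_prec * (sum(real) + eta * sum(synthetic))
    return weighted_sum / post_prec, math.sqrt(1.0 / post_prec)


if __name__ == "__main__":
    real = [1.0, 1.2, 0.8]       # small real sample centred near 1
    synthetic = [2.0] * 100      # large synthetic sample carrying an artefact (shifted to 2)
    for eta in (0.0, 0.01, 0.5, 1.0):
        mean, sd = tempered_normal_posterior(real, synthetic, eta)
        print(f"eta={eta}: posterior mean={mean:.3f}, sd={sd:.3f}")
```

Setting `eta = 0` ignores the synthetic data, and `eta = 1` treats it as if it were real. With a large, biased synthetic sample, the `eta = 1` posterior concentrates near the artefact. This illustrates the kind of downstream bias described above. How to choose a weight between these extremes is left open here.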