CMStatistics 2023
B1391
Title: How to provably generate privacy-preserving synthetic data for the data economy
Authors: Gerhard Wunder - FU Berlin (Germany) [presenting]
Abstract: Synthetic data has been hailed as the silver bullet for privacy-preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning-based generative models are employed successfully to approximate complex high-dimensional distributions from data and to draw realistic samples from the learned distribution. It is often overlooked, though, that generative models are prone to memorizing many details of individual training records and often generate synthetic data that too closely resembles the underlying sensitive training data, thereby violating strict privacy regulations such as those encountered in health care. Alternative approaches for privately generating data are explored that make direct use of the inherent stochasticity in generative models. The main idea is to appropriately constrain the modulus of continuity of the deep models instead of adding another noise mechanism on top. For this approach, mathematically rigorous privacy guarantees are derived, and its effectiveness is illustrated with practical experiments.
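The abstract does not say how the modulus of continuity is constrained. As a purely illustrative sketch, and not the method presented in the talk, one common way to bound how strongly a network's output can react to a change in its input is to rescale each weight matrix so that its operator norm stays below a chosen bound. The toy generator, the bound of 1.5, and all function names below are assumptions made for this example, and the code by itself carries no privacy guarantee.

```python
import random

def inf_norm(W):
    """Operator norm induced by the sup norm: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)

def clip_lipschitz(W, bound):
    """Rescale W so that x -> W x is at most `bound`-Lipschitz in the sup norm."""
    norm = inf_norm(W)
    if norm <= bound:
        return [row[:] for row in W]
    scale = bound / norm
    return [[w * scale for w in row] for row in W]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    # ReLU is 1-Lipschitz, so it does not increase the overall bound.
    return [max(0.0, x) for x in v]

def generator(layers, z):
    """Hypothetical toy generator: linear layers with ReLU in between."""
    h = z
    for i, W in enumerate(layers):
        h = matvec(W, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h

def sup_dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)
raw_layers = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
              for _ in range(3)]
layers = [clip_lipschitz(W, 1.5) for W in raw_layers]

# The composition is Lipschitz with constant at most the product of the
# per-layer bounds, here 1.5 ** 3.
lipschitz_bound = 1.5 ** len(layers)
```

With the layers constrained this way, any two latent inputs `z1` and `z2` satisfy `sup_dist(generator(layers, z1), generator(layers, z2)) <= lipschitz_bound * sup_dist(z1, z2)`, which is the kind of continuity control the abstract alludes to.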