A0425
Title: Differentially private synthetic data without training
Authors: Zinan Lin - Microsoft Research (United States) [presenting]
Abstract: Generating differentially private (DP) synthetic data that closely resembles the original data while preserving user privacy is a scalable way to address privacy concerns in today's data-driven world. Private Evolution (PE), a new training-free framework for DP synthetic data generation, is introduced; it contrasts with existing approaches that rely on training DP generative models. PE treats foundation models as black boxes and uses only their inference APIs. It is demonstrated that, across both images and text, PE: (1) can match, and in some cases outperform, prior state-of-the-art (SoTA) methods on the fidelity-privacy trade-off without any model training; (2) enables the use of advanced open-source models (e.g., Mixtral) and API-based models (e.g., GPT-3.5), to which previous SoTA approaches are inapplicable; and (3) is more computationally efficient than prior SoTA methods. Recent extensions of PE are also discussed, including the integration of non-neural-network data synthesis tools, the fusion of knowledge from multiple models for DP data synthesis, and applications in federated learning. The hope is that PE unlocks the full potential of foundation models and other data synthesis tools for privacy-preserving machine learning and accelerates the adoption of DP synthetic data across industries.
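The training-free, API-only loop the abstract describes can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's exact algorithm: `random_api` and `variation_api` are hypothetical stand-ins for a foundation model's inference APIs, samples are points in a Euclidean embedding space, and the noise accounting is simplified for illustration. The structure follows the PE idea of private samples voting for their nearest synthetic candidates via a noised histogram, then evolving the survivors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a foundation model's inference APIs:
# random_api draws fresh samples; variation_api perturbs given samples.
def random_api(n, dim):
    return rng.normal(0.0, 3.0, size=(n, dim))

def variation_api(samples):
    return samples + rng.normal(0.0, 0.3, size=samples.shape)

def dp_nn_histogram(private, synthetic, noise_mult):
    # Each private point votes for its nearest synthetic candidate.
    # Gaussian noise on the vote histogram is what provides the DP
    # guarantee (simplified accounting here; real PE calibrates noise
    # to the vote histogram's sensitivity and a privacy budget).
    d = ((private[:, None, :] - synthetic[None, :, :]) ** 2).sum(-1)
    votes = np.bincount(d.argmin(axis=1), minlength=len(synthetic)).astype(float)
    votes += rng.normal(0.0, noise_mult, size=votes.shape)
    return np.clip(votes, 0.0, None)

def private_evolution(private, n_syn=64, iters=10, noise_mult=1.0):
    # Start from unconditional API samples; no model training anywhere.
    syn = random_api(n_syn, private.shape[1])
    for _ in range(iters):
        hist = dp_nn_histogram(private, syn, noise_mult)
        total = hist.sum()
        p = hist / total if total > 0 else np.full(len(syn), 1.0 / len(syn))
        # Resample candidates in proportion to their (noisy) popularity,
        # then ask the API for variations of the survivors.
        syn = variation_api(syn[rng.choice(len(syn), size=n_syn, p=p)])
    return syn
```

On toy 2-D "private" data clustered around (2, 2), the synthetic set drifts from its broad initial distribution toward the private cluster, while the model sees only aggregate, noised votes rather than individual records.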