EcoSta 2023
A0429
Title: A unified framework for understanding and quantifying model privacy in adversarial machine learning
Authors: Jie Ding - University of Minnesota (United States) [presenting]
Abstract: The security of machine learning models against adversarial attacks has become critical in modern application scenarios, such as machine-learning-as-a-service and collaborative learning. Model stealing attacks, which aim to reverse-engineer a learned model from a limited number of query-response interactions, pose a significant threat: they can potentially steal proprietary models at a fraction of the original training cost. While numerous attack and defence strategies have been proposed with empirical success, most existing works are heuristic, rely on limited evaluation metrics, and are imprecise in characterizing the associated losses and gains. We introduce a unified conceptual framework called Model Privacy, designed to understand and quantify model stealing attacks and defences. Model Privacy captures the fundamental tradeoff between the usability and the vulnerability of a learned model's functionality. Leveraging this framework, fundamental limits on privacy-utility tradeoffs are established and their implications are discussed. It is demonstrated that a model owner can incur only a minor utility loss while obtaining a significantly larger privacy gain by employing non-IID perturbations, a desirable property unattainable in independent data regimes. Lastly, extensive experiments are presented to corroborate the proposed framework and its effectiveness.
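To fix ideas, the following is a minimal Python sketch of the query-response setting the abstract describes: an attacker reconstructs a surrogate of a served model from a limited number of queries, while the owner perturbs the responses before releasing them. The victim function, the polynomial surrogate, the specific IID and correlated (non-IID) noise schemes, and the metrics are illustrative assumptions for exposition only; they are not the paper's actual framework, mechanisms, or results.

```python
"""Toy model-stealing sketch: attacker fits a surrogate from query-response
pairs; owner compares an IID perturbation with a correlated (non-IID) one.
All modeling choices here are hypothetical stand-ins."""
import numpy as np

rng = np.random.default_rng(0)

def victim(x):
    # Victim model served through a query interface (an assumed toy function).
    return np.sin(3 * x) + 0.5 * x

def steal(queries, responses, degree=8):
    # Attacker: fit a polynomial surrogate to the observed query-response pairs.
    coeffs = np.polyfit(queries, responses, degree)
    return lambda x: np.polyval(coeffs, x)

n = 50
queries = np.linspace(-1.0, 1.0, n)
clean = victim(queries)

# Defence A: IID Gaussian perturbation of each response.
iid_noise = 0.3 * rng.standard_normal(n)

# Defence B: a hypothetical non-IID perturbation, here smooth noise that is
# strongly correlated across neighbouring queries.
t = np.linspace(0.0, 1.0, n)
noniid_noise = 0.3 * np.sin(2 * np.pi * 5 * t + rng.uniform(0, 2 * np.pi))

grid = np.linspace(-1.0, 1.0, 400)   # held-out points for evaluating the surrogate
truth = victim(grid)

for name, noise in [("iid", iid_noise), ("non-iid", noniid_noise)]:
    responses = clean + noise
    surrogate = steal(queries, responses)
    utility_loss = np.mean(noise ** 2)                      # owner's response distortion
    attack_error = np.mean((surrogate(grid) - truth) ** 2)  # attacker's reconstruction error
    print(f"{name:8s} utility loss {utility_loss:.3f}   attacker MSE {attack_error:.3f}")
```

The printed numbers depend entirely on this toy configuration and are not evidence for the abstract's claims; the sketch only illustrates the two quantities being traded off, the owner's utility loss from perturbing responses and the attacker's error in reconstructing the model's functionality.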