B0554
Title: Mean estimation with user-level privacy under data heterogeneity
Authors: Rachel Cummings - Columbia University (United States) [presenting]
Abstract: A key challenge for data analysis in the federated setting is that user data is heterogeneous, i.e., it cannot be assumed to be sampled from the same distribution. Further, in practice, different users may hold vastly different numbers of samples. We propose a simple model of heterogeneous user data that differs in both distribution and quantity, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate the asymptotic optimality of the estimator within a natural class of private estimators and also prove general lower bounds on the error achievable in our problem. In particular, while the optimal non-private estimator can be shown to be linear, we show that privacy forces us to use a non-linear estimator.
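To make "user-level" privacy concrete, the following is a minimal baseline sketch, not the estimator from this talk. It assumes that each user holds one-dimensional samples, that the number of users m is public, and that neighbouring datasets differ by replacing one user's entire data (including that user's sample count). Each user's sample mean is clipped to a fixed range [lo, hi] chosen in advance, the clipped means are averaged with equal weights, and Laplace noise calibrated to the resulting sensitivity (hi - lo)/m is added. All function names and parameters are hypothetical illustrations. Unlike the abstract's method, this sketch ignores differences in how many samples each user holds.

```python
import random
from typing import Optional, Sequence


def clipped_user_mean(samples: Sequence[float], lo: float, hi: float) -> float:
    """Average one user's samples, then clip that average to [lo, hi]."""
    if not samples:
        raise ValueError("each user must hold at least one sample")
    mean = sum(samples) / len(samples)
    return min(max(mean, lo), hi)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) as the difference of two Exp(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_user_level_mean(
    users: Sequence[Sequence[float]],
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical baseline: epsilon-DP at the user level.

    Assumes the number of users m is public and that neighbours differ by
    replacing one user's whole dataset. Replacing one user moves one clipped
    mean by at most (hi - lo), so the equal-weight average moves by at most
    (hi - lo) / m; Laplace noise with scale (hi - lo) / (m * epsilon) then
    gives epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if hi <= lo:
        raise ValueError("need lo < hi")
    if not users:
        raise ValueError("need at least one user")
    rng = rng or random.Random()
    m = len(users)
    clipped = [clipped_user_mean(u, lo, hi) for u in users]
    estimate = sum(clipped) / m
    sensitivity = (hi - lo) / m
    return estimate + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Three users with different sample counts; the second user's mean (3.0)
    # is clipped to hi = 2.0 before averaging.
    users = [[0.9, 1.1, 1.0], [3.0], [0.5, 0.7]]
    print(private_user_level_mean(users, lo=0.0, hi=2.0, epsilon=1.0,
                                  rng=random.Random(0)))
```

The clipping step already makes this baseline non-linear in the data, but the abstract does not say that clipping is the source of non-linearity in the authors' estimator. The range [lo, hi] must be fixed without looking at the data; choosing it from the data would consume additional privacy budget.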