A0364
Title: Random measure priors in Bayesian frequency recovery from sketches
Authors: Mario Beraha - Università di Torino (Italy) [presenting]
Abstract: Given a lossy-compressed representation, or sketch, of data with values in a set of symbols, the frequency recovery problem considers estimating the empirical frequency of a new data point. This is a classical problem in computer science, with recent studies applying Bayesian nonparametric (BNPs) to develop learning-augmented versions of the popular count-min sketch (CMS) recovery algorithm. A novel BNP approach is presented to frequency recovery, which is not built from the CMS but relies on a sketch obtained by random hashing. Assuming data to be modelled as random samples from an unknown discrete distribution, endowed with a Poisson-Kingman (PK) prior, given the sketch, the posterior distribution of a symbol's empirical frequency is provided, with estimates being obtained as posterior mean functionals. The BNP approach is further developed to a traits formulation of the frequency recovery problem, not yet considered in the CMS literature, in which data belong to more than one symbol, referred to as traits, and exhibit nonnegative levels of associations with each feature By modelling data as a generalized Indian buffet process, the posterior distribution of the empirical frequency level of a trait is provided, given a sketch obtained by random hashing. Some applications are presented for the Poisson and Bernoulli distribution for the levels of associations.