CFE-CMStatistics 2024
A1197
Title: Advances in data analysis using aggregated data
Authors: Boris Beranger - University of New South Wales (Australia) [presenting]
Scott Sisson - University of New South Wales (Australia)
Prosha Rahman - University of New South Wales (Australia)
Ahmad Hakiim Jamaluddin - University of New South Wales (Australia)
Abstract: The rise of big and complex data has created a need for faster and more efficient statistical modelling techniques. For example, the huge volume of internet data collected daily means that even simple statistical models cannot be fitted on a regular computer, and sometimes the data cannot even be stored. One strategy is to reduce the amount of data by aggregating it into summaries and to perform the analysis on the summaries themselves. For a general aggregation function, a likelihood-based approach is proposed to fit statistical models defined at the underlying data level. Theoretical guarantees for the resulting maximum likelihood estimators are established, including consistency results for generic continuous aggregation functions. The important yet (almost) unexplored topic of summary design is then addressed. Focusing on the family of (univariate) random bin histogram aggregation functions, a methodology is developed to provide some answers to the burning question: how many bins are needed, and where should they be placed? Simulation experiments illustrate the insights drawn from the methodology.
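
As a rough illustration of fitting an underlying-level model from summaries alone, the sketch below estimates the mean and standard deviation of a normal model using only histogram bin counts. It is not the authors' method: it assumes fixed (not random) bin edges, a multinomial likelihood for the counts, and a crude grid search in place of a proper optimiser. All function names, bin edges, and grid values are hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist


def histogram_counts(data, edges):
    """Count observations per bin; the two outer bins extend to -inf and +inf."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        k = sum(x > e for e in edges)  # index of the bin containing x
        counts[k] += 1
    return counts


def binned_loglik(mu, sigma, edges, counts):
    """Multinomial log-likelihood of the counts under N(mu, sigma^2), up to a constant."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    ll = 0.0
    for k, c in enumerate(counts):
        p = max(cdf[k + 1] - cdf[k], 1e-300)  # guard against log(0)
        ll += c * math.log(p)
    return ll


def fit_from_histogram(edges, counts, mu_grid, sigma_grid):
    """Return the (mu, sigma) grid point that maximises the binned log-likelihood."""
    return max(
        ((m, s) for m in mu_grid for s in sigma_grid),
        key=lambda ms: binned_loglik(ms[0], ms[1], edges, counts),
    )


# Simulated "big" data set; only the bin counts are used for fitting.
random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(20000)]
edges = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # assumed fixed bin edges
counts = histogram_counts(data, edges)

mu_grid = [i / 20 for i in range(81)]           # 0.00, 0.05, ..., 4.00
sigma_grid = [0.5 + i / 20 for i in range(51)]  # 0.50, 0.55, ..., 3.00
mu_hat, sigma_hat = fit_from_histogram(edges, counts, mu_grid, sigma_grid)
print(mu_hat, sigma_hat)
```

Changing the number and location of the entries in `edges` and comparing the resulting estimates gives an informal feel for the summary design question raised in the abstract.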