Title: The effect of data aggregation on dispersion estimates in count data models
Authors: Adam Errington - Durham University (United Kingdom)
Jochen Einbeck - Durham University (United Kingdom) [presenting]
David Endesfelder - Bundesamt fuer Strahlenschutz (Germany)
Jonathan Cumming - Durham University (United Kingdom)
Abstract: For the modelling of count data, it is frequently convenient to aggregate the raw data over certain subgroups. Under the Poisson law, count data can be aggregated (over subgroups with the same predictor configuration) without information loss, since the sample mean (equivalently, the total count) is a sufficient statistic for the Poisson parameter. This result remains true if the Poisson assumption is relaxed to quasi-Poisson, where one can also show that, assuming conditional independence of the raw counts, the dispersion of the aggregated data is the same as that of the raw counts. However, at this stage problems start to creep in, which appear to have been largely overlooked in the literature. Firstly, it turns out that the variance of dispersion estimates can increase considerably following aggregation. Secondly, and more importantly, one can show through theory and simulation that relatively small deviations from the independence assumption in the raw data (say, the presence of strings of correlated observations) can lead to dramatically shifted dispersion values after aggregation. Notably, what is affected here is the dispersion itself, not just its estimate! The phenomena are illustrated through count-valued biomarkers (dicentric chromosomes, DNA repair proteins) as used in radiation biodosimetry for the calibration of dose-response curves.
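A minimal illustration, under assumptions of our own choosing and not necessarily the authors' model: suppose a subgroup contains n raw counts with mean mu and variance phi*mu, and that cells in consecutive strings of length m have pairwise correlation rho (none across strings). The subgroup total then has variance n*phi*mu*(1 + (m - 1)*rho), so its dispersion is phi*(1 + (m - 1)*rho), while each raw count still has dispersion phi. The Python sketch below checks this by simulation. The string construction (a Poisson component shared by all cells of a string plus an independent Poisson component per cell, giving marginally Poisson counts with rho = a/(a + b)), the intercept-only Pearson estimator, and the helper names `simulate` and `pearson_dispersion` are all hypothetical choices made for illustration.

```python
import numpy as np


def pearson_dispersion(y):
    """Pearson estimate of the quasi-Poisson dispersion for an intercept-only
    model: sum((y - mu)^2 / mu) / (len(y) - 1), with mu the sample mean."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    return np.sum((y - mu) ** 2 / mu) / (y.size - 1)


def simulate(rng, n_groups=50, cells_per_group=200, string_len=20, a=0.05, b=0.95):
    """Hypothetical simulation of raw counts for one predictor configuration.

    Each raw count is A_s + B_j: A_s ~ Poisson(a) is shared by every cell in a
    string of length string_len, B_j ~ Poisson(b) is independent per cell.
    Marginally each count is Poisson(a + b) (dispersion 1), but cells in the
    same string have correlation rho = a / (a + b).
    Returns (flattened raw counts, per-group aggregated totals).
    """
    assert cells_per_group % string_len == 0, "groups must split into whole strings"
    n_strings = cells_per_group // string_len
    # One shared component per string, repeated for each cell in that string.
    shared = np.repeat(rng.poisson(a, size=(n_groups, n_strings)), string_len, axis=1)
    own = rng.poisson(b, size=(n_groups, cells_per_group))
    raw = shared + own              # shape (n_groups, cells_per_group)
    totals = raw.sum(axis=1)        # aggregated count per group
    return raw.ravel(), totals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # a = 0 gives independent Poisson(1) counts; a = 0.05 gives rho = 0.05.
    for label, a, b in [("independent", 0.0, 1.0), ("correlated strings", 0.05, 0.95)]:
        raw_est, agg_est = [], []
        for _ in range(500):
            raw, totals = simulate(rng, a=a, b=b)
            raw_est.append(pearson_dispersion(raw))
            agg_est.append(pearson_dispersion(totals))
        print(f"{label}: raw mean={np.mean(raw_est):.3f} sd={np.std(raw_est):.3f}; "
              f"aggregated mean={np.mean(agg_est):.3f} sd={np.std(agg_est):.3f}")
```

Under these assumptions, the spread of the aggregated estimates should be much larger than that of the raw estimates in both runs (50 totals versus 10,000 raw counts), and in the correlated run the aggregated estimates should centre near 1 + (20 - 1)*0.05 = 1.95 even though every raw count is marginally Poisson with dispersion 1.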