A0867
Title: Subsampling and its advantages for exponential family models
Authors: Subhadra Dasgupta - Ruhr University Bochum (Germany) [presenting]
Holger Dette - Ruhr-Universitaet Bochum (Germany)
Abstract: A novel two-stage subsampling algorithm is proposed based on optimal design principles. In the first stage, a density-based clustering algorithm and a Markov chain Monte Carlo method are used to identify an approximating design space for the predictors from an initial subsample. Next, an optimal approximate design is determined on this design space. Finally, matrix distances, such as the Procrustes, Frobenius, and square-root distance, are used to define the remaining subsample so that its points are closest to the support points of the optimal design. The approach reflects the specific nature of the information matrix as a weighted sum of non-negative definite Fisher information matrices evaluated at the design points and applies to a large class of regression models, including models where the Fisher information is of rank larger than 1. Additionally, the algorithm removes outliers from the subsample, leading to reliable predictions.
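The final matching step and the weighted-sum form of the information matrix can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic regression model with regression vector (1, x), uses only the Frobenius distance, and allocates roughly w_i * k points to each support point, which is an illustrative choice. The function names `fisher_info`, `information_matrix`, and `select_subsample` are hypothetical, and the clustering, Markov chain Monte Carlo, and outlier-removal parts of the algorithm are omitted.

```python
import numpy as np

def fisher_info(x, theta):
    """Fisher information of one observation at predictor x.
    Assumption: logistic regression with regression vector f(x) = (1, x)."""
    f = np.concatenate(([1.0], np.atleast_1d(x)))
    p = 1.0 / (1.0 + np.exp(-(f @ theta)))
    return p * (1.0 - p) * np.outer(f, f)

def information_matrix(points, weights, theta):
    """Information matrix of a design: weighted sum of the
    non-negative definite Fisher information matrices at its points."""
    return sum(w * fisher_info(x, theta) for x, w in zip(points, weights))

def select_subsample(data, support, weights, theta, k):
    """Choose about k rows of `data` whose Fisher information matrices are
    closest, in Frobenius distance, to those of the design's support points.
    Allocation of round(w * k) points per support point is an assumption."""
    infos = np.array([fisher_info(x, theta) for x in data])
    available = np.ones(len(data), dtype=bool)
    chosen = []
    for x_star, w in zip(support, weights):
        n_take = int(round(w * k))
        target = fisher_info(x_star, theta)
        # Frobenius norm of each difference matrix (norm over the last two axes)
        dist = np.linalg.norm(infos - target, axis=(1, 2))
        dist[~available] = np.inf          # never pick the same row twice
        idx = np.argsort(dist)[:n_take]
        chosen.extend(idx.tolist())
        available[idx] = False
    return np.array(chosen)

# Illustrative inputs (all assumed): a grid of predictors, a guess for theta,
# and a two-point design with equal weights.
data = np.linspace(-3.0, 3.0, 61)
theta = np.array([0.0, 1.0])
support = [-1.5434, 1.5434]
weights = [0.5, 0.5]

chosen = select_subsample(data, support, weights, theta, k=10)
M = information_matrix(support, weights, theta)
```

Here `chosen` holds the indices of the selected observations and `M` is the information matrix of the design. Swapping the Frobenius norm for the Procrustes or square-root distance would only change the line that computes `dist`.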