View Submission - EcoSta 2025
A0977
Title: Optimal information-based subdata selection
Authors: John Stufken - George Mason University (United States) [presenting]
Min Yang - University of Illinois at Chicago (United States)
Ming-Chung Chang - Academia Sinica (Taiwan)
Abstract: Subdata selection is a crucial strategy when the size of a large dataset exceeds available computing resources or when observing the response variable is costly. The challenge is to select n data points from the N available data points while retaining as much information as possible. Since this is an NP-hard problem, any solution is an approximation of the optimal solution. For the various methods that have been proposed, little is known about the efficiency of the selected subdata relative to the optimal solution. Based on continuous optimal design theory, a method is proposed to bridge this gap. Lower and upper bounds are obtained for the relative efficiency of any given subdata. A novel algorithm for subdata selection is also developed; its convergence is shown and its superior performance is demonstrated.
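
To make the selection problem concrete, the following is a minimal sketch, not the algorithm or the bounds proposed in this submission. It assumes a linear regression model and the D-optimality criterion (the determinant of the information matrix X'X), neither of which is stated in the abstract, and it uses a simple greedy heuristic as a stand-in selector. The function names, the `ridge` parameter, and the efficiency measure, which compares two subdata sets with each other rather than with the unknown optimal solution, are all illustrative assumptions.

```python
import numpy as np


def log_det_information(X):
    """Log-determinant of X'X (assumed linear model, D-criterion)."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf


def greedy_d_subdata(X, n, ridge=1e-6):
    """Greedy heuristic (an assumption, not the authors' method).

    Starting from M = ridge * I, repeatedly add the unused row x that
    maximizes x' M^{-1} x. By the matrix determinant lemma,
    det(M + x x') = det(M) * (1 + x' M^{-1} x), so this is the row that
    increases det(M) the most at each step.
    """
    N, p = X.shape
    M_inv = np.eye(p) / ridge
    available = np.ones(N, dtype=bool)
    chosen = []
    for _ in range(n):
        gains = ((X @ M_inv) * X).sum(axis=1)  # x' M^{-1} x for every row
        gains[~available] = -np.inf            # never pick a row twice
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        Mx = M_inv @ X[i]
        # Sherman-Morrison update of the inverse after adding x x'
        M_inv = M_inv - np.outer(Mx, Mx) / (1.0 + X[i] @ Mx)
    return np.array(chosen)


def relative_d_efficiency(X_a, X_b):
    """D-efficiency of subdata X_a relative to subdata X_b (same columns)."""
    p = X_a.shape[1]
    diff = log_det_information(X_a) - log_det_information(X_b)
    return float(np.exp(diff / p))
```

As a usage example, one could draw a synthetic N x p matrix, select n rows with `greedy_d_subdata`, draw another n rows uniformly at random, and compare the two with `relative_d_efficiency`. Such a comparison only ranks two candidate selections against each other; it says nothing about how far either one is from the optimal subdata, which is the gap the abstract's bounds address.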