View Submission - CFE-CMStatistics 2025
A0579
Title: Optimal subsampling for linear models with heteroscedasticity
Authors: Jiayi Zheng - George Mason University (United States) [presenting]
Abstract: The rapid growth of data availability in modern scientific research and computation has produced increasingly large datasets across many fields. Studying linear relationships through regression analysis remains a fundamental task. However, despite advances in computer hardware, storing and processing massive datasets continues to pose significant challenges. Subsampling methods, which select the most informative subset of the data according to an optimality criterion, have been developed to address this issue, with the information-based optimal subdata selection (IBOSS) algorithm being a notable example. Heteroscedasticity in such datasets presents a further, growing challenge. Methodologies for subsampling from big data are discussed, with a detailed explanation of two algorithms, weighted IBOSS and approximate nearest neighbor simulated annealing (ANNSA), and of their performance in the presence of heteroscedasticity. A case study using real-world data is also included to illustrate their application.
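For readers unfamiliar with IBOSS, the following is a minimal sketch of the standard selection rule as it is commonly described for the homoscedastic linear model: for each covariate in turn, keep the r = k/(2p) not-yet-selected rows with the smallest values and the r rows with the largest values. It is not the weighted IBOSS or ANNSA algorithm of the talk, whose details the abstract does not give. The function name `iboss_select`, the simulated data, and the ordinary least squares fit are illustrative assumptions.

```python
import numpy as np


def iboss_select(X, k):
    """Return row indices chosen by the standard (unweighted) IBOSS rule.

    Assumption: homoscedastic linear model; this is NOT the weighted
    variant discussed in the abstract.
    """
    n, p = X.shape
    r = k // (2 * p)  # rows taken from each tail of each covariate
    if r < 1 or k > n:
        raise ValueError("need 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        # Sort the remaining rows by the j-th covariate.
        order = idx[np.argsort(X[idx, j], kind="stable")]
        # Keep the r smallest and r largest values of this covariate.
        picked = np.concatenate([order[:r], order[-r:]])
        chosen.append(picked)
        available[picked] = False  # never select the same row twice
    return np.concatenate(chosen)


# Illustrative use on simulated data (hypothetical coefficients).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = 1.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=10_000)

idx = iboss_select(X, k=60)
Z = np.column_stack([np.ones(len(idx)), X[idx]])  # add an intercept column
beta_hat, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
```

Selecting extreme covariate values is what makes the subdata informative under constant error variance. When the variance changes across observations, this unweighted rule no longer accounts for how much each row contributes, which is the setting the talk addresses.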