CFE-CMStatistics 2024
A0934
Title: Surprising learning curves: More data can lead to worse performance and worse estimators
Authors: Tom Viering - TU Delft (Netherlands) [presenting]
Abstract: Learning curves plot performance on unseen data (such as the risk) against the size of the dataset used for learning or estimation. These curves are essentially different from the similarly named curves that plot performance against training epochs. Learning curves can be used to estimate the amount of data needed for learning. Learning theory and standard statistical results would suggest that learning curves always improve with more data; indeed, many generalization bounds and related theory indicate that the excess risk should decrease at a rate of 1/n or 1/sqrt(n), where n is the size of the dataset used for learning or estimation. In contrast, the focus is on surprising, ill-behaved learning curves: curves with maxima, minima, and even periodicity, meaning that more data leads to worse performance. This can happen even in expectation, even when the probabilistic model is well-specified, at any sample size, and also in Bayesian settings. These surprising behaviours, their relation to double descent, and why they do not contradict well-understood theoretical results are discussed. The findings highlight that learning curves are less well understood than perhaps expected and show the need for further study of basic machine learning and statistics.
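
As a rough illustration of the object under study (not part of the submission itself), the sketch below estimates a learning curve empirically: a classifier is trained on nested subsets of increasing size n and its held-out error is averaged over repeated draws to approximate the expected risk. The dataset, model, and sample sizes are illustrative assumptions only; such a curve typically decreases, but nothing forces it to do so monotonically.

# Minimal sketch of an empirical learning curve: held-out error versus training set size n.
# Dataset, model, and sample sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

sample_sizes = [20, 50, 100, 200, 500, 1000, 2000]
for n in sample_sizes:
    errors = []
    for _ in range(20):  # average over repeated draws to approximate the expected risk
        idx = rng.choice(len(X_train), size=n, replace=False)
        model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        errors.append(1.0 - model.score(X_test, y_test))  # held-out misclassification error
    print(f"n={n:5d}  mean test error={np.mean(errors):.3f}")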