B1440
Title: Early stopping vs late stopping: Different flavors of SGD
Authors: Nicole Muecke - TU Braunschweig (Germany) [presenting]
Abstract: While stochastic gradient descent (SGD) is a workhorse in machine learning, the learning properties of many variants used in practice are hardly known. We consider non-parametric regression with (strongly) convex objectives and contribute to filling this gap, focusing on the effect and interplay of multiple passes, mini-batching, and averaging, in particular tail averaging. An important aspect is choosing the total number of iterations and the step-size in a data-driven way, namely in terms of the localized empirical Rademacher complexity. The results show how these different flavors of SGD can be combined to achieve optimal learning errors, and also provide practical insights.
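As a minimal illustrative sketch of the SGD flavors named in the abstract (multiple passes, mini-batching, and tail averaging), the snippet below applies them to a simple least-squares regression objective. The function name, parameters, and toy data are hypothetical choices for illustration only; they are not the authors' actual method or experimental setup, and the data-driven choice of iteration count and step-size via localized empirical Rademacher complexity is not implemented here.

```python
import numpy as np

def minibatch_sgd_tail_avg(X, y, step_size=0.05, batch_size=16,
                           n_passes=10, tail_fraction=0.5, seed=None):
    """Mini-batch SGD for least-squares regression with tail averaging.

    Returns the average of the last `tail_fraction` share of iterates:
    tail_fraction=1.0 recovers full (Polyak-Ruppert) averaging, while a
    very small value approximates using only the final iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []

    for _ in range(n_passes):                  # multiple passes over the data
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):  # mini-batching
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w = w - step_size * grad
            iterates.append(w.copy())

    # tail averaging: average only the last part of the trajectory
    k = max(1, int(tail_fraction * len(iterates)))
    return np.mean(iterates[-k:], axis=0)


# toy usage on noisy linear data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = minibatch_sgd_tail_avg(X, y, seed=1)
print(np.linalg.norm(w_hat - w_true))
```

Stopping SGD early (few passes, or averaging only a short tail) acts as implicit regularization, while running longer with a larger averaging window trades bias against variance differently; the abstract's point is that these knobs can be tuned jointly, in a data-driven way, to reach optimal learning errors.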