CMStatistics 2023
B1283
Title: Mildly over-parameterized shallow ReLU networks: Favorable loss landscapes and benign overfitting
Authors: Michael Murray - UCLA (United States) [presenting]
Abstract: Overparameterized neural networks offer state-of-the-art performance across many applications. However, they are poorly understood and seemingly contradict certain aspects of conventional machine learning wisdom. Notably, they can be trained with gradient descent despite the non-convexity of the loss, and, although they can approximate rich classes of functions, they can interpolate a training sample and still generalize even in the absence of explicit regularization. These observations have motivated a growing body of work that has had success, particularly in the context of very wide overparameterized networks; a prominent example is global convergence guarantees when the width is polynomial in the size of the training sample. The more realistic case of moderate overparameterization, in which richer feature learning can occur, is less well understood. The loss landscape of shallow ReLU networks with linearithmic width is discussed, highlighting that most activation regions do not contain bad local minima. Results on overfitting transitions in the case of logarithmic width are presented.
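As a rough illustration of the scale referred to as "linearithmic width", the sketch below (an illustrative assumption, not part of the submission) builds a shallow ReLU network whose hidden width grows on the order of n log n in the number of training points n, and inspects the activation pattern, i.e. which hidden units are active on which inputs, that identifies an activation region.

```python
# Illustrative sketch (not from the submission): a shallow ReLU network with
# hidden width m ~ n log n (assumed "linearithmic" scaling) and the binary
# activation pattern associated with a given parameter setting.
import numpy as np

def shallow_relu(X, W, v):
    """f(x) = sum_j v_j * relu(w_j . x); also return hidden pre-activations."""
    pre = X @ W.T                                   # (n, m) pre-activations
    return np.maximum(pre, 0.0) @ v, pre

rng = np.random.default_rng(0)
n, d = 200, 10                                      # sample size, input dimension
m = int(np.ceil(n * np.log(n)))                     # linearithmic width (assumed)

X = rng.standard_normal((n, d))                     # synthetic inputs
W = rng.standard_normal((m, d)) / np.sqrt(d)        # hidden-layer weights
v = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # output weights

f, pre = shallow_relu(X, W, v)
pattern = (pre > 0)                                 # n x m activation pattern
print(f"width m = {m}, fraction of active units = {pattern.mean():.3f}")
```

Here an activation region can be read as the set of parameters producing a fixed such pattern on the training sample; the landscape result mentioned in the abstract concerns the loss restricted to these regions.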