A0446
Title: Generalization analysis of gradient descent for shallow neural networks
Authors: Puyu Wang - Hong Kong Baptist University (Hong Kong) [presenting]
Abstract: Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scalings, where the scaling corresponds to the normalization of the layers. The previous work is greatly extended by a comprehensive stability and generalization analysis of GD for both two-layer and three-layer NNs. For two-layer NNs, the results are established under a general network scaling, relaxing the conditions assumed in prior work. For three-layer NNs, the technical contribution lies in demonstrating that GD satisfies a nearly co-coercive property, by means of a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of these general findings, an excess risk rate of $O(1/\sqrt{n})$, where $n$ is the sample size, is derived for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate of $O(1/\sqrt{n})$. Additionally, under a low-noise condition, a fast risk rate of $O(1/n)$ is obtained for GD in both the two-layer and three-layer settings.
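To make the notion of network scaling concrete, the following is a minimal sketch under assumed notation (the width $m$, scaling exponent $c$, and activation $\sigma$ are illustrative, not taken from the submission). A two-layer network of width $m$ with scaling parameter $c > 0$ can be written as $f_W(x) = m^{-c} \sum_{k=1}^{m} a_k \sigma(\langle w_k, x \rangle)$, so that $c = 1/2$ recovers the NTK-type scaling and $c = 1$ the mean-field scaling. GD then updates the trainable weights via $W_{t+1} = W_t - \eta \nabla L_S(W_t)$, where $L_S$ is the empirical risk over the $n$ training examples and $\eta$ is the step size.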
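A runnable toy sketch of this setup in Python, with every concrete choice (ReLU activation, squared loss, $c = 1/2$, fixed output weights, step size, width) assumed for illustration rather than taken from the submission:

import numpy as np

# Toy illustration (assumed setup, not the submission's experiments):
# a width-m two-layer network f(x) = m^{-c} * sum_k a_k * relu(<w_k, x>)
# trained by full-batch GD on the empirical squared risk
# L_S(W) = (1/2n) * sum_i (f(x_i) - y_i)^2.

rng = np.random.default_rng(0)
n, d, m, c, eta, T = 100, 5, 512, 0.5, 0.1, 200   # c = 1/2: NTK-type scaling

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                          # toy regression target
W = rng.normal(size=(m, d))                  # hidden weights, trained by GD
a = rng.choice([-1.0, 1.0], size=m)          # output weights, kept fixed

def forward(W):
    return (m ** -c) * np.maximum(X @ W.T, 0.0) @ a

for t in range(T):
    resid = forward(W) - y                   # f(x_i) - y_i for each example
    mask = (X @ W.T > 0.0).astype(float)     # ReLU derivative, shape (n, m)
    # gradient of L_S w.r.t. W: (m^{-c}/n) * sum_i resid_i * a_k * 1[w_k.x_i > 0] * x_i
    grad = (m ** -c) / n * (mask * (resid[:, None] * a[None, :])).T @ X
    W -= eta * grad                          # GD update

print("final empirical risk L_S:", 0.5 * np.mean((forward(W) - y) ** 2))

Varying $c$ rescales the hidden-layer gradients by $m^{-c}$, which is one concrete way the choice of scaling interacts with GD and with over-parameterization (growing $m$).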