A0414
Title: Implicit regularization of deep residual networks towards neural ODEs
Authors: Pierre Marion - Inria (France) [presenting]
Abstract: Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analogs, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. A convergence result of deep residual networks towards neural ODEs is discussed for nonlinear networks trained with gradient flow: if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. The result is valid for a finite training time, and also as the training time tends to infinity, provided that the network satisfies a Polyak-Łojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and it implies the convergence of gradient flow to a global minimum. If time allows, consequences will also be discussed in terms of statistical guarantees, namely generalization bounds.
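To illustrate the discrete-to-continuous correspondence underlying the abstract, the following display is a minimal sketch; the notation (depth L, hidden states h_k, residual functions f with parameters theta_k, and the unit time interval) is illustrative and not taken from the abstract itself.
\[
h_{k+1} = h_k + \frac{1}{L}\, f(h_k;\theta_k), \qquad k = 0,\dots,L-1,
\]
is the explicit Euler discretization, with step size $1/L$, of the neural ODE
\[
\frac{\mathrm{d}H_t}{\mathrm{d}t} = f(H_t;\Theta_t), \qquad t \in [0,1],
\]
provided the discrete weights $\theta_k$ sample a smooth parameter path, $\theta_k \approx \Theta_{k/L}$. In this reading, the convergence result says that if the network has this structure at initialization, gradient flow preserves it during training, so the trained network remains a discretization of a neural ODE.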