A1049
Title: Kernel regime of deep neural networks: Insights and limitations
Authors: Mariia Seleznova - Ludwig Maximilian University of Munich (Germany) [presenting]
Abstract: Training dynamics of non-linear deep neural networks (DNNs) are notoriously challenging to analyze, so current theory relies heavily on simplifications. In particular, DNN dynamics simplify dramatically in the infinite-width limit, entering the so-called "kernel regime" under certain conditions. In this regime, the dynamics are linearized around the initialization and governed by a deterministic and constant neural tangent kernel (NTK). This allows for a theoretical treatment of optimization and generalization via the NTK - an approach adopted in many recent studies. Given that modern DNNs are typically overparameterized, the infinite-width limit appears to offer a promising framework for these models. However, several limitations of this approach have been identified, and it is examined whether the kernel regime indeed provides a good approximation of the behavior of deep fully connected networks. The results reveal that the depth-to-width ratio and the initialization distribution play a critical role in determining whether a network is in the kernel regime; in particular, very deep networks are generally not in the kernel regime at the beginning of training. A new approach is also proposed to study the dynamics via the "kernel regime at the end of training", which enables the prediction of the neural collapse (NC) phenomenon.
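The linearization underlying the kernel regime replaces the network f(x; theta_t) by its first-order expansion around theta_0, so training is governed by the empirical NTK Theta(x, x') = <grad_theta f(x; theta_0), grad_theta f(x'; theta_0)>. As a minimal illustrative sketch (not part of the abstract), the snippet below computes this empirical NTK at initialization for a small fully connected network in JAX; the widths, depth, and activation are arbitrary choices made only for the example.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths=(2, 64, 64, 1)):
    # One weight matrix per layer; standard-normal entries, with the
    # 1/sqrt(fan_in) scaling applied in the forward pass (NTK parameterization).
    keys = jax.random.split(key, len(widths) - 1)
    return [jax.random.normal(k, (w_in, w_out))
            for k, w_in, w_out in zip(keys, widths[:-1], widths[1:])]

def forward(params, x):
    # Fully connected network with tanh activations and a scalar output.
    h = x
    for i, W in enumerate(params):
        h = h @ W / jnp.sqrt(W.shape[0])
        if i < len(params) - 1:
            h = jnp.tanh(h)
    return h.squeeze(-1)

def empirical_ntk(params, x1, x2):
    # Theta(x, x') = <df(x)/dtheta, df(x')/dtheta>, summed over all parameters.
    j1 = jax.jacobian(forward)(params, x1)  # per-layer Jacobians of shape (n1, w_in, w_out)
    j2 = jax.jacobian(forward)(params, x2)
    return sum(jnp.einsum('i...,j...->ij', a, b) for a, b in zip(j1, j2))

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (5, 2))  # 5 inputs of dimension 2
K = empirical_ntk(params, x, x)     # 5 x 5 kernel matrix at initialization
print(K.shape)
```

In the idealized kernel regime described in the abstract, this matrix would stay approximately constant throughout training; for finite networks, how far it drifts from its initial value depends, as the abstract discusses, on the depth-to-width ratio and the initialization distribution.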