CFE-CMStatistics 2024
A0418
Title: Theoretical foundations of scaling
Authors: Leena Chennuru Vankadara - Amazon Web Services (Germany) [presenting]
Abstract: Scaling is pivotal to the success of modern machine learning. However, upscaling also introduces new challenges, such as increased training instability. Given the immense resources required, developing high-confidence scaling hypotheses backed by rigorous theoretical research is crucial. The aim is to discuss how infinite-width theory can be used to establish optimal scaling rules across various architectures and learning paradigms. The talk begins with the scaling behavior of multilayer perceptrons (MLPs) under sharpness-aware minimization, a min-max learning formulation designed to enhance generalization. The analysis extends naturally to other architectures, such as transformers, ResNets, and CNNs. Additionally, the scaling behavior of structured state-space models (SSMs), which have emerged as efficient alternatives to transformers, is discussed. Owing to the unique structure of their transition matrices, SSMs defy conventional scaling analyses and necessitate specialized approaches. Their scaling is analyzed within the standard minimization framework, highlighting the need for, and implications of, specialized scaling strategies.
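The min-max formulation of sharpness-aware minimization mentioned in the abstract can be sketched in a few lines: the inner maximization perturbs the weights within a radius rho along the normalized gradient direction, and the outer minimization descends using the gradient evaluated at that perturbed point. The quadratic toy loss, step size, and perturbation radius below are illustrative assumptions, not details from the talk.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w with gradient A w (assumed example).
A = np.diag([10.0, 1.0])            # ill-conditioned to mimic a sharp direction
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

def sam_step(w, lr=0.05, rho=0.1):
    """One sharpness-aware minimization step.

    Inner max: move to the approximate worst case within radius rho,
    i.e. w + rho * g / ||g|| for gradient g at w.
    Outer min: gradient descent using the gradient at that perturbed point.
    """
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction, norm rho
    return w - lr * grad(w + eps)                  # descend from the worst case

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
```

Because the perturbation has fixed norm rho, the iterates settle into a small neighborhood of the minimizer rather than converging exactly; how such updates behave as network width grows is precisely the kind of question the infinite-width analysis addresses.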