EcoSta 2023, Submission A0695
Title: Convergence rates for softmax gating Gaussian mixtures of experts
Authors: Nhat Pham Minh Ho - University of Texas, Austin (United States) [presenting]
Abstract: Gaussian mixtures of experts with softmax gating functions have been used successfully in numerous applications, including computer vision, speech recognition, system identification, and recently large language models (e.g., Transformers). Despite their popularity, a comprehensive theoretical understanding of parameter estimation in these models has remained elusive. Convergence rates of maximum likelihood estimation (MLE) are established for these models. The results indicate that the MLE rates can be very slow due to an intrinsic interaction between the expert functions and the softmax gating functions. Finally, based on insights from the theory, a new variant of the softmax gating function is proposed that yields much faster convergence rates for parameter estimation in Gaussian mixtures of experts.
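For reference, a minimal sketch of the model class under study, written in a standard parameterization (the paper's own notation may differ): the mixing weights are a softmax over affine functions of the input x, and each expert is a Gaussian regression component,

\[
p(y \mid x) \;=\; \sum_{k=1}^{K} \frac{\exp\bigl(\beta_{0k} + \beta_{1k}^{\top} x\bigr)}{\sum_{j=1}^{K} \exp\bigl(\beta_{0j} + \beta_{1j}^{\top} x\bigr)} \, \mathcal{N}\!\bigl(y \mid h(x, \eta_k), \, \sigma_k^{2}\bigr),
\]

where h(x, \eta_k) denotes the mean function of the k-th expert (for example, an affine expert h(x, \eta_k) = a_k^{\top} x + b_k) and the MLE is taken jointly over the gating parameters (\beta_{0k}, \beta_{1k}) and the expert parameters (\eta_k, \sigma_k^{2}). The interaction between the gating and expert parameters referred to in the abstract arises because distinct parameter configurations in this joint family can yield nearly indistinguishable densities p(y \mid x).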