A0183
Title: Dynamic data selection in large model training
Authors: Bing-Yi Jing - Southern University of Science & Technology (China) [presenting]
Abstract: Training large models typically requires massive, internet-scale data. Data quality is crucial to model performance, which makes selecting high-quality samples from such vast datasets a critical issue. To address this, the data lifecycle within the training process is redesigned from the ground up, starting with the underlying training framework. However, numerous challenges arise when dynamic data filtering is applied at scale within current large-model training systems. Ways to tackle these challenges are further explored.