View Submission - EcoSta 2025
A0183
Title: Dynamic data selection in large model training
Authors: Bing-Yi Jing - Southern University of Science & Technology (China) [presenting]
Abstract: Training large models typically requires massive, internet-scale data. Because data quality is crucial to model performance, selecting high-quality samples from such vast datasets is a critical problem. To address this, the data lifecycle during training is redesigned from the ground up, starting with the underlying training framework. However, applying dynamic data filtering at scale within current large-model training systems raises numerous challenges. Ways to tackle these challenges are further explored.