A0926
Title: SAFE: A unified framework for the validation of tabular, imaging and longitudinal synthetic data in clinical research
Authors: Gianluca Asti - IRCCS Humanitas (Italy) [presenting]
Elena Zazzetti - IRCCS Humanitas (Italy)
Elisabetta Sauta - IRCCS Humanitas (Italy)
Mattia Delleani - Train (Italy)
Eleonora Iascone - IRCCS Humanitas (Italy)
Alessandro Bruseghini - IRCCS Humanitas (Italy)
Giulia Maggioni - IRCCS Humanitas (Italy)
Luca Lanino - IRCCS Humanitas (Italy)
Alessia Campagna - IRCCS Humanitas (Italy)
Marta Ubezio - IRCCS Humanitas (Italy)
Alessandro Buizza - IRCCS Humanitas (Italy)
Gabriele Todisco - IRCCS Humanitas (Italy)
Cristina Astrid Tentori - IRCCS Humanitas (Italy)
Antonio Russo - IRCCS Humanitas (Italy)
Alessandra Crespi - IRCCS Humanitas (Italy)
Nicole Pinocchio - IRCCS Humanitas (Italy)
Maria Chiara Grondelli - IRCCS Humanitas (Italy)
Alessandro Forcina Barrero - IRCCS Humanitas (Italy)
Viktor Savevski - IRCCS Humanitas (Italy)
Armando Santoro - IRCCS Humanitas (Italy)
Saverio D'Amico - IRCCS Humanitas (Italy)
Matteo Giovanni Della Porta - IRCCS Humanitas (Italy)
Abstract: SAFE (Synthetic vAlidation FramEwork) is a statistically grounded, scalable framework for rigorously validating synthetic data (SD) across clinical modalities, including structured, temporal, and imaging data. It assesses three core characteristics (fidelity, utility, and privacy) through a suite of quantitative metrics tailored to data type and clinical context. For tabular and longitudinal data, SAFE applies RMSE, R, total variation distance (TVD), Kolmogorov-Smirnov tests, SMAPE, and correlation analysis to assess distributional alignment. For imaging, fidelity is evaluated using the Fréchet Inception Distance (FID) and the multi-scale structural similarity index (MS-SSIM). Privacy is quantified via membership inference attacks, while clinical utility is tested through downstream tasks such as survival prediction (L1-penalized Cox models), disease classification (e.g., XGBoost), and synthetic control arm construction. SAFE was validated in two domains: synthetic bone marrow images for hematologic malignancies and longitudinal breast cancer records. In the hematology use case, synthetic images matched real histopathological features and improved disease classification by 10% (F1 score) and survival modeling by over 10% (C-index). In the breast cancer use case, LLMs such as Mistral-7B generated high-fidelity, privacy-preserving data that enhanced predictive modeling and successfully emulated clinical trial outcomes. SAFE enables statistically robust integration of SD into clinical research, supporting precision medicine and real-world evidence generation.
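As an illustration of the tabular and longitudinal fidelity metrics named above, the following is a minimal sketch using the textbook definitions of TVD, the two-sample Kolmogorov-Smirnov statistic, RMSE, and SMAPE. It is not SAFE's implementation, which the abstract does not describe. The function names are hypothetical, and applying RMSE and SMAPE to paired summary values (for example, per-visit means of a real and a synthetic cohort) is an assumption made here for the example.

```python
from collections import Counter
from math import sqrt


def total_variation_distance(real, synthetic):
    """TVD between the empirical category frequencies of two samples (0 = identical)."""
    p, q = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(real), sorted(synthetic)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every observation <= v (handles ties).
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def rmse(real, synthetic):
    """Root mean squared error between paired values (assumed: paired summaries)."""
    return sqrt(sum((r - s) ** 2 for r, s in zip(real, synthetic)) / len(real))


def smape(real, synthetic):
    """Symmetric MAPE in [0, 2]; pairs where both values are zero contribute 0."""
    total = 0.0
    for r, s in zip(real, synthetic):
        denom = (abs(r) + abs(s)) / 2
        if denom > 0:
            total += abs(r - s) / denom
    return total / len(real)


if __name__ == "__main__":
    # Toy, made-up values purely for demonstration; not data from the study.
    real_stage = ["I", "I", "II", "II"]
    synth_stage = ["I", "II", "II", "II"]
    print("TVD:", total_variation_distance(real_stage, synth_stage))
    print("KS:", ks_statistic([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]))
    print("RMSE:", rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print("SMAPE:", smape([100.0], [110.0]))
```

Lower values indicate closer agreement between real and synthetic data for all four metrics. In practice a KS test would also report a p-value, which a statistics library can supply; this sketch computes only the statistic.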