CMStatistics 2023
B0475
Title: Improving adverse drug event prediction using biochemical features extracted with ChemBERTa
Authors: Pietro Belloni - University of Padua (Italy) [presenting]
Abstract: Drug side effects are a major cause of morbidity and mortality around the world, and post-marketing surveillance of these effects plays a key role in medical product safety. Typically, surveillance is based on disproportionality analysis of spontaneous reporting system databases, but the voluntary nature of these reports introduces multiple biases that limit the predictive performance of statistical models. Alternative data sources can help overcome this limitation. Here, data on the biochemical structure of drugs are combined with spontaneous pharmacovigilance data to obtain better performance. Two representations of the chemical structure of drug active ingredients are used: MACCS vectors and SMILES strings. The former are used as a set of latent binary features to predict the presence of a latent adverse event. The latter are used to derive an embedding space with a BERT-like transformer model (ChemBERTa). The predictive power of the two sets of latent features is compared, and the ChemBERTa embedding space is found to give higher performance. The features obtained from the embedding space are then combined with data from the FAERS spontaneous reporting database to predict the presence of an adverse event, with performance equal to or better than that of the usual disproportionality models. Since the statistical models used in disproportionality analysis are limited by the spontaneous nature of the data, the use of an endogenous data source reduces the bias and leads to better results.
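For readers unfamiliar with the disproportionality baseline mentioned in the abstract, the following is a minimal sketch of two standard measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), computed from a 2x2 table of report counts. The function name `disproportionality` and the counts are hypothetical and chosen only for illustration; they are not taken from FAERS or from the authors' work, and the abstract does not state which disproportionality measure was used as the comparison.

```python
import math


def disproportionality(a, b, c, d):
    """Compute PRR and ROR from a 2x2 table of spontaneous reports.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug without the event
    c: reports of other drugs with the event
    d: reports of other drugs without the event
    """
    # Proportional reporting ratio: event rate for the drug vs. all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # Reporting odds ratio: odds of the event for the drug vs. all other drugs.
    ror = (a * d) / (b * c)
    # Approximate 95% confidence interval for the ROR on the log scale.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return {"prr": prr, "ror": ror, "ror_ci95": (lower, upper)}


# Hypothetical counts, for illustration only.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result)
```

A drug-event pair is commonly flagged as a signal when the lower bound of the ROR interval exceeds 1. Because the counts come from voluntary reports, such measures inherit the reporting biases the abstract describes.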