CMStatistics 2023
B1100
Title: Batch effect correction in microRNA-seq data for survival risk prediction
Authors: Andy Ni - Ohio State University (United States) [presenting]
Li-Xuan Qin - Memorial Sloan Kettering Cancer Center (United States)
Abstract: Survival risk prediction is an important task in clinical research. RNA sequencing (RNA-seq) has been a useful tool for survival prediction based on patients' gene expression profiles. Unfortunately, RNA-seq data are often contaminated with batch effects arising from non-uniform experimental handling. Recently, BatMan (BATch MitigAtion via stratificatioN) was developed for batch effect correction for survival prediction using microarray data. BatMan is extended to RNA-seq data and its performance is evaluated by real-world data-based simulations. The real-world data are two microRNA sequencing datasets from 27 myxofibrosarcoma patients from Memorial Sloan Kettering Cancer Center, one with batch effects and the other without. To overcome the small sample sizes in the original datasets, generative deep learning is employed to augment the datasets while preserving their data features. Using the augmented datasets, the performance of BatMan is assessed in comparison with ComBat-seq, a popular batch correction method, each used either alone or in conjunction with data normalization, in a re-sampling-based simulation study. It is shown that (1) BatMan performs better than or as well as ComBat-seq, (2) the performance of both methods is worsened by the addition of data normalization, and (3) batch-outcome association negatively impacts survival prediction. BatMan is further evaluated using microRNA-seq data for carcinoma cancer from The Cancer Genome Atlas, and similar findings are obtained.
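The abstract does not spell out how BatMan applies stratification. As a hedged illustration only, the sketch below assumes that "stratification" means treating batch as a stratum in a Cox proportional hazards model, so that each batch has its own baseline hazard and risk sets are formed within batch. The function name `stratified_cox_loglik` and its arguments are hypothetical and are not taken from the BatMan software. Any penalization or variable selection step is omitted.

```python
import numpy as np

def stratified_cox_loglik(beta, X, time, event, batch):
    """Cox partial log-likelihood with risk sets restricted to each batch.

    Illustrative assumption: batch is handled as a stratum, so every batch
    gets its own baseline hazard. Tied event times are treated in the
    Breslow manner.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    batch = np.asarray(batch)

    eta = X @ beta  # linear predictor (log relative hazard) per patient
    loglik = 0.0
    for b in np.unique(batch):
        idx = np.flatnonzero(batch == b)  # patients in this batch
        for i in idx[event[idx]]:         # observed events in this batch
            # Risk set: same-batch patients still under observation at time[i]
            at_risk = idx[time[idx] >= time[i]]
            loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik
```

Under this assumption, a shift in expression that is constant within a batch adds a constant to the linear predictor of every patient in that batch and cancels in each within-batch risk-set ratio. This is one way to see why stratifying on batch can absorb batch-level differences in baseline risk without altering the expression data themselves.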