CMStatistics 2023
B1433
Title: Matrix completion in genetic methylation studies: LMCC, a linear model of coregionalization with informative covariates
Authors: Karim Oualkacha - UQAM (Canada) [presenting]
Abstract: DNA methylation is an important epigenetic mark that modulates gene expression by inhibiting the binding of transcriptional proteins to DNA. As in many other omics experiments, missing values are an issue, and appropriate imputation techniques are important to avoid reducing the sample size and to make the best use of the information collected. The case is considered where a relatively small number of samples are processed via an expensive high-density whole-genome bisulfite sequencing (WGBS) strategy, and a larger number of samples are processed using more affordable low-density technologies. The aim is to impute the data matrix of the low-density methylation data using the high-density information provided by the WGBS samples. A linear model of coregionalization is proposed to predict missing values based on observed values and informative covariates. At each genomic position, it is assumed that the methylation vector of all samples is linked to a set of fixed covariates and a set of latent factors. The functional nature of the data and the spatial correlation across positions are exploited by assuming Gaussian processes on both the fixed and latent coefficient vectors. Simulations show that the use of covariates can significantly improve imputation accuracy. Finally, the proposed method is applied to complete a DNA methylation matrix with 15 rows (samples) and $10^6$ columns (sites).
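To make the setup concrete, the following is a minimal simulation sketch of one possible reading of the model described in the abstract, not the authors' implementation. It assumes that the methylation vector at position t can be written as Y(t) = X beta(t) + Z alpha(t) + noise, with X the observed covariates, Z the latent factors, and each coefficient curve drawn from a Gaussian process with a squared-exponential kernel over positions. The function names (`rbf_kernel`, `simulate_lmcc`, `make_missing`), the kernel choice, the Gaussian scale of the simulated values, and all default parameters are illustrative assumptions. The missingness pattern mimics a few fully observed high-density samples alongside low-density samples observed at every tenth site.

```python
import numpy as np


def rbf_kernel(positions, length_scale):
    """Squared-exponential covariance between genomic positions (assumed kernel)."""
    diff = positions[:, None] - positions[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_lmcc(n_samples=15, n_sites=100, n_covariates=2, n_latent=2,
                  length_scale=0.1, noise_sd=0.05, seed=0):
    """Simulate Y = X @ beta + Z @ alpha + noise with GP coefficient curves.

    Hypothetical illustration only: values are on an unconstrained Gaussian
    scale, not methylation proportions in [0, 1].
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_sites)  # rescaled genomic positions

    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    K = rbf_kernel(t, length_scale) + 1e-6 * np.eye(n_sites)
    L = np.linalg.cholesky(K)

    X = rng.normal(size=(n_samples, n_covariates))  # observed covariates
    Z = rng.normal(size=(n_samples, n_latent))      # latent factors

    # Each row of beta / alpha is one smooth coefficient curve over positions.
    beta = (L @ rng.normal(size=(n_sites, n_covariates))).T
    alpha = (L @ rng.normal(size=(n_sites, n_latent))).T

    noise = noise_sd * rng.normal(size=(n_samples, n_sites))
    Y = X @ beta + Z @ alpha + noise
    return t, X, Y


def make_missing(Y, n_high_density=3, keep_every=10):
    """Mask Y so that only the first rows are fully observed.

    The remaining (low-density) rows keep every `keep_every`-th site.
    """
    observed = np.zeros(Y.shape, dtype=bool)
    observed[:n_high_density, :] = True
    observed[n_high_density:, ::keep_every] = True
    Y_obs = np.where(observed, Y, np.nan)
    return Y_obs, observed


t, X, Y = simulate_lmcc()
Y_obs, observed = make_missing(Y)
print("fraction of entries to impute:", 1.0 - observed.mean())
```

An imputation method in the spirit of the abstract would then fill the `NaN` entries of `Y_obs` using `X`, the observed entries, and the smoothness across `t`; that estimation step is deliberately left out here because the abstract does not specify it.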