A0591
Title: Unsupervised deep learning of ATAC-seq peaks
Authors: Karin Dorman - Iowa State University (United States) [presenting]
Yudi Zhang - Iowa State University (United States)
Ha Thi Hong Vu - Iowa State University (United States)
Geetu Tuteja - Iowa State University (United States)
Abstract: ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is widely used to identify open regions in the genome by "calling peaks" where sequenced DNA fragments, accessed and cut by a transposase, are enriched. Most unsupervised peak calling methods are based on traditional statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high-quality labeled data, which can be difficult to obtain. Neither approach considers biological replicates. We propose a novel deep learning method that uses unsupervised contrastive learning to extract shared signals from two or more replicates. Raw coverage data are encoded to obtain low-dimensional embeddings, which are optimized to minimize a contrastive loss over biological replicates. In addition, the embeddings are used to produce peak predictions, again under a contrastive loss, and are decoded to denoised data under an autoencoder loss. We compare our method with the unsupervised methods MACS2 and HMMRATAC on human ATAC-seq data, using labels obtained from related ChIP-seq experiments as a noisy truth. Our method is more precise (fewer false positives) than competing unsupervised methods. It is also a more effective denoiser of low-quality ATAC-seq data than AtacWorks.
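Illustrative sketch (not the authors' implementation): the abstract does not specify the form of the contrastive or autoencoder losses, so the example below assumes an InfoNCE-style contrastive loss in which embeddings of the same genomic window from two replicates form a positive pair and all other windows act as negatives, together with a mean-squared-error reconstruction loss. All function names, the temperature value, and the simulated embeddings are hypothetical and chosen only to show how replicate agreement might lower such a loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not taken from the abstract).

    Row i of z_a and row i of z_b are embeddings of the same genomic window
    in two replicates (positive pair); every other row of z_b is a negative.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def reconstruction_loss(coverage, decoded):
    """Mean-squared-error autoencoder loss between raw and decoded coverage."""
    return np.mean((coverage - decoded) ** 2)

# Simulated embeddings: 32 windows, 16 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
shared = rng.normal(size=(32, 16))                       # signal common to replicates
rep_a = shared + 0.05 * rng.normal(size=shared.shape)    # replicate A: signal + noise
rep_b = shared + 0.05 * rng.normal(size=shared.shape)    # replicate B: signal + noise
unrelated = rng.normal(size=shared.shape)                # no shared signal

loss_matched = info_nce(rep_a, rep_b)
loss_unrelated = info_nce(rep_a, unrelated)
print(f"matched replicates: {loss_matched:.3f}")
print(f"unrelated input:    {loss_unrelated:.3f}")
```

Under these assumptions, embeddings that carry a shared signal across replicates yield a lower contrastive loss than embeddings with nothing in common, which is the property a replicate-aware encoder would be trained to exploit.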