CMStatistics 2023
B1141
Title: Learning CUT&RUN peaks from replicate samples with high duplicate sampling and low signal
Authors: Karin Dorman - Iowa State University (United States) [presenting]
Abstract: CUT&RUN (Cleavage Under Targets and Release Using Nuclease) is an exciting new method to detect protein binding sites in genomes by calling peaks where sequenced DNA fragments, excised from the genome by a nuclease tethered to the protein of interest, are enriched. The advantage of CUT&RUN over the more traditional ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing) is the ability to work with smaller amounts of input DNA in live cells or nuclei and to strengthen the signal of binding relative to the high background common with non-specific immunoprecipitation. The disadvantage is the possibility that PCR (Polymerase Chain Reaction) amplification of the DNA fragments plays an outsized role, leading to repeated sampling of the same amplified molecule. Indeed, current CUT&RUN data analysis protocols leave the handling of duplicate molecules entirely up to the user, with valid-sounding arguments both for retaining and for discarding all duplicates. A branching process model is developed for PCR to account for the repeated sampling. The model is combined with flexible spatial models to learn the locations and types of peaks throughout the genome that are reproducibly visible in replicate samples. The method is compared with the existing methods MACS2 and SEACR on previously analyzed CUT&RUN data, and the method is applied to data from a collaborator who studies the limited numbers of blood stem cells in zebrafish as a model for human blood disease.
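
Illustration: the abstract does not specify the branching process model, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a simple Galton-Watson process in which every copy duplicates independently with a fixed probability in each PCR cycle, followed by reads drawn at random from the amplified pool. The function names (amplify, sequence) and all parameter values are hypothetical.

```python
import random
from collections import Counter


def amplify(n_molecules, cycles, efficiency, rng):
    """Simulate PCR as a Galton-Watson branching process (assumed model).

    Each original molecule starts a family of size 1. In every cycle,
    each copy in a family is duplicated independently with probability
    `efficiency`. Returns the final family size of each original molecule.
    """
    sizes = [1] * n_molecules
    for _ in range(cycles):
        sizes = [
            s + sum(rng.random() < efficiency for _ in range(s))
            for s in sizes
        ]
    return sizes


def sequence(sizes, n_reads, rng):
    """Draw reads from the amplified pool, proportional to family size.

    Sampling is with replacement, a simplification that is reasonable
    only when the pool is much larger than the number of reads.
    Returns a Counter mapping original molecule index -> read count.
    """
    picks = rng.choices(range(len(sizes)), weights=sizes, k=n_reads)
    return Counter(picks)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a repeatable run
    sizes = amplify(n_molecules=200, cycles=8, efficiency=0.7, rng=rng)
    reads = sequence(sizes, n_reads=500, rng=rng)
    # A read is a duplicate if its original molecule was already sampled.
    duplicates = sum(reads.values()) - len(reads)
    print("distinct molecules sequenced:", len(reads))
    print("duplicate reads:", duplicates)
```

With few starting molecules and many reads, most reads in such a simulation are duplicates of a molecule already sampled, which is the low-input situation the abstract describes; discarding all duplicates or retaining all of them are the two extremes that a model of this kind would sit between.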