CMStatistics 2023
B1976
Title: deepBreaks: A machine learning tool for identifying and prioritizing genotype-phenotype associations
Authors: Ali Rahnavard - The George Washington University (United States) [presenting]
Abstract: Sequence data, such as nucleotide or amino acid sequences, play a crucial role in advancing our understanding of biology. However, investigating and analyzing sequencing data and genotype-phenotype associations present several challenges, including non-independent observations, noise, nonlinearity, collinearity, and high dimensionality. Machine learning (ML) algorithms are well suited to these challenges because they can capture nonstructural patterns and genotype-phenotype associations. Yet there is a lack of user-friendly ML implementations that leverage the unique features of high-volume DNA sequence data. In this context, we introduce deepBreaks, a versatile approach that identifies important positions in sequence data that correlate with phenotypic traits. deepBreaks compares the performance of multiple ML algorithms and prioritizes positions based on the best-fit models. It is open-source software with online documentation available at https://github.com/omicsEye/deepBreaks.
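
The compare-then-prioritize idea described in the abstract can be illustrated with a minimal, standard-library Python sketch. It does not use the deepBreaks package or its API. The toy sequences, the phenotype values, the three stand-in models, and every function name below are assumptions made for illustration, and drop-column importance is substituted here as one simple way to rank positions under the best-scoring model.

```python
from statistics import mean

# Toy aligned sequences and a numeric phenotype (illustrative values only).
# By construction, position 1 (0-based) drives the phenotype: C is low, G is high.
SEQS = ["ACGT", "ACCA", "AGGT", "TGCA", "TCCT", "AGCT", "TCGA", "TGGA"]
PHENO = [1.0, 1.2, 3.1, 2.9, 0.9, 3.0, 1.1, 3.2]


def fit_mean(X, y):
    """Baseline: always predict the training mean."""
    mu = mean(y)
    return lambda seq: mu


def fit_additive(X, y):
    """Additive residue effects: training mean plus one offset per (position, residue)."""
    mu = mean(y)
    groups = {}
    for seq, val in zip(X, y):
        for pos, res in enumerate(seq):
            groups.setdefault((pos, res), []).append(val)
    effect = {key: mean(vals) - mu for key, vals in groups.items()}
    # Residues unseen in training contribute no offset.
    return lambda seq: mu + sum(effect.get((p, r), 0.0) for p, r in enumerate(seq))


def fit_nearest(X, y):
    """1-nearest neighbour under Hamming distance (first match wins ties)."""
    def predict(seq):
        dists = [sum(a != b for a, b in zip(seq, other)) for other in X]
        return y[dists.index(min(dists))]
    return predict


def loo_mse(fit, X, y):
    """Leave-one-out mean squared error of a model-fitting function."""
    errors = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append((model(X[i]) - y[i]) ** 2)
    return mean(errors)


def drop_position(X, pos):
    """Remove one alignment column from every sequence."""
    return [seq[:pos] + seq[pos + 1:] for seq in X]


# Step 1: compare the candidate models by cross-validated error.
MODELS = {"mean": fit_mean, "additive": fit_additive, "nearest": fit_nearest}
scores = {name: loo_mse(fit, SEQS, PHENO) for name, fit in MODELS.items()}
best = min(scores, key=scores.get)

# Step 2: rank positions by how much the best model's error grows
# when that position is removed (drop-column importance).
importance = {
    pos: loo_mse(MODELS[best], drop_position(SEQS, pos), PHENO) - scores[best]
    for pos in range(len(SEQS[0]))
}
ranking = sorted(importance, key=importance.get, reverse=True)
print("best model:", best)
print("positions, most to least important (0-based):", ranking)
```

On this toy data the additive model has the lowest leave-one-out error and position 1 ranks first, as expected from how the data were constructed. A negative importance for a position means the model's error fell when that column was removed.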