View Submission - EcoSta2023
A1160
Title: Approaches for handling missing values and their impacts on biological inferences: A molecular rate case study
Authors: Zeny Feng - University of Guelph (Canada) [presenting]
Jacqueline May - University of Guelph (Canada)
Sarah Adamovicz - University of Guelph (Canada)
Abstract: The association between variation in molecular evolutionary rates and species traits is a prevalent pattern across the Tree of Life. However, analyses that aim to identify such trait-rate associations are often limited in scope by missing values in the trait data. The common practice of complete-case analysis, which removes species with missing values, reduces both the sample size and the statistical power of the analysis. In a study of the association between the molecular rates of cytochrome c oxidase subunit I (COI) and the traits of ray-finned fishes, restricting the analysis to complete cases reduces the sample size to 20% of the original dataset. Missing data imputation offers an alternative that helps to retain sample size, but its accuracy depends on the choice of imputation method. The impact of imputation on biological inferences remains largely unexplored, as most work has focused on imputation accuracy using simulated datasets. Here, we propose a real data-based simulation strategy to select the best-suited method for imputing the missing values in the fish trait data. Phylogenetic information from multiple nuclear genes will also be used for imputing the missing trait values. For each trait, the distributions obtained under the different missing data handling approaches are compared. The trait-rate association analysis will also be performed on each of these datasets, and the results will be compared to assess the impact of each approach on the significance level of the trait-rate association.
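A real data-based simulation of the kind described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' procedure. Values are masked completely at random from the complete cases of a single numeric trait, simple mean and median imputers stand in for the candidate methods, and the function names (`rmse`, `evaluate_imputers`) and the toy trait values are hypothetical. Phylogeny-informed imputation is not shown.

```python
import math
import random
import statistics


def rmse(truth, estimate):
    """Root-mean-square error between true and imputed values."""
    return math.sqrt(
        sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)
    )


def evaluate_imputers(complete_values, missing_rate, imputers, n_reps=100, seed=0):
    """Mask known values at random, impute them, and average the error.

    complete_values: observed trait values with no missing entries
    missing_rate:    fraction of values to hide in each replicate
                     (assumption: missing completely at random)
    imputers:        dict mapping a name to a function that takes the
                     remaining observed values and returns one fill value
    """
    rng = random.Random(seed)
    n = len(complete_values)
    # Hide at least one value, but always leave at least one observed.
    n_mask = min(n - 1, max(1, round(missing_rate * n)))
    errors = {name: [] for name in imputers}
    for _ in range(n_reps):
        masked_idx = set(rng.sample(range(n), n_mask))
        observed = [v for i, v in enumerate(complete_values) if i not in masked_idx]
        truth = [complete_values[i] for i in sorted(masked_idx)]
        for name, impute in imputers.items():
            fill = impute(observed)
            errors[name].append(rmse(truth, [fill] * len(truth)))
    return {name: statistics.mean(errs) for name, errs in errors.items()}


# Hypothetical toy trait values, for illustration only (not data from the study).
trait = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 9.0]
imputers = {"mean": statistics.mean, "median": statistics.median}
scores = evaluate_imputers(trait, missing_rate=0.3, imputers=imputers)
best = min(scores, key=scores.get)  # method with the lowest average RMSE
```

The method with the lowest average error on the masked known values would then be used to impute the values that are truly missing. In practice the masking pattern would be chosen to resemble the missingness observed in the real trait data.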