Title: Response surface analysis of genomic prediction accuracy values using quality control covariates in soybean
Authors: Reka Howard - University of Nebraska - Lincoln (United States) [presenting]
Abstract: Genomic prediction is an important tool in plant breeding, both for selection and for increasing yield. It uses molecular marker information and phenotypic data to predict the phenotypes of individuals for which only marker data are available. Higher prediction accuracy can be achieved not only with efficient models but also with high-quality molecular data. A typical quality control (QC) of marker data removes markers whose minor allele frequency (MAF) falls below a chosen threshold and markers whose percentage of missing values exceeds a chosen threshold, and then imputes the remaining missing marker values. We evaluated how prediction accuracy is influenced by combinations of 12 MAF thresholds, 27 percentages of missing marker values, and 2 imputation techniques (648 combinations in total). Using soybean data, we constructed a response surface of prediction accuracy as a function of MAF and percentage of missing marker values. We found that both the genetic architecture of the trait and the imputation technique affect prediction accuracy. For the same MAF and missing-value combinations, random forest imputation yielded 2 to 5 times more markers than the simple naive imputation method, which replaces missing values with the mean allele dosage of the non-missing values at each locus. No single strategy (combination of QC thresholds and imputation method) outperformed all others for every trait.
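The QC pipeline described above (MAF filtering, missing-value filtering, naive mean imputation) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes genotypes are coded as allele dosages 0/1/2 with `None` for missing calls, that a marker is kept when its missing fraction is at most `max_missing` and its MAF is at least `min_maf`, and that the toy marker matrix is hypothetical. The random forest imputation alternative is not shown.

```python
from typing import List, Optional, Tuple

# Allele dosage 0, 1, or 2; None marks a missing call (assumed coding).
Genotype = Optional[float]


def minor_allele_frequency(column: List[Genotype]) -> float:
    """MAF of one marker, computed from its non-missing dosages."""
    observed = [g for g in column if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(p, 1 - p)


def missing_fraction(column: List[Genotype]) -> float:
    """Fraction of individuals with a missing call at this marker."""
    return sum(g is None for g in column) / len(column)


def quality_control(
    markers: List[List[Genotype]],  # rows = individuals, columns = markers
    min_maf: float,
    max_missing: float,
) -> Tuple[List[int], List[List[float]]]:
    """Filter markers by missingness and MAF, then mean-impute missing calls.

    Returns the indices of the kept markers and the imputed matrix.
    Inclusive thresholds are an assumption of this sketch.
    """
    n_markers = len(markers[0])
    columns = [[row[j] for row in markers] for j in range(n_markers)]
    kept = [
        j
        for j, col in enumerate(columns)
        if missing_fraction(col) < 1.0  # need at least one observed call
        and missing_fraction(col) <= max_missing
        and minor_allele_frequency(col) >= min_maf
    ]
    imputed_columns = []
    for j in kept:
        col = columns[j]
        observed = [g for g in col if g is not None]
        # Naive imputation: mean allele dosage of non-missing values at this locus.
        mean_dosage = sum(observed) / len(observed)
        imputed_columns.append([mean_dosage if g is None else g for g in col])
    imputed = [
        [imputed_columns[k][i] for k in range(len(kept))]
        for i in range(len(markers))
    ]
    return kept, imputed


# Hypothetical toy data: 4 individuals x 4 markers.
markers = [
    [0, 2, None, 1],
    [0, 1, 2, None],
    [0, 0, 2, 1],
    [1, None, None, 1],
]

kept, imputed = quality_control(markers, min_maf=0.05, max_missing=0.3)
print(kept)     # marker 2 is dropped: 50% missing and monomorphic
print(imputed)
```

Sweeping `min_maf` over one grid and `max_missing` over another, and recording the prediction accuracy obtained for each pair, gives the kind of grid over which a response surface can be fitted.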