View Submission - EcoSta2021
A0684
Title: Estimation of the largest mean Gaussian mixture component with population genetic applications
Authors: Andreas Futschik - JKU Linz (Austria) [presenting]
Abstract: In population genetics, the effective population size $N_e$ is a key parameter that determines the amount of genetic drift. With genomic time series data, estimates of $N_e$ often rely on changes in allele frequency at a large number of SNP positions. If a considerable proportion of the genome is affected by selection, however, these estimates will be biased, because both neutral and selected SNPs contribute. By the central limit theorem, estimates of $N_e$ will typically be approximately normally distributed, provided they are computed from a sufficient number of SNPs. Their mean and variance will differ, however, between neutral and selected regions. Gaussian mixture models would therefore seem to be a natural approach for modelling such data. Since the selection strength differs between selected positions, the number of mixture components may be large, which makes parameter estimation with standard methods such as the EM algorithm challenging. We therefore propose a completely new approach that estimates only the mixture component with the largest mean (the neutral component) and does not infer the full mixture model. We illustrate its application to the estimation of neutral $N_e$ in this framework.
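
The bias described in the abstract can be illustrated with a toy simulation. The following is a minimal sketch and not the estimator proposed in the talk: the function name `simulate_ne_estimates`, the number of components, and all weights, means and standard deviations are invented for illustration. It draws hypothetical window-wise $N_e$ estimates from a Gaussian mixture in which the neutral component has the largest mean, and compares the pooled average with that neutral mean.

```python
import random
import statistics


def simulate_ne_estimates(n_windows=2000, seed=0):
    """Draw hypothetical window-wise Ne estimates from a Gaussian mixture."""
    rng = random.Random(seed)
    # (weight, mean, sd) per component; every value here is made up.
    components = [
        (0.60, 300.0, 20.0),  # neutral windows: component with the largest mean
        (0.25, 220.0, 25.0),  # windows under weak selection
        (0.15, 120.0, 30.0),  # windows under strong selection
    ]
    weights = [w for w, _, _ in components]
    estimates = []
    for _ in range(n_windows):
        # Pick a component according to its weight, then draw one estimate.
        _, mu, sd = rng.choices(components, weights=weights, k=1)[0]
        estimates.append(rng.gauss(mu, sd))
    return estimates, components


estimates, components = simulate_ne_estimates()
neutral_mean = max(mu for _, mu, _ in components)
pooled_mean = statistics.fmean(estimates)

print(f"neutral component mean:     {neutral_mean:.1f}")
print(f"pooled mean of all windows: {pooled_mean:.1f}")
```

Under these assumed values the pooled mean falls well below the neutral mean, which is the bias that motivates estimating only the largest-mean component instead of averaging over all windows.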