A1073
Title: Challenges in estimation of small genetic effects in large-scale population cohorts
Authors: Ava Khamseh - University of Edinburgh (United Kingdom) [presenting]
Sjoerd Beentjes - University of Edinburgh (United Kingdom)
Chris Ponting - University of Edinburgh (United Kingdom)
Olivier Labayle - University of Edinburgh (United Kingdom)
Mark van der Laan - University of California at Berkeley (United States)
Kelsey Tetley-Campbell - University of Edinburgh (United Kingdom)
Joshua Slaughter - University of Edinburgh (United Kingdom)
Abstract: A key aim in genomic medicine is the identification of likely causal DNA variants altering human traits or diseases. Such causal genetic support is crucial for efficient drug discovery because it is estimated to double the success rate of drugs in clinical development. The concrete challenge in this area is that while the data size is very large (100K-1M samples), the effect sizes of individual DNA variants on disease outcomes are expected to be relatively small. This means that even slight model misspecification can produce biased estimates, which may then be falsely prioritised for costly experimental verification. The problem is exacerbated when attempting to estimate the effects of higher-order DNA variant interactions on disease. At the same time, millions of estimations are performed to probe the genome, so the scalability of the algorithms used is essential. TarGene is introduced as a methodology, pipeline, and software to reproducibly and reliably estimate genetic effect sizes on traits or diseases through various semi-parametric efficient techniques. Although these estimators share the same asymptotic properties, they may perform differently in finite samples, necessitating careful comparisons. Open challenges remain, such as accounting for (i) population stratification (beyond PCA) and (ii) correlated causal variants near a variant of interest.