CMStatistics 2023
B1659
Title: Over-optimism in gene set analysis: How do the choices made by the researcher influence the results?
Authors: Milena Wuensch - LMU Munich (Germany) [presenting]
Christina Sauer - Ludwig-Maximilians-University Munich (Germany)
Ludwig Christian Hinske - University Hospital of Augsburg (Germany)
Anne-Laure Boulesteix - LMU Munich (Germany)
Abstract: Gene set analysis, a popular approach for analysing high-throughput gene expression data, aims to identify gene sets that show enriched or depleted expression patterns between two conditions. In addition to the multitude of methods available for this task, the user is typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility might entice users to produce preferable results using a 'trial-and-error' approach. While seemingly intuitive, this can be viewed as 'cherry-picking' and causes an over-optimistic bias, so the results may not be replicable with different datasets. This type of over-optimism has attracted a lot of attention in the context of classical hypothesis testing; the aim here is to raise awareness of it in gene set analysis. A hypothetical researcher is mimicked who systematically selects among the underlying options, including the choice from a selection of popular methods such as GSEA, to optimise the results for two real gene expression datasets frequently used in benchmarking. The study suggests that this research practice can lead to particularly high variability in the number of gene sets detected as differentially enriched, underlining the risk of selective reporting and over-optimistic results. The study therefore concludes with practical recommendations to counter over-optimism in research findings produced with gene set analysis.