CMStatistics 2023
B1370
Title: Measuring dependence between events
Authors: Jan-Lukas Wermuth - Goethe University Frankfurt (Germany) [presenting]
Marc-Oliver Pohle - Heidelberg Institute for Theoretical Studies (Germany)
Timo Dimitriadis - Heidelberg University (Germany)
Abstract: Measuring dependence between two events, or equivalently between two binary random variables, is a central problem in statistics. It amounts to summarizing the dependence structure of a $2\times 2$ contingency table in a single real number between -1 and 1. Countless such dependence measures exist. Surprisingly, there is little theoretical guidance on how these measures compare or on their respective advantages and shortcomings, so practitioners may be overwhelmed by the choice of a suitable measure. A set of natural, desirable properties is provided, and a dependence measure is called proper if it fulfils them. Tetrachoric correlation and Yule's Q belong to this class, as does the little-known Cole coefficient. The most widely used measures, the phi coefficient and all contingency coefficients, are improper: they have substantial attainability problems, that is, even under perfect dependence they can be very far from -1 and 1. Among the proper measures, Yule's Q and Cole's coefficient are recommended, and statistical inference for them is discussed. A case study on drug consumption demonstrates that misleading conclusions may arise from the use of improper dependence measures.
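The attainability problem can be illustrated with a minimal Python sketch. It is not the authors' code. The cell labels a, b, c, d (counts for A and B, A and not B, not A and B, not A and not B), the function names, and the example counts are assumptions made for illustration. The phi coefficient and Yule's Q follow their textbook definitions. Cole's coefficient is written here in the form commonly given for it, the covariance of the two event indicators divided by its largest possible magnitude given the margins, which may differ in detail from the abstract's own definition. Tables with an empty row or column, or with both ad and bc equal to zero, are not handled.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient: Pearson correlation of the two event indicators."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def yule_q(a, b, c, d):
    """Yule's Q: (ad - bc) / (ad + bc), a function of the odds ratio."""
    return (a * d - b * c) / (a * d + b * c)


def cole(a, b, c, d):
    """Cole's coefficient (assumed form): covariance of the indicators,
    scaled by the bound on the covariance implied by the margins."""
    n = a + b + c + d
    p = (a + b) / n  # relative frequency of event A
    q = (a + c) / n  # relative frequency of event B
    r = a / n        # relative frequency of A and B jointly
    cov = r - p * q
    if cov >= 0:
        # Largest attainable joint frequency is min(p, q).
        return cov / (min(p, q) - p * q)
    # Smallest attainable joint frequency is max(0, p + q - 1).
    return cov / (p * q - max(0.0, p + q - 1.0))


# Hypothetical table in which A never occurs without B, with unequal margins.
table = (10, 0, 40, 50)
print(round(phi(*table), 3), yule_q(*table), cole(*table))  # 0.333 1.0 1.0
```

In this made-up table, event A occurs only when B does, yet the phi coefficient is only about 0.33 because the two events have different marginal frequencies, while Yule's Q and Cole's coefficient both equal 1.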