A1209
Title: Contradictory conclusions in classification metrics: A study of paired design
Authors: Kazuya Okada - Keio University (Japan) [presenting]
Kenichi Hayashi - Keio University (Japan)
Abstract: The aim is to quantify how the choice of evaluation metric affects conclusions about the performance of classification models, and to provide guidelines for more appropriate choices. In the evaluation of classification models, the choice of metric is critical, especially in a paired design, where two models are assessed on the same dataset and directly compared. A notable challenge arises when the apparent superiority of one model over the other reverses depending on the metric used. This phenomenon is explored from the perspective of statistical inference. First, necessary and sufficient conditions are established for the phenomenon to occur at the population level. Next, the sample size required to detect the phenomenon is analyzed using the asymptotic distribution of differences in evaluation metrics. The analysis is further extended to multi-class classification by applying generalized F scores. Key findings from numerical studies and their implications are presented.
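
Illustration (not from the study): a minimal sketch of the reversal described above, assuming accuracy and the binary F1 score as the two metrics being compared. The confusion-matrix counts are hypothetical and chosen only to show that two models evaluated on the same 100 samples (20 positives, 80 negatives) can be ranked in opposite order by the two metrics.

```python
# Hypothetical confusion-matrix counts for two models scored on the SAME
# dataset (paired design): 20 positives and 80 negatives in both cases.
model_a = dict(tp=10, fp=2, fn=10, tn=78)
model_b = dict(tp=18, fp=14, fn=2, tn=66)


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn, tn):
    """Binary F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Differences in each metric, model A minus model B.
acc_diff = accuracy(**model_a) - accuracy(**model_b)
f1_diff = f1(**model_a) - f1(**model_b)

# A reversal occurs when the two differences have opposite signs.
reversal = acc_diff * f1_diff < 0

print(f"accuracy: A={accuracy(**model_a):.3f}  B={accuracy(**model_b):.3f}")
print(f"F1:       A={f1(**model_a):.3f}  B={f1(**model_b):.3f}")
print(f"reversal: {reversal}")
```

With these counts, model A has the higher accuracy (0.88 against 0.84) while model B has the higher F1 (about 0.692 against 0.625). The sketch only shows the reversal in one sample; it does not address whether such a reversal is statistically detectable, which is the question the abstract treats through asymptotic distributions.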