CFE-CMStatistics 2024
A1642
Title: Rank-based strategies for clustering distributions of pairwise scores
Authors: Christopher Saunders - South Dakota State University (United States) [presenting]
Janean Hanka - South Dakota State University (United States)
Clarissa Giefer - South Dakota State University (United States)
Abstract: A frequently encountered problem in few-shot learning and forensic source identification is to assign a query object to a class of objects with respect to a score (or sometimes a metric) function, based only on a set of pairwise similarities between the objects. In these problems, a metric is learned to distinguish between same-class and different-class comparisons. This metric is then used to construct a one-nearest-neighbor classifier. Unfortunately, a more sophisticated class of models or classifiers is impractical because the small number of observations per class limits the ability to properly estimate the induced joint distribution of a set of scores. A potential solution for this limitation is to cluster classes of objects whose within-class distributions with respect to the learned metric are sufficiently similar. Strategies for pooling within-class comparisons that have the same within-class distributions are developed. The set of within-class comparisons forms a U-process with respect to the kernel induced by the learned metric. Goodness-of-fit and rank-based level-alpha tests are proposed for measuring the degree of dissimilarity between the within-class score distributions of two classes while accounting for the U-process dependency. Finally, strategies for combining the pairwise tests between classes of objects are developed using local false discovery mixture models to make statements concerning which sets of classes share the same within-class distribution.
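To illustrate why the dependency among within-class scores matters, the following Python sketch compares the within-class score sets of two classes with a rank-sum statistic and calibrates it by permuting objects rather than individual scores, since scores that share an object are not independent. This is not the authors' procedure: the function names, the Euclidean score, and the object-level permutation scheme are assumptions made for illustration, and the permutation null used here (objects exchangeable across the two classes) is stronger than equality of within-class score distributions.

```python
import numpy as np
from itertools import combinations

def within_class_scores(X, score):
    """All n*(n-1)/2 pairwise scores among the rows (objects) of X."""
    return np.array([score(X[i], X[j])
                     for i, j in combinations(range(len(X)), 2)])

def rank_sum_stat(a, b):
    """Rank sum of `a` in the pooled sample, centered at its null mean.
    Assumes no ties (e.g. continuous scores)."""
    pooled = np.concatenate([a, b])
    ranks = pooled.argsort().argsort() + 1  # ranks 1..N
    return ranks[:len(a)].sum() - len(a) * (len(pooled) + 1) / 2

def object_permutation_test(X1, X2, score, n_perm=999, seed=0):
    """Two-sided permutation p-value, hypothetical illustration only.
    Objects (not scores) are reshuffled between the two classes, so each
    permuted data set keeps the shared-object dependence among scores."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    observed = abs(rank_sum_stat(within_class_scores(X1, score),
                                 within_class_scores(X2, score)))
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        A, B = pooled[idx[:n1]], pooled[idx[n1:]]
        stat = abs(rank_sum_stat(within_class_scores(A, score),
                                 within_class_scores(B, score)))
        exceed += int(stat >= observed)
    return (exceed + 1) / (n_perm + 1)

def euclidean(x, y):
    """Stand-in for a learned metric (an assumption of this sketch)."""
    return float(np.linalg.norm(x - y))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tight = rng.normal(0.0, 1.0, size=(10, 2))   # small within-class spread
    loose = rng.normal(0.0, 20.0, size=(10, 2))  # large within-class spread
    print(object_permutation_test(tight, loose, euclidean))
```

In a clustering workflow of the kind the abstract describes, a p-value like this would be computed for every pair of classes, and the resulting collection of pairwise tests would then be combined, for example through a local false discovery rate model, which this sketch does not attempt.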