A1100
Title: Two-sample comparison through additive tree models for density ratios
Authors: Naoki Awaya - Waseda University (Japan) [presenting]
Li Ma - University of Chicago (United States)
Yuliang Xu - University of Michigan (United States)
Abstract: The density ratio is an effective summary of the difference between two distributions. Additive tree ensembles are proposed for modeling the density ratio, along with efficient algorithms for training these models on i.i.d. samples from the two distributions. A loss function, called the balancing loss, is introduced, under which such models can be trained both from an optimization perspective that parallels tree boosting and from a (generalized) Bayesian perspective that parallels Bayesian additive regression trees (BART). For the former, two boosting algorithms are presented for computing a single estimate of the density ratio function: one based on forward-stagewise fitting and the other on gradient boosting. For the latter, it is shown that, because the new loss resembles an exponential family kernel, it can serve as a pseudo-likelihood for which conjugate priors exist, thereby enabling effective generalized Bayesian inference on the density ratio using the backfitting sampler for BART. This allows generalized Bayesian uncertainty quantification on the inferred density ratio, which is critical but often unaddressed in modern applications involving two-sample comparison. The method is demonstrated in a case study assessing the quality of generative models for microbiome compositional data.
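Illustrative sketch (not from the authors): the abstract does not define the balancing loss, so the following only illustrates the general idea of forward-stagewise fitting of an additive tree model for the log density ratio g = log(p/q). It assumes one-dimensional data, depth-one trees (stumps), and a stand-in exponential-type loss, mean_P exp(-g/2) + mean_Q exp(g/2), whose pointwise minimizer is log(p/q). The names fit_stagewise and log_ratio and the parameters n_rounds, shrink, and eps are hypothetical.

```python
import math
import random


def fit_stagewise(xs_p, xs_q, n_rounds=15, shrink=0.5, eps=1e-3):
    """Forward-stagewise fit of additive stumps for g = log(p/q) on 1-D data.

    Uses a stand-in loss, mean_P exp(-g/2) + mean_Q exp(g/2); this is an
    assumption for illustration, not the balancing loss from the abstract.
    """
    cuts = sorted(set(xs_p + xs_q))  # candidate split points
    stumps = []                      # each stump: (threshold, left, right)
    g_p = [0.0] * len(xs_p)          # current fit at the P sample
    g_q = [0.0] * len(xs_q)          # current fit at the Q sample
    for _ in range(n_rounds):
        # Per-point weights implied by the stand-in loss at the current fit.
        w_p = [math.exp(-g / 2) / len(xs_p) for g in g_p]
        w_q = [math.exp(g / 2) / len(xs_q) for g in g_q]
        tot_p, tot_q = sum(w_p), sum(w_q)
        best = None
        for t in cuts:
            # eps keeps leaf values finite when a leaf is (nearly) empty.
            a_l = sum(w for x, w in zip(xs_p, w_p) if x <= t) + eps
            b_l = sum(w for x, w in zip(xs_q, w_q) if x <= t) + eps
            a_r = tot_p - a_l + 2 * eps
            b_r = tot_q - b_l + 2 * eps
            # A leaf with value c contributes a*exp(-c/2) + b*exp(c/2),
            # which is minimized at c = log(a/b) with value 2*sqrt(a*b).
            loss = 2 * math.sqrt(a_l * b_l) + 2 * math.sqrt(a_r * b_r)
            if best is None or loss < best[0]:
                best = (loss, t, math.log(a_l / b_l), math.log(a_r / b_r))
        _, t, c_l, c_r = best
        c_l, c_r = shrink * c_l, shrink * c_r  # shrink each stagewise step
        stumps.append((t, c_l, c_r))
        g_p = [g + (c_l if x <= t else c_r) for g, x in zip(g_p, xs_p)]
        g_q = [g + (c_l if x <= t else c_r) for g, x in zip(g_q, xs_q)]
    return stumps


def log_ratio(stumps, x):
    """Evaluate the fitted additive model (sum of stump values) at x."""
    return sum(c_l if x <= t else c_r for t, c_l, c_r in stumps)


# Toy data (an assumption for the demo): P = N(1, 1) and Q = N(-1, 1),
# for which the true log density ratio is 2x.
random.seed(0)
xs_p = [random.gauss(1.0, 1.0) for _ in range(150)]
xs_q = [random.gauss(-1.0, 1.0) for _ in range(150)]
stumps = fit_stagewise(xs_p, xs_q)
```

Calling log_ratio(stumps, x) then gives a single point estimate of log(p/q) at x. The sketch does not cover the gradient-boosting variant or the generalized Bayesian backfitting sampler described in the abstract.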