B0991
Title: Design based incomplete U-statistics
Authors: Xiangshun Kong - Beijing Institute of Technology (China)
Wei Zheng - University of Tennessee (United States) [presenting]
Abstract: U-statistics are widely used in the fields of economy, machine learning and statistics. While it enjoys very desirable statistical properties, it also admits an obvious drawback: the computation easily becomes impractical as the data size $n$ increases. Particularly, the number of combinations, say $m$, that a U-statistic of order $d$ has to evaluate is in the order of $O(n^d)$. Many efforts have been made to approximate the original U-statistic by a small subset of the combinations in history. Such an approximation was coined as an incomplete U-statistic. To the best of our knowledge, all existing methods require $m$ to grow at least faster than $n$, albeit much slower than $n^d$, in order for the corresponding incomplete U-statistic to be asymptotically efficient in the sense of mean squared error. We introduce a new type of incomplete U-statistics, which can be asymptotically efficient even when $m$ grows slower than $n$. In some cases, $m$ is only required to grow faster than $\sqrt{n}$. Both theoretical and empirical results show significant improvements on the statistical efficiency by the new incomplete U-statistic.