CMStatistics 2022
B1435
Title: The projected covariance measure for assumption-lean variable significance testing
Authors: Rajen D Shah - University of Cambridge (United Kingdom) [presenting]
Anton Rask Lundborg - University of Cambridge (United Kingdom)
Ilmun Kim - Yonsei University (Korea, South)
Richard Samworth - University of Cambridge (United Kingdom)
Abstract: Testing the significance of a variable or group of variables $X$ for predicting a response $Y$ given additional covariates $Z$ is a ubiquitous task in statistics. A simple but common approach is to specify a linear model and test whether the regression coefficient of $X$ is non-zero. However, when the model is misspecified, as will invariably be the case, such a test may have poor power (for example, when $X$ is involved in complex interactions) or may lead to many false rejections. We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$, so that $\mathbb{E}(Y \mid X, Z) = \mathbb{E}(Y \mid Z)$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or boosted trees, to yield both robust error control and high power. The procedure uses these methods to perform regressions: first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half. While the approach is general, we show that a version of our procedure using spline regression achieves the minimax optimal rate for this nonparametric testing problem, a rate we also establish.
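The two-stage, sample-split idea in the abstract can be illustrated with a minimal sketch. It is not the authors' projected covariance measure and may omit refinements used in the paper. Small polynomial least-squares fits stand in for the flexible learners (additive models, boosted trees). The names `split_covariance_test`, `_xz_features` and `_z_features`, and the chosen bases, are all hypothetical.

On half A, the sketch regresses $Y$ on $(X, Z)$ to obtain $\hat g$ and subtracts the part of $\hat g$ explained by $Z$, giving a projection $\hat f(x, z)$. On half B, it residualises both $Y$ and $\hat f(X, Z)$ on $Z$ and averages the products of the residuals. This average estimates $\mathbb{E}\{\mathrm{Cov}(Y, \hat f(X, Z) \mid Z)\}$, which is standardised and compared with a normal distribution.

```python
import math

import numpy as np


def _ols_fit(features, target):
    """Least-squares coefficients for target ~ features (an intercept column is included in features)."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef


def _xz_features(x, z):
    # Hypothetical stand-in for a flexible learner: a small polynomial basis
    # in (x, z) that includes an x*z interaction term.
    return np.column_stack([np.ones_like(x), x, z, x * z, x**2, z**2])


def _z_features(z):
    # Hypothetical basis used for every regression on Z alone.
    return np.column_stack([np.ones_like(z), z, z**2, z**3])


def split_covariance_test(x, y, z, seed=0):
    """Simplified sample-split test of E[Y | X, Z] = E[Y | Z] (illustrative only).

    Returns (statistic, one-sided p-value).
    """
    n = len(y)
    perm = np.random.default_rng(seed).permutation(n)
    a, b = perm[: n // 2], perm[n // 2 :]

    # Half A, step 1: regress Y on (X, Z).
    g_coef = _ols_fit(_xz_features(x[a], z[a]), y[a])

    def g_hat(xx, zz):
        return _xz_features(xx, zz) @ g_coef

    # Half A, step 2: remove the part of g_hat explained by Z alone,
    # leaving the projection f_hat(x, z) = g_hat(x, z) - m_hat(z).
    m_coef = _ols_fit(_z_features(z[a]), g_hat(x[a], z[a]))

    def f_hat(xx, zz):
        return g_hat(xx, zz) - _z_features(zz) @ m_coef

    # Half B: residualise Y and f_hat(X, Z) on Z; the mean of the products
    # estimates the expected conditional covariance given Z.
    zb = _z_features(z[b])
    r_y = y[b] - zb @ _ols_fit(zb, y[b])
    f_b = f_hat(x[b], z[b])
    r_f = f_b - zb @ _ols_fit(zb, f_b)
    prods = r_y * r_f

    stat = math.sqrt(len(b)) * prods.mean() / prods.std(ddof=1)
    # One-sided p-value 1 - Phi(stat): a good projection makes the covariance
    # positive under the alternative, so only large values count as evidence.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    x, z = rng.standard_normal(n), rng.standard_normal(n)
    y_null = z**2 + rng.standard_normal(n)  # mean of Y does not depend on X
    y_alt = x * z + rng.standard_normal(n)  # X enters only via an interaction
    print("null:", split_covariance_test(x, y_null, z))
    print("alternative:", split_covariance_test(x, y_alt, z))
```

In the alternative above, $X$ and $Z$ are independent standard normals and $Y = XZ + \varepsilon$. In this setting the population coefficient of $X$ in a linear regression of $Y$ on $X$ and $Z$ is zero, so a linear-model test has essentially no power. This mirrors the interaction example in the abstract. The split-sample statistic, whose first-stage basis includes the interaction, should reject clearly. The sample split keeps $\hat f$ independent of the half used for testing, which is what lets a simple normal approximation calibrate the statistic under the null.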