CMStatistics 2023
B0320
Title: Empirical Bayes large-scale multiple testing for high-dimensional sparse binary sequences
Authors: Bo Ning - Harvard T.H. Chan School of Public Health (United States) [presenting]
Abstract: The multiple testing problem is studied for high-dimensional sparse binary sequences, motivated by the crowdsourcing problem in machine learning. A conjugate spike-and-slab prior with a uniform slab is chosen, and an empirical Bayes approach is adopted to estimate its mixing weight. It is first shown that the hard thresholding rule derived from the resulting posterior is suboptimal. Consequently, the multiple testing procedure based on the local false discovery rate (local FDR) tends to be overly conservative in estimating the false discovery rate (FDR). Two new procedures are then proposed to correct this conservativeness. Sharp frequentist theoretical results are derived for both procedures, showing that they control the FDR uniformly over signals satisfying a sparsity assumption. Numerical experiments are conducted to validate the theory in finite samples. To the best of our knowledge, this is the first uniform FDR control result for multiple testing of sparse binary data in the high-dimensional setting.
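The empirical Bayes local-FDR pipeline described in the abstract can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the abstract: each coordinate i yields a count x_i of ones out of K binary observations, x_i ~ Binomial(K, theta_i); the spike sits at a hypothetical null value theta0 = 0.5; the slab is Uniform(0, 1), whose binomial marginal is 1/(K+1) for every x; the mixing weight w is estimated by marginal maximum likelihood restricted to [1/n, 1]; and H0_i is rejected when its l-value (posterior null probability, i.e. local FDR) is at most alpha. All function names are hypothetical, and the sketch does not implement the two corrected procedures proposed in the talk.

```python
import math
import random
from typing import List, Sequence, Tuple


def null_marginal(x: int, k: int, theta0: float) -> float:
    # Marginal of x under the spike (assumed at theta0): the Binomial(k, theta0) pmf.
    return math.comb(k, x) * theta0 ** x * (1.0 - theta0) ** (k - x)


def slab_marginal(k: int) -> float:
    # Binomial(k, theta) integrated against a Uniform(0, 1) slab equals 1 / (k + 1) for every x.
    return 1.0 / (k + 1)


def score(w: float, g0: Sequence[float], g1: Sequence[float]) -> float:
    # d/dw of sum_i log((1 - w) g0_i + w g1_i); non-increasing since the log-likelihood is concave in w.
    return sum((b - a) / ((1.0 - w) * a + w * b) for a, b in zip(g0, g1))


def estimate_weight(g0: Sequence[float], g1: Sequence[float], tol: float = 1e-10) -> float:
    # Empirical Bayes step: maximise the marginal likelihood over w in [1/n, 1] (bounds are an assumption).
    lo, hi = 1.0 / len(g0), 1.0
    if score(lo, g0, g1) <= 0.0:
        return lo  # likelihood already decreasing at the lower bound
    if score(hi, g0, g1) >= 0.0:
        return hi  # likelihood still increasing at w = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, g0, g1) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def l_values(xs: Sequence[int], k: int, theta0: float = 0.5) -> Tuple[float, List[float]]:
    # Posterior null probabilities (local FDR) using the plug-in weight w_hat.
    g0 = [null_marginal(x, k, theta0) for x in xs]
    g1 = [slab_marginal(k)] * len(xs)
    w = estimate_weight(g0, g1)
    ells = [(1.0 - w) * a / ((1.0 - w) * a + w * b) for a, b in zip(g0, g1)]
    return w, ells


def l_value_procedure(ells: Sequence[float], alpha: float) -> Tuple[List[int], float]:
    # Reject H0_i when its l-value is at most alpha; the mean l-value over the
    # rejection set is the posterior-expected false discovery proportion.
    rejected = [i for i, ell in enumerate(ells) if ell <= alpha]
    est_fdr = sum(ells[i] for i in rejected) / len(rejected) if rejected else 0.0
    return rejected, est_fdr


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, n_signals = 1000, 20, 20
    # Hypothetical simulated data: nulls have theta = 0.5, signals have theta = 0.95.
    thetas = [0.95] * n_signals + [0.5] * (n - n_signals)
    xs = [sum(rng.random() < theta for _ in range(k)) for theta in thetas]
    w, ells = l_values(xs, k)
    rejected, est_fdr = l_value_procedure(ells, alpha=0.05)
    false_discoveries = sum(1 for i in rejected if i >= n_signals)
    print(f"w_hat={w:.4f} rejections={len(rejected)} "
          f"posterior FDR estimate={est_fdr:.4f} false discoveries={false_discoveries}")
```

Comparing the posterior FDR estimate with the realised false discovery proportion over repeated simulations is one way to probe, empirically, the conservativeness of the local-FDR procedure that the abstract describes; bisection on the score is used here only because the marginal log-likelihood is concave in w, and an EM iteration would serve equally well.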