Many scientific applications involve testing theories that are only partially specified. This task often amounts to testing the goodness-of-fit of a candidate distribution while allowing for reasonable deviations from it. The tolerant testing framework provides a systematic way of constructing such tests. Rather than testing the simple null hypothesis that data was drawn from a candidate distribution, a tolerant test assesses whether the data is consistent with any distribution that lies within a given neighborhood of the candidate. As this neighborhood grows, the tolerance to misspecification increases, while the power of the test decreases. In this work, we characterize the information-theoretic trade-off between the size of the neighborhood and the power of the test, in several canonical models. On the one hand, we characterize the optimal trade-off for tolerant testing in the Gaussian sequence model, under deviations measured in both smooth and non-smooth norms. On the other hand, we study nonparametric analogues of this problem in smooth regression and density models. Along the way, we establish the sub-optimality of the classical chi-squared statistic for tolerant testing, and study simple alternative hypothesis tests.
翻译:暂无翻译