We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$: it runs in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose probability mass functions are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions, our transformation uses only samples from $\mathcal{D}$; for general distributions, it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the resulting hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions whose runtimes improve exponentially on the prior state of the art; these results are of independent interest in distribution learning.
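To make the decomposition concrete, one natural way to state it (a sketch consistent with the abstract; the tree $T$, its leaves $\ell$, and the subcubes $C_\ell$ are notation introduced here for illustration) is the following: a depth-$d$ decision tree $T$ querying the variables $x_1, \dots, x_n$ partitions $\{\pm 1\}^n$ into disjoint subcubes $\{C_\ell\}_{\ell \in \mathrm{leaves}(T)}$, one per leaf, and the decomposition approximates
\[
\mathcal{D} \;\approx\; \sum_{\ell \in \mathrm{leaves}(T)} \mathcal{D}(C_\ell)\, \mathcal{U}_{C_\ell},
\]
where $\mathcal{U}_{C_\ell}$ denotes the uniform distribution over $C_\ell$ and $\mathcal{D}(C_\ell)$ is its mixing weight. The combined learner then routes each input $x$ down $T$ to the leaf $\ell(x)$ whose subcube contains $x$ and outputs $h_{\ell(x)}(x)$, where $h_\ell$ is the hypothesis produced by the uniform-distribution learner on $C_\ell$.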