Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve performance comparable to full-data training while substantially reducing storage and computation. Despite rapid empirical progress, its theoretical foundations remain limited: existing methods (gradient matching, distribution matching, and trajectory matching) are built on heterogeneous surrogate objectives and optimization assumptions, which makes it difficult to analyze their common principles or provide general guarantees. Moreover, it remains unclear under what conditions distilled data retain the effectiveness of the full dataset when the training configuration, such as the optimizer, architecture, or augmentation, changes. To address these questions, we propose a unified theoretical framework, termed configuration--dynamics--error analysis, which reformulates the major DD approaches under a common generalization-error perspective and yields two main results: (i) a scaling law that gives a single-configuration upper bound, characterizing how the error decreases as the distilled sample size grows and explaining the commonly observed performance-saturation effect; and (ii) a coverage law showing that the required distilled sample size scales linearly with configuration diversity, with provably matching upper and lower bounds. In addition, our unified analysis reveals that the various matching methods are interchangeable surrogates for reducing the same generalization error, which clarifies why they can all achieve dataset distillation and offers guidance on how the choice of surrogate affects sample efficiency and robustness. Experiments across diverse methods and configurations empirically confirm the derived laws, strengthening the theoretical foundation of DD and enabling theory-driven design of compact, configuration-robust distilled datasets.
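For intuition only, the two laws can be read schematically as follows; the symbols below are illustrative placeholders rather than the exact quantities derived in the analysis. Writing $m$ for the distilled sample size and $K$ for the number of training configurations to be covered, the scaling law bounds the generalization error of a model trained on the distilled set by a term that decays in $m$ plus an irreducible saturation floor, while the coverage law states that the distilled sample size required to cover $K$ configurations grows linearly in $K$:
\[
  \mathrm{Err}(m) \;\le\; \varepsilon_{\mathrm{sat}} + C\, m^{-\alpha},
  \qquad
  m^{\star}(K) \;=\; \Theta(K),
\]
where $\varepsilon_{\mathrm{sat}}$ (the saturation error), the constant $C$, the decay exponent $\alpha > 0$, and $m^{\star}(K)$ are assumed notation for this sketch: the saturation floor accounts for the plateau observed as more distilled samples are added, and the matching upper and lower bounds behind $\Theta(K)$ correspond to the coverage result.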