Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection, formulating it as a multi-armed bandit problem in which each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
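To make the setup concrete, the sketch below illustrates the bandit-over-submodular-functions idea from the abstract: each arm is one submodular utility, a greedy routine selects a subset under that utility, and a scalar reward updates the arm statistics. This is a minimal illustration under stated assumptions, not the paper's method: the two toy utilities, the epsilon-greedy policy (standing in for the ONLINESUBMOD policy), and the utility-based stand-in for the validation-driven reward are all hypothetical choices for exposition.

```python
# Illustrative sketch of adaptive subset selection as a multi-armed bandit,
# where each arm is a submodular function guiding sample selection.
# Epsilon-greedy stands in for the paper's policy; the reward here is a
# utility-based proxy, whereas the paper uses a validation-driven metric.
import numpy as np

rng = np.random.default_rng(0)

def facility_location(X, subset):
    """Monotone submodular facility-location utility: how well the subset
    covers all points, measured by best similarity to a chosen point."""
    if not subset:
        return 0.0
    sims = X @ X[subset].T  # similarity of every point to each chosen point
    return float(np.maximum(sims, 0).max(axis=1).sum())

def feature_coverage(X, subset):
    """Monotone submodular saturated-coverage utility over feature mass."""
    if not subset:
        return 0.0
    return float(np.sqrt(np.abs(X[subset]).sum(axis=0)).sum())

def greedy_subset(X, utility, k):
    """Standard greedy maximization of a monotone submodular utility."""
    subset, remaining = [], set(range(len(X)))
    for _ in range(k):
        best = max(remaining, key=lambda i: utility(X, subset + [i]))
        subset.append(best)
        remaining.remove(best)
    return subset

# Each bandit arm corresponds to one submodular function.
arms = [facility_location, feature_coverage]
value = np.zeros(len(arms))   # running mean reward per arm
count = np.zeros(len(arms))   # pulls per arm
eps = 0.1

X = rng.normal(size=(200, 16))  # toy training features
for t in range(20):
    # Epsilon-greedy arm choice (a stand-in for the ONLINESUBMOD policy).
    a = int(rng.integers(len(arms))) if rng.random() < eps else int(np.argmax(value))
    subset = greedy_subset(X, arms[a], k=10)
    # Hypothetical reward: normalized utility of the selected subset.
    # In the paper, this role is played by a validation-driven metric.
    reward = arms[a](X, subset) / len(subset)
    count[a] += 1
    value[a] += (reward - value[a]) / count[a]  # incremental mean update

print("arm pulls:", count, "estimated values:", value)
```

In this toy loop, the arm whose selections earn higher reward is pulled more often over time; swapping the proxy reward for held-out validation improvement recovers the validation-driven scheduling the abstract describes.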