Fine-tuning large language models (LLMs) on chain-of-thought (CoT) data has shown that a small amount of high-quality data can outperform massive datasets. Yet what constitutes "quality" remains ill-defined. Existing selection methods for reasoning data rely on indirect heuristics such as problem difficulty or trace length, while instruction tuning has explored a broader range of automated selection strategies but rarely in the context of reasoning. We propose to define reasoning data quality using influence functions, which measure the causal effect of individual CoT examples on downstream accuracy, and we introduce influence-based pruning, which consistently outperforms perplexity- and embedding-based baselines on math reasoning within a model family.
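The abstract does not specify how influence is estimated or how pruning is applied, so the sketch below is only an illustrative assumption: it uses a first-order, TracIn-style gradient dot-product approximation of influence on a toy linear model (a stand-in for an LLM fine-tuned on CoT traces) and keeps the highest-influence training examples. The model, data, and keep ratio are all hypothetical.

```python
# Minimal sketch of influence-based pruning (assumed first-order / TracIn-style estimator).
# The tiny linear model and synthetic data are stand-ins for an LLM and CoT fine-tuning data;
# an example's influence is approximated by the dot product between its loss gradient and
# the gradient of the held-out (validation) loss.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the fine-tuned model: a small regression head on synthetic features.
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()

# Synthetic "training set" (CoT examples) and held-out validation set (downstream proxy).
X_train, y_train = torch.randn(200, 16), torch.randn(200, 1)
X_val, y_val = torch.randn(50, 16), torch.randn(50, 1)

def flat_grad(loss):
    """Flatten the gradient of `loss` w.r.t. all model parameters into one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# Gradient of the validation loss, used as the proxy for downstream accuracy.
val_grad = flat_grad(loss_fn(model(X_val), y_val))

# Per-example influence: grad(train example loss) . grad(validation loss).
influence = torch.empty(len(X_train))
for i in range(len(X_train)):
    g_i = flat_grad(loss_fn(model(X_train[i:i + 1]), y_train[i:i + 1]))
    influence[i] = torch.dot(g_i, val_grad)

# Influence-based pruning: keep the top 25% most influential examples (ratio is illustrative).
keep = influence.topk(k=len(X_train) // 4).indices
X_pruned, y_pruned = X_train[keep], y_train[keep]
print(f"Kept {len(keep)} of {len(X_train)} examples")
```

In practice the same scoring-then-pruning loop would run over per-token CoT losses of the actual LLM (or a cheaper proxy model), with the validation gradient taken on the target math-reasoning benchmark's held-out split.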