Importance-Aware Data Selection for Efficient LLM Instruction Tuning

Instruction tuning plays a critical role in enhancing the performance and efficiency of Large Language Models (LLMs). Its success depends not only on the quality of the instruction data but also on the inherent capabilities of the LLM itself. Some studies suggest that a small amount of high-quality data can achieve instruction fine-tuning results on par with, or even better than, those obtained from a full-scale dataset. However, rather than focusing solely on computing data quality scores to evaluate instruction data, there is a growing need to select the data that most improves instruction tuning for a given LLM. In this paper, we propose the Model Instruction Weakness Value (MIWV), a novel metric that quantifies the importance of instruction data for enhancing a model's capabilities. MIWV is derived from the discrepancies in the model's responses when using In-Context Learning (ICL), which helps identify the data most beneficial for instruction tuning. Our experimental results demonstrate that training on only the top 1% of data ranked by MIWV can outperform training on the full dataset. Furthermore, this approach extends beyond existing research that focuses on data quality scoring for data selection, and our experiments offer strong empirical evidence for its effectiveness.
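
The selection step described above can be illustrated with a minimal sketch. The abstract does not give the exact MIWV formula, so the code below assumes, purely for illustration, that the ICL response discrepancy is measured as the difference between a per-sample loss computed with an in-context demonstration and the loss computed without one, and that higher scores are ranked first. The function names `miwv_scores` and `select_top_fraction`, the sign convention, and the ranking direction are all hypothetical and are not taken from the paper; the losses are assumed to be precomputed by the LLM under study.

```python
import math
from typing import List, Sequence


def miwv_scores(loss_with_icl: Sequence[float],
                loss_without_icl: Sequence[float]) -> List[float]:
    """Illustrative MIWV-style score per sample (assumed definition).

    ASSUMPTION: the discrepancy is the loss with an in-context demonstration
    minus the loss without it. The paper's actual formula may differ.
    """
    if len(loss_with_icl) != len(loss_without_icl):
        raise ValueError("loss lists must have the same length")
    return [w - wo for w, wo in zip(loss_with_icl, loss_without_icl)]


def select_top_fraction(scores: Sequence[float],
                        fraction: float = 0.01) -> List[int]:
    """Return indices of the highest-scoring samples, best first.

    ASSUMPTION: a higher score marks a more important sample.
    At least one index is returned when scores is non-empty.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy losses for four samples; real values would come from the model.
    with_icl = [2.0, 1.0, 3.0, 0.5]
    without_icl = [1.0, 1.5, 1.0, 0.5]
    scores = miwv_scores(with_icl, without_icl)
    print(select_top_fraction(scores, fraction=0.5))
```

With `fraction=0.01`, the same call would keep the top 1% of samples, matching the selection budget reported in the abstract. Only the ranking-and-cutoff step is shown here; how the demonstration is chosen and how the losses are computed are left to the full method.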
