Best-$k$ Search Algorithm for Neural Text Generation

Modern natural language generation paradigms require a good decoding strategy to obtain high-quality sequences from the model. Beam search yields high-quality but low-diversity outputs; stochastic approaches suffer from high variance and sometimes low quality, but their outputs tend to be more natural and creative. In this work, we propose a deterministic search algorithm that balances quality and diversity. We first investigate the vanilla best-first search (BFS) algorithm and then propose the Best-$k$ Search algorithm. Inspired by BFS, we greedily expand the top $k$ nodes, instead of only the first node, to boost efficiency and diversity. Upweighting recently discovered nodes, combined with heap pruning, ensures the completeness of the search procedure. Experiments on four NLG tasks (question generation, commonsense generation, text summarization, and translation) show that best-$k$ search yields more diverse and natural outputs than strong baselines while maintaining high text quality. The proposed algorithm is parameter-free, lightweight, efficient, and easy to use.
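
To make the expansion rule concrete, the following is a minimal sketch of a best-$k$ style loop, not the authors' implementation. It pops up to $k$ nodes from a priority heap per step instead of one, adds a small recency bonus to newly discovered nodes, and caps the heap size. The toy model `toy_next_logprobs`, the additive form of the recency bonus, and the parameters `recency_weight` and `max_heap` are all assumptions made for illustration; the scoring used in the paper may differ.

```python
import heapq
import itertools
import math

EOS = "</s>"


def toy_next_logprobs(prefix):
    """Hypothetical stand-in for a neural LM: prefix -> {token: log-prob}."""
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("a",): {"cat": 0.7, "dog": 0.3},
    }
    # Any prefix not in the table can only be followed by end-of-sequence.
    probs = table.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def best_k_search(next_logprobs, k=2, num_outputs=3, max_len=5,
                  recency_weight=0.05, max_heap=50):
    """Expand the top-k heap nodes per step; return (log-prob, tokens) pairs."""
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of tokens
    # Heap entries: (-priority, tie, cumulative log-prob, prefix).
    heap = [(0.0, next(tie), 0.0, ())]
    finished = []
    step = 0
    while heap and len(finished) < num_outputs:
        step += 1
        # Pop up to k best nodes (best-first search would pop exactly one).
        batch = [heapq.heappop(heap) for _ in range(min(k, len(heap)))]
        for _, _, logprob, prefix in batch:
            for tok, lp in next_logprobs(prefix).items():
                child = prefix + (tok,)
                child_lp = logprob + lp
                if tok == EOS or len(child) >= max_len:
                    finished.append((child_lp, child))
                else:
                    # Assumed recency bonus: later-discovered nodes get a small boost.
                    priority = child_lp + recency_weight * step
                    heapq.heappush(heap, (-priority, next(tie), child_lp, child))
        # Assumed pruning rule: keep only the max_heap highest-priority nodes.
        if len(heap) > max_heap:
            heap = heapq.nsmallest(max_heap, heap)
            heapq.heapify(heap)
    finished.sort(key=lambda item: item[0], reverse=True)
    return finished[:num_outputs]


if __name__ == "__main__":
    for logprob, tokens in best_k_search(toy_next_logprobs):
        print(round(math.exp(logprob), 3), " ".join(tokens))
```

Because the loop contains no sampling, repeated runs return the same sequences, which matches the deterministic character described in the abstract. With `k=1` and `recency_weight=0`, the sketch reduces to plain best-first search.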
