Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and distilling their knowledge into a single model. Previous methods achieve this with separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become ``experts'' in speech-to-text, language-to-text, and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that our model, NeKo, achieves a new state-of-the-art performance with an average relative 5.0% WER reduction and substantial BLEU score improvements on speech and translation tasks. In zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to 27.6% relative WER reduction on the Hyporadise benchmark. As a multi-task model, NeKo also performs competitively on grammar and post-OCR correction.
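To make the task-oriented routing concrete, the following is a minimal sketch, not the authors' released implementation, of an MoE feed-forward layer in which each training dataset is mapped to a fixed expert and the router is supervised to reproduce that assignment; the class name, the auxiliary cross-entropy routing loss, and the top-2 mixture used at inference are illustrative assumptions rather than details confirmed by the abstract.

\begin{verbatim}
# Hypothetical sketch of task-oriented expert routing (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional

class TaskOrientedMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # One feed-forward expert per task/dataset (assumed one-to-one mapping).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)  # learned gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor, task_id: Optional[int] = None):
        # x: (batch, seq, d_model); task_id identifies the source dataset in training.
        logits = self.router(x)                          # (batch, seq, num_experts)

        if self.training and task_id is not None:
            # Training: send every token of this dataset to its mapped expert and
            # supervise the router to predict that expert (assumed auxiliary loss).
            out = self.experts[task_id](x)
            target = torch.full(logits.shape[:-1], task_id, device=x.device)
            aux_loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1)
            )
            return out, aux_loss

        # Inference: standard top-k mixture over the learned router.
        probs = F.softmax(logits, dim=-1)
        topk_p, topk_i = probs.topk(self.top_k, dim=-1)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_i[..., k] == e
                if mask.any():
                    out[mask] += topk_p[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out, x.new_zeros(())
\end{verbatim}

During training, tokens are grouped by their source dataset so that each batch carries a single \texttt{task\_id}; at inference no task label is needed, since the learned router selects experts per token.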