Simultaneous machine translation (SimulMT) presents a challenging trade-off between translation quality and latency. Recent studies have shown that LLMs can achieve strong performance on SimulMT tasks; however, this often comes at the cost of high inference overhead and latency. In this paper, we propose a conversational SimulMT framework that improves the inference efficiency of LLM-based SimulMT through multi-turn-dialogue-based decoding. Experiments with Llama2-7b-chat on two SimulMT benchmarks demonstrate the superior translation quality of LLMs while achieving computational latency comparable to specialized SimulMT models.
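To make the multi-turn-dialogue framing concrete, the sketch below shows one plausible shape of conversational decoding: each incoming source chunk is appended as a new user turn, and the model's partial translation is appended as an assistant turn, so the growing chat history (and, in a real LLM, its KV cache) is reused across steps. The `translate_chunk` stub and the chunking granularity are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of multi-turn-dialogue-based simultaneous decoding.
# Source text arrives incrementally; each chunk becomes a "user" turn and
# the partial translation becomes an "assistant" turn, so the conversation
# history grows instead of being re-encoded from scratch at every step.

def translate_chunk(history, chunk):
    """Stand-in for an LLM call conditioned on the full chat history.
    Here we simply uppercase the chunk to keep the sketch runnable."""
    return chunk.upper()

def simulmt_conversation(source_chunks):
    history = []       # alternating user/assistant turns
    translation = []   # accumulated partial translations
    for chunk in source_chunks:
        history.append({"role": "user", "content": chunk})
        partial = translate_chunk(history, chunk)
        history.append({"role": "assistant", "content": partial})
        translation.append(partial)
    return " ".join(translation), history

output, history = simulmt_conversation(["hallo", "welt"])
print(output)        # HALLO WELT
print(len(history))  # 4 (two user turns, two assistant turns)
```

In a real LLM-based system, reusing the dialogue history this way lets the model's cached key-value states carry over between chunks, which is the source of the inference-efficiency gain the abstract describes.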