This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German--Lower Sorbian (DE--DSB), translation between a high-resource language and a low-resource one. Our system uses a transformer encoder-decoder architecture in which we make three changes to the standard training procedure. First, our training focuses on two languages at a time, contrasting with a wealth of research on multilingual systems. Second, we introduce a novel method for initializing the vocabulary of an unseen language, achieving improvements of 3.2 BLEU for DE$\rightarrow$DSB and 4.0 BLEU for DSB$\rightarrow$DE. Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE$\rightarrow$DSB by 2.76 BLEU. Our submissions ranked first (tied with another team) for DSB$\rightarrow$DE and third for DE$\rightarrow$DSB.
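The abstract does not spell out the vocabulary-initialization method; as a minimal sketch of the general idea, one could initialize embeddings for the unseen language's subwords by reusing the pretrained embeddings of subwords shared with the high-resource vocabulary and falling back to the mean pretrained embedding otherwise. All function and variable names below are hypothetical, not the paper's implementation.

```python
import torch

def init_new_vocab_embeddings(pretrained_emb, old_vocab, new_vocab):
    """Sketch: initialize embeddings for an unseen language's vocabulary.

    Subwords that also occur in the pretrained (DE) vocabulary reuse the
    pretrained embedding; all other (DSB-only) subwords start from the
    mean pretrained embedding.

    pretrained_emb: (len(old_vocab), dim) tensor of trained embeddings.
    old_vocab / new_vocab: dicts mapping subword string -> row index.
    """
    dim = pretrained_emb.size(1)
    mean_vec = pretrained_emb.mean(dim=0)
    new_emb = torch.empty(len(new_vocab), dim)
    for token, idx in new_vocab.items():
        if token in old_vocab:      # lexical overlap with the seen language
            new_emb[idx] = pretrained_emb[old_vocab[token]]
        else:                       # unseen subword: neutral starting point
            new_emb[idx] = mean_vec
    return new_emb
```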
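For context on the third change: offline back-translation generates a fixed synthetic parallel corpus up front, while online back-translation generates synthetic sources on the fly during training. A schematic sketch of one online back-translation step is shown below; the `generate` and `loss` interfaces are assumptions for illustration, not the paper's actual training code.

```python
def online_bt_step(model_de2dsb, model_dsb2de, de_batch, dsb_batch, optimizer):
    """Schematic online back-translation step (hypothetical interfaces).

    Each monolingual batch is translated on the fly by the reverse-direction
    model, and the forward model is trained to reconstruct the originals.
    """
    # Generate synthetic sources with the current models.
    synth_dsb = model_de2dsb.generate(de_batch)   # DE -> synthetic DSB
    synth_de = model_dsb2de.generate(dsb_batch)   # DSB -> synthetic DE

    # Train each direction on (synthetic source, original target) pairs.
    loss = (model_dsb2de.loss(src=synth_dsb, tgt=de_batch)
            + model_de2dsb.loss(src=synth_de, tgt=dsb_batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```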