Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation

Language models have demonstrated remarkable performance on complex multi-step reasoning tasks. However, their evaluation has been predominantly confined to high-resource languages such as English. In this paper, we introduce a manually translated Bangla multi-step reasoning dataset derived from the English Reveal dataset, featuring both binary and non-binary question types. We conduct a controlled evaluation of English-centric and Bangla-centric multilingual small language models on the original dataset and our translated version to compare their ability to exploit relevant reasoning steps to produce correct answers. Our results show that, in comparable settings, reasoning context is beneficial for more challenging non-binary questions, but models struggle to employ relevant Bangla reasoning steps effectively. We conclude by exploring how reasoning steps contribute to models' predictions, highlighting different trends across models and languages.
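
The abstract does not give the prompt template, data schema, or scoring rule. The sketch below assumes exact-match scoring and a plain-text prompt. It shows one way to set up the controlled comparison the abstract describes: the same question is answered with and without its reasoning steps, and results are reported separately for binary and non-binary questions. `Item`, `build_prompt`, `evaluate`, and `toy_model` are hypothetical names, and `toy_model` stands in for a real small language model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Item:
    # Hypothetical record layout, not the released dataset schema.
    question: str
    steps: List[str]  # gold reasoning steps supplied as context
    answer: str
    qtype: str  # "binary" or "non-binary"


def build_prompt(item: Item, with_steps: bool) -> str:
    """Return the bare question, or the question preceded by its reasoning steps."""
    if not with_steps:
        return f"Question: {item.question}\nAnswer:"
    context = "\n".join(f"- {step}" for step in item.steps)
    return f"Reasoning:\n{context}\nQuestion: {item.question}\nAnswer:"


def evaluate(
    items: List[Item], predict: Callable[[str], str]
) -> Dict[Tuple[str, str], float]:
    """Exact-match accuracy per (question type, context condition)."""
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for item in items:
        for with_steps in (False, True):
            key = (item.qtype, "with_steps" if with_steps else "no_steps")
            prediction = predict(build_prompt(item, with_steps))
            # Case-insensitive exact match after trimming whitespace (an assumption).
            correct[key] += int(prediction.strip().lower() == item.answer.strip().lower())
            total[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy usage: a stand-in "model" that answers correctly only when the needed
# fact appears in the prompt, so reasoning context helps the non-binary item.
items = [
    Item("Is Dhaka the capital of Bangladesh?",
         ["Dhaka is the capital of Bangladesh."], "yes", "binary"),
    Item("Which river flows past Dhaka?",
         ["The Buriganga river flows past Dhaka."], "Buriganga", "non-binary"),
]


def toy_model(prompt: str) -> str:
    return "Buriganga" if "Buriganga" in prompt else "yes"


scores = evaluate(items, toy_model)
for (qtype, condition), acc in sorted(scores.items()):
    print(f"{qtype:<11} {condition:<10} {acc:.2f}")
```

Each item is held fixed and only the presence of reasoning steps changes. This isolates the effect of the reasoning context. Under this setup, a cross-lingual comparison would run the same `evaluate` call over the English items and their Bangla translations.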
