Building systems capable of natural language understanding (NLU) is one of the oldest pursuits in AI. An essential component of NLU is detecting the logical succession of events described in a text. The sentence ordering task was proposed to learn such event successions, with applications across AI tasks. Previous works employing statistical methods perform poorly, while neural network-based approaches require large corpora for training. In this paper, we propose a sentence ordering method that needs no training phase and consequently no large corpus to learn from. To this end, we generate sentence embeddings with a pre-trained BERT model and measure sentence similarity with the cosine similarity score. We suggest this score as an indicator of the level of coherence between sequential events. We finally order the sentences through a brute-force search that maximizes the overall similarity of the sequenced sentences. Our proposed method outperforms other baselines on ROCStories, a corpus of five-sentence human-written stories. The method is especially efficient compared to neural network-based methods when no large corpus is available. Further advantages of the method are its interpretability and that it requires no linguistic knowledge.
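A minimal sketch of the pipeline the abstract describes, assuming the Hugging Face `transformers` library for BERT; the `bert-base-uncased` checkpoint, the mean-pooling strategy, and scoring an ordering by the summed cosine similarity of adjacent sentences are illustrative assumptions, not the paper's released code.

```python
# Sketch: embed sentences with pre-trained BERT, then brute-force the
# ordering that maximizes the overall similarity of adjacent sentences.
from itertools import permutations

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()


def embed(sentence: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states into one sentence vector
    (an assumed pooling choice; [CLS] pooling is another option)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)


def order_sentences(sentences: list[str]) -> list[str]:
    """Return the permutation maximizing summed adjacent cosine similarity.

    Brute force is feasible for short texts such as the 5-sentence
    ROCStories items (5! = 120 candidate orderings)."""
    vecs = [embed(s) for s in sentences]

    def score(perm: tuple[int, ...]) -> float:
        return sum(
            torch.cosine_similarity(vecs[a], vecs[b], dim=0).item()
            for a, b in zip(perm, perm[1:])
        )

    best = max(permutations(range(len(sentences))), key=score)
    return [sentences[i] for i in best]
```

Because the method only scores candidate orderings, it needs no training phase: any shuffled set of sentences can be passed to `order_sentences` directly, which is what makes the approach usable without a large corpus.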