With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). However, most LLMs generate paragraph-style answers that are chaotic, disorganized, and untraceable. To bridge this gap, we introduce the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark with data and documents of varying lengths, designed to systematically evaluate the ability of LLMs to comprehend fragmented documents and reconstruct isolated information into a single organized table. Unlike conventional text-to-table tasks, which rely on fixed schemas and narrow task domains, AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. In experiments, we evaluate both open-source and closed-source state-of-the-art LLMs. The results show that even the most advanced models struggle significantly. The benchmark is available at https://anonymous.4open.science/r/AOE-Benchmark/.