To transfer the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to table reasoning tasks over widely deployed tabular data. Nevertheless, through a probing experiment on our proposed StructQA benchmark, we find that even the most advanced LLMs (such as GPTs) may still fall short in coping with tabular data. More specifically, the prevailing scheme often simply serializes the tabular data together with its meta information and then feeds the result into the LLM. We argue that the loss of structural information is the root cause of this shortcoming. We further propose TAMO, whose core idea is to treat tables as an independent modality integrated with text tokens. TAMO is accordingly a multimodal framework consisting of a hypergraph neural network serving as a global table encoder, seamlessly integrated with a mainstream LLM. Empirical results on various benchmark datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, demonstrate significant improvements in generalization, with an average relative gain of 42.65%.
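To make the critiqued baseline concrete, below is a minimal sketch (not the paper's code) of the "serialization" scheme the abstract describes: the table and its meta information are flattened into plain text before being handed to an LLM, which discards the explicit row/column structure that TAMO instead preserves via its hypergraph encoder. The table contents, the delimiter choice, and the `ask_llm` helper are all hypothetical illustrations.

```python
def serialize_table(caption: str, header: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into a pipe-delimited text block (a common baseline
    serialization; structural relations survive only as surface formatting)."""
    lines = [f"Table: {caption}", " | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

# Hypothetical example: build a prompt from a small table plus a question.
prompt = (
    serialize_table(
        caption="Olympic medals",
        header=["Country", "Gold", "Silver"],
        rows=[["USA", "39", "41"], ["China", "38", "32"]],
    )
    + "\nQuestion: Which country won the most gold medals?"
)
# response = ask_llm(prompt)  # hypothetical LLM call; any chat API would fit here
```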