Large Language Models (LLMs) have fundamentally reshaped Argument Mining (AM), shifting it from a pipeline of supervised, task-specific classifiers to a spectrum of prompt-driven, retrieval-augmented, and reasoning-oriented paradigms. Yet existing surveys largely predate this transition, leaving unclear how LLMs alter task formulations, dataset design, evaluation methodology, and the theoretical foundations of computational argumentation. In this survey, we synthesise recent research and provide the first unified account of AM in the LLM era. We revisit the canonical AM subtasks, namely claim and evidence detection, relation prediction, stance classification, argument quality assessment, and argumentative summarisation, and show how prompting, chain-of-thought reasoning, and in-context learning blur traditional task boundaries. We catalogue the rapid evolution of resources, including integrated multi-layer corpora and LLM-assisted annotation pipelines that introduce new opportunities as well as risks of bias and evaluation circularity. Building on this mapping, we identify emerging architectural patterns across LLM-based AM systems and consolidate evaluation practices spanning component-level accuracy, soft-label quality assessment, and LLM-judge reliability. Finally, we outline persistent challenges, including long-context reasoning, multimodal and multilingual robustness, interpretability, and cost-efficient deployment, and propose a forward-looking research agenda for LLM-driven computational argumentation.