Identifying and retrieving the statutes and prior cases (precedents) relevant to a given legal situation are common tasks for legal practitioners. To date, researchers have addressed these two tasks independently, developing entirely separate datasets and models for each. However, the two retrieval tasks are inherently related: for example, similar cases tend to cite similar statutes because their factual situations are similar. In this paper, we address this gap. We propose IL-PCR (Indian Legal corpus for Prior Case and Statute Retrieval), a unique corpus that provides a common testbed for developing models for both tasks (Statute Retrieval and Precedent Retrieval) that can exploit the dependence between them. We experiment extensively with several baseline models on both tasks, including lexical models, semantic models, and GNN-based ensembles. Further, to exploit the dependence between the two tasks, we develop an LLM-based re-ranking approach that achieves the best performance.