Automatic Noisy Label Correction for Fine-Grained Entity Typing

Fine-grained entity typing (FET) aims to assign proper semantic types to entity mentions according to their context, and is a fundamental task in various entity-leveraging applications. Current FET systems are usually built on large-scale weakly supervised or distantly annotated data, which may contain abundant noise and thus severely hinder FET performance. Although previous studies have had great success in automatically identifying noisy labels in FET, they usually rely on auxiliary resources that may be unavailable in real-world applications (e.g., pre-defined hierarchical type structures or human-annotated subsets). In this paper, we propose a novel approach that automatically corrects noisy labels for FET without external resources. Specifically, it first identifies potentially noisy labels by estimating the posterior probability of a label being positive or negative from the logits output by the model, and then relabels the candidate noisy labels by training a robust model over the remaining clean labels. Experiments on two popular benchmarks demonstrate the effectiveness of our method. Our source code is available at https://github.com/CCIIPLab/DenoiseFET.
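
The abstract does not spell out how the posterior is estimated, so the following is only a minimal sketch of the identification step under an assumption of our own: the logits for one type are modelled as a two-component one-dimensional Gaussian mixture (a "negative" and a "positive" component), and a label is flagged when the observed label disagrees with a confident posterior. The function names, the mixture model, and the 0.9 threshold are all hypothetical and are not taken from the DenoiseFET code; the relabeling step (training a robust model on the remaining labels) is not shown.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=50):
    """EM for a 1-D mixture of two Gaussians (an illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Initialise the two components from the lower and upper halves of the data.
    means = np.array([x[x <= med].mean(), x[x > med].mean()])
    stds = np.full(2, x.std() + 1e-6)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = responsibilities(x, weights, means, stds)
        # M-step: re-estimate mixture weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
    return weights, means, stds

def responsibilities(x, weights, means, stds):
    """E-step in log space: posterior of each component given each logit."""
    x = np.asarray(x, dtype=float)
    log_dens = np.log(weights) - np.log(stds) - 0.5 * ((x[:, None] - means) / stds) ** 2
    log_norm = np.logaddexp(log_dens[:, 0], log_dens[:, 1])
    return np.exp(log_dens - log_norm[:, None])

def flag_noisy(logits, labels, threshold=0.9):
    """Flag labels that contradict a confident posterior (hypothetical rule)."""
    labels = np.asarray(labels)
    weights, means, stds = fit_two_gaussians(logits)
    # Treat the component with the larger mean as the "positive" one.
    p_pos = responsibilities(logits, weights, means, stds)[:, np.argmax(means)]
    false_pos = (labels == 1) & (p_pos < 1.0 - threshold)
    false_neg = (labels == 0) & (p_pos > threshold)
    return false_pos | false_neg

# Toy logits for a single type: low values look negative, high values positive.
logits = np.array([-4.2, -3.8, -4.0, -3.5, -4.5, 4.1, 3.9, 4.3, 3.6, 4.4])
labels = np.array([0, 0, 1, 0, 0, 1, 1, 0, 1, 1])  # indices 2 and 7 are suspicious
noisy_mask = flag_noisy(logits, labels)
```

In this toy setup the flagged entries would be dropped from the training set, and a model trained on the remaining labels would then propose replacements for them, following the two steps described above.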
