We show that citation metrics of journal articles in many online-only Springer Nature journals, and in associated journals, are distorted, with affected articles going back to 2001. We find that, most likely due to an API response error, many references are incorrect and typically point to article number 1 of a given volume. Among others, the issue affects Scientific Reports, Nature Communications, the Communications journals, Cell Death & Disease, and Light: Science & Applications, as well as many BMC, Discovery and npj journals. Beyond introducing incorrect reference information, this distorts the citation statistics of articles in these journals: a few articles are massively over-cited compared to their peers, while many others lose citations. For example, in both Scientific Reports and Nature Communications, 5 of the 10 most-cited articles have article number 1. We validate the distorted statistics using data from multiple scientific literature sources: Crossref, OpenCitations, Semantic Scholar, and the journals' websites. The issue arises primarily from the inconsistent transition from page-based to article number-based referencing of articles, together with improper handling of a change in the publisher's article metadata API. The most pressing problem appears to have been present since approximately 2011, and we estimate that it affects the citation counts of millions of authors.
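To make the suspected mechanism concrete, the sketch below is a purely hypothetical illustration, not the publisher's, Crossref's, or any real resolver's logic. It assumes that a resolver prefers a page-based locator over an article number, and that some references to article-numbered papers carry a first page of 1 (each such article's pages restart at 1). Under those assumptions, distinct articles collapse onto article 1 of their volume, inflating its citation count. The names `Reference`, `resolve` and `top_k_article_one`, the journal name, and all volume and article values are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical target key: (journal, volume, locator).
Target = Tuple[str, int, int]


@dataclass(frozen=True)
class Reference:
    """Hypothetical, simplified reference record deposited by a citing article."""
    journal: str
    volume: int
    first_page: Optional[int]      # page-based locator (legacy style)
    article_number: Optional[int]  # article-number-based locator


def resolve(ref: Reference) -> Target:
    """Hypothetical naive resolver that prefers the page-based locator.

    If an article-numbered paper is cited with first_page=1 (its pages
    restart at 1), the reference is keyed to article 1 of the volume.
    """
    locator = ref.first_page if ref.first_page is not None else ref.article_number
    if locator is None:
        raise ValueError("reference has neither a first page nor an article number")
    return (ref.journal, ref.volume, locator)


def top_k_article_one(counts: Counter, k: int) -> int:
    """Count how many of the k most-cited targets resolve to article 1."""
    return sum(1 for (_, _, locator), _ in counts.most_common(k) if locator == 1)


# Toy data (invented): citations to articles 12345 and 23456 in volume 5 and
# 34567 in volume 6; two of them carry first_page=1, which overrides the
# article number under the assumed resolver.
refs = [
    Reference("Example J", 5, 1, 12345),
    Reference("Example J", 5, None, 12345),
    Reference("Example J", 5, 1, 23456),
    Reference("Example J", 6, None, 34567),
]

counts = Counter(resolve(r) for r in refs)
print(counts.most_common())
print(top_k_article_one(counts, 2))
```

Counting how many of the top-k targets have article number 1, as `top_k_article_one` does, mirrors the top-10 check described for Scientific Reports and Nature Communications. With toy data it only shows the direction of the effect (article 1 gains citations that belong to articles 12345 and 23456), not real magnitudes.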