We show that citation metrics for articles in many online-only Springer Nature journals and associated titles, including Scientific Reports, Nature Communications, the Communications journals, and many BMC, Discovery, and npj journals, are distorted, with affected articles dating back to 2001. We find that, most likely because of an API response error, many references resolve to the wrong article, typically to Article Number 1 of the given volume. Beyond introducing incorrect reference information, this distorts the citation statistics of articles in these journals: a few articles are massively over-cited compared to their peers, while many others lose citations. For example, in both Scientific Reports and Nature Communications, 5 of the 10 most-cited articles carry article number 1. We validate the distortion using data from multiple scientific literature databases (Crossref, OpenCitations, and Semantic Scholar) and from the journals' websites. The issue arises primarily from an inconsistent transition from page-based to article-number-based referencing, together with improper handling of the change in the publisher's article metadata API. The most pressing form of the problem appears to have been present since approximately 2011, and we estimate that it affects the citation counts of millions of authors.
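As an illustration only, the over-citation pattern described above (article number 1 of a volume collecting far more citations than its peers) could be screened for with a simple heuristic. The following is a minimal sketch and not the method used in this study: the function name `flag_article_one_outliers`, the `ratio` threshold, the record layout, and the sample numbers are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def flag_article_one_outliers(records, ratio=10.0):
    """Return sorted (journal, volume) keys whose article number 1 looks over-cited.

    records: iterable of (journal, volume, article_number, citation_count).
    ratio:   hypothetical threshold; article 1 is flagged when its citation
             count exceeds `ratio` times the median count of the other
             articles in the same volume.
    """
    first = {}                 # citation count of article number 1 per volume
    peers = defaultdict(list)  # citation counts of all other articles per volume
    for journal, volume, number, cites in records:
        key = (journal, volume)
        if number == 1:
            first[key] = cites
        else:
            peers[key].append(cites)

    flagged = []
    for key, cites in first.items():
        if not peers[key]:
            continue  # no peers in this volume to compare against
        # Floor the baseline at 1 so a median of 0 does not flag everything.
        baseline = max(median(peers[key]), 1)
        if cites > ratio * baseline:
            flagged.append(key)
    return sorted(flagged)


# Made-up illustrative numbers, NOT data from the study.
sample = [
    ("Journal A", 5, 1, 900),
    ("Journal A", 5, 2, 12),
    ("Journal A", 5, 3, 20),
    ("Journal A", 6, 1, 15),
    ("Journal A", 6, 2, 10),
    ("Journal A", 6, 3, 30),
]
flagged = flag_article_one_outliers(sample)
print(flagged)
```

A flag from such a heuristic would only mark a volume for manual inspection; a genuinely highly cited first article would trigger it too, so it cannot by itself establish that references were misattributed.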