Compiling Away the Overhead of Race Detection

Dynamic data race detectors are indispensable for flagging concurrency errors in software, but their high runtime overhead limits their adoption. This overhead stems primarily from pervasive instrumentation of memory accesses, a significant fraction of which is redundant. We address this inefficiency with a static, compiler-integrated approach that identifies and eliminates redundant instrumentation, drastically reducing the runtime cost of dynamic data race detectors. We introduce a suite of interprocedural static analyses that reason about memory access patterns, synchronization, and thread creation to eliminate instrumentation for provably race-free accesses, and we show that the completeness properties of the data race detector are preserved. We further observe that many inserted checks flag a race if and only if a preceding check has already flagged an equivalent race for the same memory location, albeit potentially at a different access. We characterize this notion of equivalence and show that, when reporting is limited to at least one representative of each equivalence class, a further class of redundant checks can be eliminated. We identify such accesses using a novel dominance-based elimination analysis. Based on these two insights, we have implemented five static analyses within LLVM, integrated with the instrumentation pass of the race detector ThreadSanitizer. Our experimental evaluation on a diverse suite of real-world applications demonstrates that our approach significantly reduces race detection overhead, achieving a geomean speedup of 1.34x, with peak speedups reaching 2.5x under high thread contention. This performance is achieved with a negligible increase in compilation time and, being fully automatic, places no additional burden on developers. Our optimizations have been accepted by the ThreadSanitizer maintainers and are in the process of being upstreamed.
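To make the notion of a redundant check concrete, the following is a minimal sketch and not the analysis implemented in this work. It assumes a hypothetical toy IR consisting of one straight-line block, in which every earlier operation trivially dominates every later one. The names `Kind`, `Op`, and `needsCheck` are invented for illustration, and the coverage rule (a checked write covers later reads and writes of the same location, a checked read covers only later reads, and any synchronization operation conservatively resets all coverage) is a simplifying assumption rather than the equivalence relation characterized in the paper.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: one straight-line block of memory accesses and
// synchronization operations. Not LLVM IR and not ThreadSanitizer's API.
enum class Kind { Read, Write, Sync };

struct Op {
    Kind kind;
    std::string addr;  // symbolic memory location; ignored for Sync
};

// Returns, for each operation, whether a race check would still be inserted.
// Assumption of this sketch: within a single block every earlier operation
// dominates every later one, so a check is dropped when an earlier, still
// valid check on the same location already covers it.
std::vector<bool> needsCheck(const std::vector<Op>& block) {
    std::set<std::string> readChecked;   // reads checked since the last Sync
    std::set<std::string> writeChecked;  // writes checked since the last Sync
    std::vector<bool> keep(block.size(), false);

    for (std::size_t i = 0; i < block.size(); ++i) {
        const Op& op = block[i];
        if (op.kind == Kind::Sync) {
            // Conservatively forget everything across synchronization.
            readChecked.clear();
            writeChecked.clear();
            continue;
        }
        // A checked write covers any later access; a checked read covers
        // only later reads of the same location.
        const bool covered =
            writeChecked.count(op.addr) > 0 ||
            (op.kind == Kind::Read && readChecked.count(op.addr) > 0);
        if (covered) {
            continue;  // redundant under this toy rule: no check inserted
        }
        keep[i] = true;
        if (op.kind == Kind::Write) {
            writeChecked.insert(op.addr);
        } else {
            readChecked.insert(op.addr);
        }
    }
    return keep;
}
```

Under these assumptions, two consecutive reads of the same location need only one check, whereas a read followed by a write keeps both, because the write can race with a concurrent read that the earlier read check would not report. A real implementation would have to reason over the full control-flow graph and across calls that may synchronize, which this sketch deliberately leaves out.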
