Accurate cell counting is essential in many biomedical research and clinical applications, including cancer diagnosis, stem cell research, and immunology. Manual counting is labor-intensive and error-prone, which motivates automation with deep learning. However, training reliable deep learning models requires large amounts of high-quality annotated data, which are difficult and time-consuming to produce by hand. Consequently, existing cell-counting datasets are small, frequently containing fewer than $500$ images. In this work, we introduce a large-scale annotated dataset comprising $3{,}023$ images from immunocytochemistry experiments related to cellular differentiation, with over $430{,}000$ manually annotated cell locations. The dataset presents significant challenges: high cell density, overlapping and morphologically diverse cells, a long-tailed distribution of cell counts per image, and variation in staining protocols. We benchmark three categories of existing methods (regression-based, crowd-counting, and cell-counting techniques) on a test set whose cell counts range from $10$ to $2{,}126$ per image. We also evaluate how the Segment Anything Model (SAM) can be adapted for microscopy cell counting using only dot-annotated datasets. As a case study, we implement a density-map-based adaptation of SAM (SAM-Counter) and report a mean absolute error (MAE) of $22.12$, lower than that of the benchmarked existing approaches (second-best MAE of $27.46$). Our results underscore the value of the dataset and the benchmarking framework for driving progress in automated cell counting, and they provide a robust foundation for future research and development.
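To make the two technical ingredients named above concrete (density maps derived from dot annotations, and MAE over per-image counts), the following is a minimal sketch in Python with NumPy. It is not the SAM-Counter implementation: the abstract does not specify how the density maps are built, so the Gaussian-blob construction, the `sigma` value, and the function names `dots_to_density`, `count_from_density`, and `mean_absolute_error` are all assumptions chosen for illustration.

```python
import numpy as np


def dots_to_density(points, shape, sigma=2.0):
    """Turn dot annotations into a density map (illustrative assumption).

    points: iterable of (row, col) cell locations, assumed to lie inside the image.
    shape:  (height, width) of the image.
    Each dot becomes a Gaussian blob normalized to sum to 1 over the image grid,
    so the whole map sums to the number of annotated cells.
    """
    height, width = shape
    rows = np.arange(height)[:, None]   # column vector of row indices
    cols = np.arange(width)[None, :]    # row vector of column indices
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        blob = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()    # per-blob normalization keeps the count exact
    return density


def count_from_density(density):
    """The predicted count is the integral (sum) of the density map."""
    return float(np.sum(density))


def mean_absolute_error(pred_counts, true_counts):
    """MAE over per-image counts: mean of |predicted - true|."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    return float(np.mean(np.abs(pred - true)))


if __name__ == "__main__":
    # Three hypothetical dot annotations on a 32x32 image.
    dots = [(5, 5), (10, 20), (0, 0)]
    target = dots_to_density(dots, (32, 32))
    print("count recovered from density map:", count_from_density(target))
    # Hypothetical predicted vs. true counts for two images.
    print("MAE:", mean_absolute_error([10, 20], [12, 17]))
```

In a setup of this kind, a model would be trained to regress the density map, and the count for an image would be read off as the sum of the predicted map; the MAE then averages the absolute count errors over the test images.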