Echocardiogram datasets enable training deep learning models to automate interpretation of cardiac ultrasound, thereby expanding access to accurate readings of diagnostically useful images. However, the gender, sex, race, and ethnicity of the patients in these datasets are underreported, and subgroup-specific predictive performance is unevaluated. These reporting deficiencies raise concerns about subgroup validity that must be studied and addressed before model deployment. In this paper, we show that current open echocardiogram datasets are unable to assuage subgroup validity concerns. We improve sociodemographic reporting for two datasets: TMED-2 and MIMIC-IV-ECHO. Analysis of six open datasets reveals no consideration of gender-diverse patients and insufficient patient counts for many racial and ethnic groups. We further perform an exploratory subgroup analysis of two published aortic stenosis detection models on TMED-2. We find insufficient evidence of subgroup validity for sex, racial, and ethnic subgroups. Our findings highlight that more data for underrepresented subgroups, improved demographic reporting, and subgroup-focused analyses are needed to prove subgroup validity in future work.