Left ventricular ejection fraction (LVEF) is a key indicator of cardiac function and is routinely used to diagnose heart failure and guide treatment decisions. Although deep learning has advanced automated LVEF estimation, many existing approaches are computationally demanding and underutilize the joint structure of spatial and temporal information in echocardiography videos, limiting their suitability for real-time clinical deployment. We propose Echo-E$^3$Net, an efficient endocardial spatio-temporal network specifically designed for LVEF estimation from echocardiography videos. Echo-E$^3$Net comprises two complementary modules: (1) a dual-phase Endocardial Border Detector (E$^2$CBD), which uses phase-specific cross-attention to predict end-diastolic (ED) and end-systolic (ES) endocardial border landmarks (EBs) and to learn phase-aware landmark embeddings (LEs), and (2) an Endocardial Feature Aggregator (E$^2$FA), which fuses these embeddings with global statistical descriptors (mean, maximum, variance) of the deep feature maps to refine EF regression. A multi-component loss function, inspired by Simpson's biplane method, jointly supervises EF, volumes, and landmark geometry, thereby aligning optimization with the clinical definition of LVEF and promoting robust spatio-temporal representation learning. Evaluated on the EchoNet-Dynamic dataset, Echo-E$^3$Net achieves an RMSE of 5.20 and an $R^2$ score of 0.82 while using only 1.54M parameters and 8.05 GFLOPs. The model operates without external pre-training, heavy data augmentation, or test-time ensembling, making it highly suitable for real-time point-of-care ultrasound (POCUS) applications. Code is available at https://github.com/UltrAi-lab/Echo-E3Net.
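For reference, the clinical quantities that the Simpson-inspired loss supervises follow the standard biplane method of disks; the formulas below give that standard definition (with the conventional 20-disk summation), not the paper's specific loss terms or their weighting:
\[
V \;=\; \frac{\pi}{4}\,\frac{L}{20}\sum_{i=1}^{20} a_i\, b_i,
\qquad
\mathrm{LVEF} \;=\; \frac{\mathrm{EDV}-\mathrm{ESV}}{\mathrm{EDV}} \times 100\%,
\]
where $a_i$ and $b_i$ are the $i$-th disk diameters measured in the apical four- and two-chamber views, $L$ is the long-axis length of the left ventricle, and EDV and ESV are the end-diastolic and end-systolic volumes computed from this disk summation.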