Falls are a leading cause of injury and loss of independence among older adults. Vision-based fall prediction systems offer a non-invasive way to anticipate falls seconds before impact, but their development is hindered by the scarcity of real fall data. To advance these efforts, this study proposes the Biomechanical Spatio-Temporal Graph Convolutional Network (BioST-GCN), a dual-stream model that fuses pose and biomechanical information through a cross-attention mechanism. Our model outperforms the vanilla ST-GCN baseline in F1-score by 5.32% and 2.91% on the simulated MCF-UA stunt-actor and MUVIM datasets, respectively. The spatio-temporal attention mechanisms in the ST-GCN stream also provide interpretability by identifying critical joints and temporal phases. However, a critical simulation-to-reality gap persists: while our model achieves an 89.0% F1-score with full supervision on simulated data, zero-shot generalization to unseen subjects drops to 35.9%. This decline is likely due to biases in the simulated data, such as `intent-to-fall' cues. For older adults, particularly those with diabetes or frailty, the gap is exacerbated by their distinct kinematic profiles. To address this, we propose personalization strategies and advocate privacy-preserving data pipelines to enable real-world validation. Our findings underscore the urgent need to bridge the gap between simulated and real-world data so that effective fall prediction systems can be developed for vulnerable elderly populations.
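As a rough illustration of the dual-stream fusion described above, the sketch below shows one plausible cross-attention fusion of a pose stream and a biomechanical stream in PyTorch. It is a minimal sketch, not the authors' implementation: the class name `BioSTFusion`, the feature dimensions, and the pooling/classifier head are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's code) of cross-attention fusion
# between two feature streams, each of shape (batch, time, dim).
import torch
import torch.nn as nn

class BioSTFusion(nn.Module):  # hypothetical name
    """Pose-stream features attend to biomechanical-stream features."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 2)  # fall vs. no-fall logits

    def forward(self, pose_feats: torch.Tensor, bio_feats: torch.Tensor) -> torch.Tensor:
        # Queries from the pose (ST-GCN) stream; keys/values from the
        # biomechanical stream, so each frame gathers relevant biomechanics.
        fused, _ = self.cross_attn(pose_feats, bio_feats, bio_feats)
        fused = self.norm(pose_feats + fused)      # residual connection
        return self.classifier(fused.mean(dim=1))  # temporal average pooling

# Usage with random stand-in features:
pose = torch.randn(8, 30, 256)  # 8 clips, 30 frames, 256-d pose features
bio = torch.randn(8, 30, 256)   # matching biomechanical features
logits = BioSTFusion()(pose, bio)
print(logits.shape)             # torch.Size([8, 2])
```

Using the pose stream as the query side reflects the abstract's framing of the ST-GCN stream as primary; the actual query/key assignment, head count, and pooling in BioST-GCN may differ.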