Quantification of Robotic Surgeries with Vision-Based Deep Learning

Surgery is a high-stakes domain where surgeons must navigate critical anatomical structures and actively avoid potential complications while achieving the main task at hand. Such surgical activity has been shown to affect long-term patient outcomes. To better understand this relationship, whose mechanics remain unknown for the majority of surgical procedures, we hypothesize that the core elements of surgery must first be quantified in a reliable, objective, and scalable manner. We believe this is a prerequisite for the provision of surgical feedback and modulation of surgeon performance in pursuit of improved patient outcomes. To holistically quantify surgeries, we propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery to independently achieve multiple tasks: surgical phase recognition (the what of surgery), and gesture classification and skills assessment (the how of surgery). We validated our framework on four video-based datasets of two commonly encountered types of steps (dissection and suturing) within minimally invasive robotic surgeries. We demonstrated that our framework can generalize well to unseen videos, surgeons, medical centres, and surgical procedures. We also found that our framework, which naturally lends itself to explainable findings, identified relevant information when achieving a particular task. These findings are likely to give surgeons more confidence in our framework's behaviour, increasing the likelihood of clinical adoption and thus paving the way for more targeted surgical feedback.
