This paper presents a learning-based approach for accurately estimating the 3D shape of flexible tendon-driven continuum robots (TDCRs) subjected to external loads. The proposed method introduces a spatiotemporal neural network architecture that fuses multi-modal inputs, namely current and historical tendon displacement data and RGB images, to generate point clouds representing the robot's deformed configuration. The network integrates a recurrent neural module for temporal feature extraction, an encoding module for spatial feature extraction, and a multi-modal fusion module that combines the spatial features extracted from visual data with the temporal dependencies in historical actuator inputs. A continuous 3D shape is then reconstructed by fitting Bézier curves to the predicted point clouds. Experimental validation shows mean shape estimation errors of 0.08 mm (unloaded) and 0.22 mm (loaded), outperforming state-of-the-art shape sensing methods for TDCRs. The results validate the efficacy of deep learning-based spatiotemporal data fusion for precise shape estimation under loading conditions.
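The final reconstruction step, fitting a Bézier curve to a predicted point cloud, can be illustrated with a minimal sketch. The abstract does not specify the curve degree, the parameterization, or the fitting procedure, so everything below is an assumption made for illustration: a single cubic curve, points ordered from base to tip, chord-length parameterization, and an ordinary least-squares solve for the control points. The function names `bernstein_matrix`, `fit_bezier`, and `eval_bezier` are hypothetical and not taken from the paper.

```python
import numpy as np
from math import comb


def bernstein_matrix(t, degree):
    """Return the (len(t), degree + 1) matrix of Bernstein basis values."""
    t = np.asarray(t, dtype=float)[:, None]
    i = np.arange(degree + 1)
    binom = np.array([comb(degree, k) for k in i], dtype=float)
    return binom * t ** i * (1.0 - t) ** (degree - i)


def fit_bezier(points, degree=3):
    """Least-squares fit of one Bezier curve to ordered 3D points.

    Assumptions (not from the paper): points are ordered base to tip,
    and curve parameters come from normalized cumulative chord length.
    """
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]
    B = bernstein_matrix(t, degree)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)
    return ctrl  # shape: (degree + 1, 3)


def eval_bezier(ctrl, t):
    """Evaluate the curve defined by control points ctrl at parameters t."""
    return bernstein_matrix(t, len(ctrl) - 1) @ ctrl


if __name__ == "__main__":
    # Synthetic stand-in for a predicted point cloud: a 30 degree arc of
    # radius 100 mm that also rises linearly in z.
    theta = np.linspace(0.0, np.pi / 6, 50)
    cloud = np.column_stack([100 * np.sin(theta),
                             100 * (1 - np.cos(theta)),
                             20 * theta])
    ctrl = fit_bezier(cloud)
    recon = eval_bezier(ctrl, np.linspace(0.0, 1.0, 50))
    err = np.linalg.norm(recon - cloud, axis=1)
    print("mean fit error (mm):", err.mean())
```

Under these assumptions the fit reduces to one linear solve per point cloud, so the continuous curve can be resampled at any resolution from only four control points. A strongly bent or S-shaped robot would likely need a higher degree or several joined segments, which this sketch does not cover.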