Sensory substitution enables biological systems to perceive stimuli typically obtained by another organ, which is inspirational for physical agents. Multi-modal perception of intrinsic and extrinsic interactions is critical in building an intelligent robot that learns. This study presents a Vision-based See-Through Perception (VBSeeThruP) architecture that simultaneously perceives multiple intrinsic and extrinsic modalities via a single visual input in a markerless way, all packed within a soft robotic finger using the Soft Polyhedral Network design. It is generally applicable to miniature vision systems placed underneath deformable networks with a see-through design, capturing real-time images of the network's physical interactions induced by contact-based events overlayed on top of the visual scene of the external environment, as demonstrated in the ablation study. We present the VBSeeThruP's capability for learning reactive grasping without using external cameras or dedicated force and torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the VBSeeThruP architecture to simultaneously achieve various perceptions, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force/torque sensing, and contact event detection, all within a single sensory input from the in-finger vision markerlessly.
翻译:暂无翻译