A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captured through articulated object models, which underpin tasks such as physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or on strong assumptions baked into hand-curated datasets, which hinders scalability. In this paper, we introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry. To achieve this, we combine Monte Carlo tree search (MCTS) for structural inference with geometry-driven optimization for joint reasoning, producing physically consistent and functionally valid kinematic descriptions. We evaluate Kinematify on diverse inputs from both synthetic and real-world environments, demonstrating improvements in both registration accuracy and kinematic topology accuracy over prior work.
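The abstract names MCTS over kinematic topologies as the structural-inference component. The following is a minimal, hypothetical sketch of that idea only, not the authors' implementation: it assumes parts indexed 0..N-1 with part 0 as a fixed base, represents a topology as a set of (child, parent) edges, and uses a placeholder score in place of the paper's geometry-driven objective.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical example: search for a kinematic tree over N_PARTS rigid parts.
# The score() function below is a stand-in for a geometry-driven objective,
# NOT the objective used in Kinematify.
N_PARTS = 4  # e.g., a cabinet body (part 0) plus three drawers


def candidate_edges(state):
    """Edges that attach one still-unattached part to the current tree."""
    attached = {0} | {c for c, _ in state}       # part 0 is the fixed base
    free = set(range(N_PARTS)) - attached
    return [(c, p) for c in free for p in attached]


def score(state):
    """Placeholder score for a completed topology (assumed, for illustration)."""
    # Prefers a simple chain; a real system would instead evaluate the physical
    # consistency of joints fitted to the static part geometry.
    return sum(1.0 for c, p in state if p == c - 1) / (N_PARTS - 1)


@dataclass
class Node:
    state: tuple
    parent: "Node" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def uct(node, c=1.4):
    """Upper-confidence bound used during selection."""
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )


def mcts(iterations=500):
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(candidate_edges(node.state)):
            node = max(node.children, key=uct)
        # Expansion: try one new attachment edge, if the topology is incomplete.
        tried = [c.state for c in node.children]
        untried = [e for e in candidate_edges(node.state) if node.state + (e,) not in tried]
        if untried:
            child = Node(state=node.state + (random.choice(untried),), parent=node)
            node.children.append(child)
            node = child
        # Rollout: complete the tree with random attachments, then score it.
        rollout = node.state
        while candidate_edges(rollout):
            rollout = rollout + (random.choice(candidate_edges(rollout)),)
        reward = score(rollout)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path as the selected topology.
    best = root
    while best.children:
        best = max(best.children, key=lambda n: n.visits)
    return best.state


if __name__ == "__main__":
    print(mcts())
```

In this toy setting the search converges to the chain topology favored by the placeholder score; in the paper's setting the reward would instead come from joint parameters optimized against the object's static geometry.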