Automating the disassembly of critical components from end-of-life (EoL) desktops, such as high-value RAM modules and CPUs and data-sensitive hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, which further increases the complexity of automation. Current robotic disassembly pipelines are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent developments in vision-language-action (VLA) models offer an end-to-end approach to general robotic manipulation. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can reliably complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy combining the VLA model with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end automated robotic disassembly.
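To make the hybrid strategy mentioned above concrete, the sketch below shows one way a VLA/rule-based dispatcher could be organized: the fine-tuned VLA policy handles the early, less demanding steps, while precision-critical subtasks fall back to a scripted routine. This is a minimal illustration under stated assumptions, not the paper's implementation or the OpenVLA API; all class, method, and subtask names (HybridDisassemblyController, robot.execute, robot.subtask_done, get_observation, the RAM step routing) are hypothetical placeholders.

```python
# Minimal sketch of a hybrid VLA + rule-based disassembly controller.
# All names below are hypothetical placeholders, not APIs from
# OpenVLA / OpenVLA-OFT or any specific robot SDK.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    instruction: str        # language prompt given to the VLA policy
    use_rule_based: bool    # route this step to the scripted controller instead


class HybridDisassemblyController:
    """Dispatch each disassembly step to either the fine-tuned VLA policy
    or a rule-based controller for precision-critical subtasks."""

    def __init__(self, vla_policy: Callable, rule_based: Callable, robot):
        self.vla_policy = vla_policy    # e.g. a fine-tuned OpenVLA(-OFT) wrapper
        self.rule_based = rule_based    # deterministic scripted motions (assumed available)
        self.robot = robot

    def run(self, subtasks: List[Subtask], get_observation: Callable,
            max_steps: int = 200) -> None:
        for task in subtasks:
            if task.use_rule_based:
                # Precision-critical step (e.g. releasing RAM retention clips):
                # execute a pre-programmed, rule-based routine.
                self.rule_based(self.robot, task.instruction)
            else:
                # Early, less demanding steps: let the VLA predict actions
                # in closed loop from camera observations and the instruction.
                for _ in range(max_steps):
                    obs = get_observation()
                    action = self.vla_policy(obs, task.instruction)
                    self.robot.execute(action)
                    if self.robot.subtask_done(task):  # hypothetical completion check
                        break


# Hypothetical RAM-removal sequence mirroring the step decomposition
# described in the abstract; the exact steps and routing are assumptions.
ram_subtasks = [
    Subtask("move above the RAM slot", use_rule_based=False),
    Subtask("release both retention clips", use_rule_based=True),
    Subtask("grasp the RAM module and lift it out", use_rule_based=False),
]
```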