Dense clutter removal for target object retrieval is a challenging problem, especially when targets are embedded deep within densely packed configurations. It requires foresight to minimize overall changes to the clutter configuration while accessing the target, avoiding stack destabilization and reducing the number of object removals required. Rule-based planners, when applied to this problem, rely on rigid heuristics, leading to high computational overhead. End-to-end reinforcement learning approaches struggle with interpretability and with generalization across different conditions. To address these issues, we present ClutterNav, a novel decision-making framework that identifies the next best object to remove in order to access a target object in a given clutter, while minimizing stack disturbances. ClutterNav formulates the problem as a continuous reinforcement learning task, in which each object removal dynamically updates the understanding of the scene. A removability critic, trained from demonstrations, estimates the cost of removing any given object based on geometric and spatial features. This learned cost is complemented by integrated gradients, which assess how the presence or removal of surrounding objects influences the accessibility of the target. By dynamically prioritizing actions that balance immediate removability against long-term target exposure, ClutterNav achieves near human-like strategic sequencing without predefined heuristics. The proposed approach is validated extensively in simulation and in real-world experiments. The results demonstrate real-time, occlusion-aware decision-making in partially observable environments.
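To make the combination of a removal cost with integrated-gradient attributions more concrete, here is a minimal sketch. It is not ClutterNav's implementation. The abstract does not specify the accessibility model, the critic, or how the two terms are combined, so everything below is hypothetical. The sketch assumes a toy differentiable accessibility function `accessibility` (target exposure decays exponentially with the summed occlusion weights of present objects). It uses a fixed `removal_cost` vector as a stand-in for the learned removability critic and a linear trade-off `weight`. Integrated gradients is computed along the straight path from a baseline where every object is removed to the current presence mask, using a midpoint Riemann sum.

```python
import numpy as np

def accessibility(mask, occlusion):
    """Hypothetical target-accessibility model (assumption, not from the paper):
    exposure decays exponentially with the total occlusion of present objects."""
    mask = np.asarray(mask, dtype=float)
    return float(np.exp(-np.asarray(occlusion, dtype=float) @ mask))

def accessibility_grad(mask, occlusion):
    """Analytic gradient of the toy accessibility model w.r.t. the presence mask."""
    occlusion = np.asarray(occlusion, dtype=float)
    return -occlusion * np.exp(-occlusion @ mask)

def integrated_gradients(mask, occlusion, baseline=None, steps=64):
    """Integrated gradients of accessibility w.r.t. each object's presence,
    approximated by a midpoint Riemann sum along the straight path from
    `baseline` (default: all objects removed) to `mask`."""
    mask = np.asarray(mask, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(mask)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    total = np.zeros_like(mask)
    for a in alphas:
        total += accessibility_grad(baseline + a * (mask - baseline), occlusion)
    return (mask - baseline) * total / steps

def next_object(mask, occlusion, removal_cost, target, weight=1.0):
    """Pick the present, non-target object with the best trade-off between
    expected exposure gain and (hypothetical, critic-like) removal cost."""
    ig = integrated_gradients(mask, occlusion)
    # A negative attribution means the object's presence reduces accessibility,
    # so -ig estimates how much exposure removing that object would recover.
    score = -ig - weight * np.asarray(removal_cost, dtype=float)
    candidates = [j for j in range(len(mask)) if mask[j] > 0 and j != target]
    return max(candidates, key=lambda j: score[j]), ig
```

Consider four objects with the target at index 3, occlusion weights `[0.8, 0.1, 1.5, 0.0]`, and removal costs `[0.2, 0.1, 0.9, 0.0]`. In that case `next_object` returns index 0 rather than the heaviest occluder (index 2), because the high cost of removing index 2 outweighs its exposure gain. With `weight=0.0` it selects index 2. The attributions also satisfy the completeness property of integrated gradients: they sum, up to discretization error, to the accessibility at the current mask minus the accessibility at the baseline.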