Multi-sensor fusion of LiDAR and RGB cameras significantly enhances 3D object detection. However, conventional LiDAR sensors perform dense, stateless scans, ignoring the strong temporal continuity of real-world scenes. This leads to substantial sensing redundancy and excessive power consumption, limiting their practicality on resource-constrained platforms. To address this inefficiency, we propose a predictive, history-aware adaptive scanning framework that anticipates informative regions of interest (ROIs) based on past observations. Our approach introduces a lightweight predictor network that distills historical spatial and temporal context into refined query embeddings. These embeddings guide a differentiable Mask Generator network, which uses Gumbel-Softmax sampling to produce binary masks identifying critical ROIs for the upcoming frame. By concentrating dense LiDAR scanning within these ROIs and sampling sparsely elsewhere, our method significantly reduces unnecessary data acquisition. Experiments on the nuScenes and Lyft benchmarks demonstrate that our adaptive scanning strategy reduces LiDAR energy consumption by over 65% while maintaining competitive or even superior 3D object detection performance compared to traditional LiDAR-camera fusion methods with dense scanning.
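To make the mask-generation step concrete, below is a minimal, hypothetical sketch of a Gumbel-Softmax mask generator in PyTorch. It is not the authors' implementation: the module name `MaskGenerator`, the pooled query input, the scan-grid resolution `grid_hw`, and the temperature `tau` are all illustrative assumptions. It shows only the core idea of using straight-through Gumbel-Softmax sampling to produce a differentiable binary ROI mask.

```python
# Hypothetical sketch of Gumbel-Softmax binary mask generation.
# All names and shapes are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGenerator(nn.Module):
    """Maps a refined query embedding to a binary ROI mask over a scan grid."""

    def __init__(self, query_dim: int = 256, grid_hw: tuple = (32, 32)):
        super().__init__()
        self.grid_hw = grid_hw
        # Per-cell logits for two classes: {sparse-scan, dense-scan}.
        self.to_logits = nn.Linear(query_dim, grid_hw[0] * grid_hw[1] * 2)

    def forward(self, queries: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # queries: (B, query_dim) history-aware embedding from the predictor.
        B = queries.shape[0]
        logits = self.to_logits(queries).view(B, *self.grid_hw, 2)
        # Straight-through Gumbel-Softmax: hard binary samples in the forward
        # pass, a differentiable soft relaxation in the backward pass.
        samples = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return samples[..., 1]  # (B, H, W) binary mask; 1 = dense-scan ROI

if __name__ == "__main__":
    gen = MaskGenerator()
    mask = gen(torch.randn(4, 256))
    print(mask.shape, mask.unique())  # torch.Size([4, 32, 32]), values in {0, 1}
```

The `hard=True` straight-through trick is what makes the discrete scan/no-scan decision end-to-end trainable: the forward pass commits to a binary mask, while gradients flow through the soft softmax relaxation.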