
BEV-radar: bidirectional radar-camera fusion for 3D object detection


    Abstract: Exploring millimeter-wave radar data as a complement to RGB images for improving 3D object detection has become an emerging trend in autonomous driving. However, existing radar-camera fusion methods are highly dependent on prior camera detection results, which leaves overall performance unsatisfactory. In this paper, we propose BEV-radar, a bidirectional fusion scheme in the bird's-eye view (BEV) that does not depend on prior camera detection results. To fuse features from the two modalities, which originate in different domains, our method adopts a bidirectional attention-based fusion strategy. Specifically, building on a BEV-based 3D detection framework, our method uses a bidirectional transformer to embed information from both modalities and enforces local spatial relationships through subsequent convolution blocks. The fused BEV features are then decoded by a 3D object prediction head. We evaluate our method on the nuScenes dataset, achieving 48.2 mAP and 57.6 NDS. The results show considerable improvements over the camera-only baseline, especially in velocity prediction. The code is available at https://github.com/Etah0409/BEV-Radar.
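    To make the fusion scheme concrete, below is a minimal PyTorch sketch of bidirectional cross-attention between camera and radar BEV feature maps, followed by a convolution block that restores local spatial structure. It illustrates the general idea only: the class name, dimensions, normalization, and single-block layout are assumptions, not the paper's actual architecture (the authors' code is at the repository linked above).

    # Illustrative sketch only (assumed details): bidirectional attention-based
    # fusion of camera and radar BEV features, as described in the abstract.
    import torch
    import torch.nn as nn

    class BidirectionalBEVFusion(nn.Module):
        def __init__(self, dim=256, num_heads=8):
            super().__init__()
            # Each modality queries the other ("bidirectional" attention).
            self.cam_to_radar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.radar_to_cam = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm_cam = nn.LayerNorm(dim)
            self.norm_radar = nn.LayerNorm(dim)
            # A convolution over the fused BEV grid enforces local spatial
            # relationships that token-wise attention alone does not capture.
            self.conv = nn.Sequential(
                nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
                nn.BatchNorm2d(dim),
                nn.ReLU(inplace=True),
            )

        def forward(self, cam_bev, radar_bev):
            # cam_bev, radar_bev: (B, C, H, W) feature maps on a shared BEV grid.
            b, c, h, w = cam_bev.shape
            cam_seq = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
            radar_seq = radar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
            cam_upd, _ = self.cam_to_radar(cam_seq, radar_seq, radar_seq)
            radar_upd, _ = self.radar_to_cam(radar_seq, cam_seq, cam_seq)
            cam_seq = self.norm_cam(cam_seq + cam_upd)        # residual + norm
            radar_seq = self.norm_radar(radar_seq + radar_upd)
            cam_map = cam_seq.transpose(1, 2).reshape(b, c, h, w)
            radar_map = radar_seq.transpose(1, 2).reshape(b, c, h, w)
            # Concatenate the updated maps and fuse; the output would feed a
            # 3D object prediction head downstream.
            return self.conv(torch.cat([cam_map, radar_map], dim=1))

    Calling BidirectionalBEVFusion() on two (B, 256, H, W) tensors yields a fused (B, 256, H, W) BEV map. The attention step aligns the two modalities globally, while the convolution block reimposes locality, mirroring the transformer-plus-convolution split described in the abstract.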

     
