Yuan Zhao is currently an Algorithm Engineer at Horizon Robotics. She received her bachelor’s degree from Sichuan University in 2020 and her master’s degree from the University of Science and Technology of China in 2023. Her research interest mainly focuses on multi-modality sensor fusion, computer vision, and autonomous driving.
Yanyong Zhang is currently a Professor at the School of Computer Science and Technology at the University of Science and Technology of China (USTC). She received her B.S. degree from USTC in 1997 and her Ph.D. degree from Pennsylvania State University in 2002. Her research mainly focuses on multi-modality sensor fusion and cyber-physical systems.
Graphical Abstract
BEV-radar simplifies 3D object detection by aligning camera and radar features in a bird-eye view (BEV) perspective, enhancing fusion through a bidirectional query-based transformer approach for complementary information exchange.
Abstract
Exploring millimeter wave radar data as complementary to RGB images for ameliorating 3D object detection has become an emerging trend for autonomous driving systems. However, existing radar-camera fusion methods are highly dependent on the prior camera detection results, rendering the overall performance unsatisfactory. In this paper, we propose a bidirectional fusion scheme in the bird-eye view (BEV-radar), which is independent of prior camera detection results. Leveraging features from both modalities, our method designs a bidirectional attention-based fusion strategy. Specifically, following BEV-based 3D detection methods, our method engages a bidirectional transformer to embed information from both modalities and enforces the local spatial relationship according to subsequent convolution blocks. After embedding the features, the BEV features are decoded in the 3D object prediction head. We evaluate our method on the nuScenes dataset, achieving 48.2 mAP and 57.6 NDS. The result shows considerable improvements compared to the camera-only baseline, especially in terms of velocity prediction. The code is available at https://github.com/Etah0409/BEV-Radar.
Public Summary
BEV-radar adaptively aligns multi-modality features, making it well suited to radar-camera fusion.
The bidirectional spatial fusion module drives the feature representations from different domains toward unification.
BEV-radar performs effectively on velocity prediction, reducing the error by 53% compared to the camera-only model.
Synchrotron radiation light sources have become important devices for scientific research. As a main part of the accelerator radio frequency (RF) system, which provides energy to the electrons, superconducting cavities have been widely used over the last 50 years because of their low ohmic power loss[1, 2]. The SOLEIL-type cavity is so named because it was first used in the third-generation light source SOLEIL[3]. Two 352.2 MHz SOLEIL-type superconducting modules were installed on its storage ring in 2006 and 2008. They provided an accelerating voltage of 3 MV (750 kV/cell) and satisfied the power requirement for a 500 mA beam current at a beam energy of 2.75 GeV[4]. Currently, SOLEIL cavities have the potential to operate at a 500 mA beam current using a single cryomodule, given some improvements to their solid-state power amplifiers (SSPAs)[5].
The Hefei Advanced Light Facility (HALF), a large-scale scientific apparatus in the Fourteenth Five-Year Plan of China, is a soft X-ray and VUV light source based on a diffraction-limited storage ring (DLSR), with a beam energy of 2.2 GeV and an emittance goal of less than 100 pm·rad. It is now under construction at the National Synchrotron Radiation Laboratory (NSRL). Based on KEK-B[6] and BEPC-II[7], a KEK-type superconducting cavity has been developed and will be used in HALF to meet the 350 mA beam current requirement[8]. In the future, with increases in the beam current and the number of insertion devices, exploring SOLEIL-type superconducting cavities, which have two fundamental power couplers, may be worthwhile. The linear section length of a 499.8 MHz SOLEIL module with two fundamental power couplers may be shorter than that of two KEK modules because a higher power level could be realized more easily.
All modes other than the fundamental mode that exist in the cavity are called higher-order modes (HOMs). HOMs are excited when relativistic electrons pass through the cavity and must be extracted efficiently to prevent coupled bunch instability (CBI). HOM damping concepts vary but are principally based on coaxial and waveguide couplers, beam line absorbers, or combinations thereof[9]. SOLEIL-type superconducting cavities are equipped with coaxial HOM couplers on their center pipe to reduce the linear section length while keeping a moderate opening at the end beam pipes. Four HOM couplers were mounted on the 352 MHz SOLEIL cavity to extract the HOMs, and six HOM couplers were used to meet the HOM damping requirement of the 1500 MHz SUPER3HC third-harmonic cavity[10].
In this paper, the electromagnetic simulation software CST Studio Suite was used for the study. First, the HOM properties of a bare 499.8 MHz SOLEIL-type superconducting cavity were analyzed. Second, the coaxial HOM couplers were approximated as simple hooks to optimize their quantity and locations, which have a great influence on the multipacting behavior. Third, the RF transmission and multipacting properties of the HOM couplers were optimized. Fourth, detailed HOM couplers were added to the model to confirm the design and perform an overall optimization. Finally, a preliminary thermal simulation was presented to explore the cooling requirements of the HOM couplers.
2.
Materials and methods
2.1
Cavity analysis
A 499.8 MHz SOLEIL-type superconducting cavity was analyzed with CST Studio Suite, based on the 352 MHz SOLEIL cavity. The main parameters are listed in Table 1.
Table
1.
Main parameters for the 499.8 MHz SOLEIL-type superconducting cavity.
As shown in Fig. 1, the SOLEIL-type cavity has two cells linked by a large beam pipe in between, called the center beam pipe (CBP), and is terminated by smaller side beam pipes (SBPs). The radius of the CBP was specially designed to ensure that all the HOMs can propagate into the CBP while the fundamental mode remains localized in the cavity cells. When HOM couplers are assembled on the CBP, this structure provides strong coupling to the HOMs and very weak coupling to the fundamental mode.
Figure
1.
A 499.8 MHz SOLEIL-type superconducting cavity.
Theoretically, HOMs between the fundamental mode frequency f0 and the cutoff frequency of the SBP, fc,SBP, should be extracted by the coaxial HOM couplers on the CBP. HOMs with frequencies above fc,SBP are assumed to be absorbed by the beam pipe absorbers or lossy metal tapers assembled outside the cryomodule. In our case, with the SBP radius set to 91.5 mm, the cutoff frequency was calculated to be 965.95 MHz for the TE11 mode and 1261.87 MHz for the TM01 mode. Therefore, HOMs with frequencies between 499.8 MHz and 1261.87 MHz are of particular concern.
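The quoted cutoffs follow from the standard circular-waveguide relation f_c = c·x/(2πa), where x is the relevant Bessel-function root (j'₁,₁ ≈ 1.8412 for TE11, j₀,₁ ≈ 2.4048 for TM01). A minimal sketch; with the 91.5 mm radius it reproduces the quoted values to within about 1% (the small residual presumably reflects the exact radius and constants used in the original calculation):

```python
import math

C = 2.998e8  # speed of light, m/s

# Bessel roots: j'_1,1 for TE11, j_0,1 for TM01
BESSEL_ROOTS = {"TE11": 1.8412, "TM01": 2.4048}

def cutoff_frequency(radius_m, mode):
    """Cutoff frequency (Hz) of a circular waveguide of the given radius."""
    return C * BESSEL_ROOTS[mode] / (2.0 * math.pi * radius_m)

a_sbp = 0.0915  # SBP radius from the text, m
for mode in ("TE11", "TM01"):
    print(f"{mode}: {cutoff_frequency(a_sbp, mode) / 1e6:.1f} MHz")
```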
To find the HOMs that may strongly affect the beam, calculating the wake impedance is an accurate and convenient approach[11, 12]. It can be computed with 2D software such as ABCI or 3D software such as the wakefield solver in CST Particle Studio. Using the cavity model in Fig. 1, the two SBP ports were set as waveguide ports with five modes. The beam was offset 1 mm from the cavity center, parallel to the red line in Fig. 1, with a 34 mm bunch length. The wakelength was set to 800 m. The calculated wake impedance is shown in Fig. 2. The peak at approximately 499.8 MHz represents the fundamental mode, and the other peaks represent potentially dangerous HOMs. For HOMs at higher frequencies, the electromagnetic field patterns in the cavity cells and in the CBP may differ, so these modes are labeled EM (frequency in MHz). For example, mode EM(716) has a field similar to TM111 in the cavity cells, while its distribution in the CBP is TE112. All HOM impedances were notably high and would increase further with increasing simulation wakelength. This can be attributed to the absence of HOM couplers in the model depicted in Fig. 1, which allows the HOMs to sustain resonance within the cavity.
Figure
2.
Wake impedance calculation results for a bare 499.8 MHz SOLEIL-type superconducting cavity: (a) longitudinal and (b) transverse.
According to the coupling method, coaxial HOM couplers can be divided into two categories, probe couplers and loop couplers, which use electric and magnetic coupling, respectively[13]. Since a loop coupler can be designed to be dismountable and is sometimes easier to form into a filter structure, it was chosen as the basis for our HOM couplers.
2.2.1
Impedance threshold
The wake impedance must not exceed the CBI threshold in either the longitudinal or the transverse plane. The thresholds can be calculated analytically with the widely used Eqs. (1) and (2)[3, 9].
$$f_{\mathrm{HOM}}\,Z^{\mathrm{th}}_{\parallel}=\frac{2Q_{s}E}{\alpha\tau_{\parallel}I_{0}},$$
(1)
$$f_{0}\,Z^{\mathrm{th}}_{\perp}=\frac{2E}{\beta_{\perp}\tau_{\perp}I_{0}},$$
(2)
where fHOM and f0 are the HOM resonance frequency and revolution frequency, E and I0 are the beam energy and average current, Qs is the synchrotron tune, α is the momentum compaction factor, β⊥ is the β function at the cavity location in the x or y direction, and τ∥ and τ⊥ are the longitudinal and transverse radiation damping times, respectively. These formulas were obtained by equating the radiation damping time with the respective instability rise time.
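As an illustration, Eqs. (1) and (2) can be evaluated directly. All machine parameters below are hypothetical placeholders for a light-source-scale ring, not the actual HALF lattice values:

```python
def z_threshold_longitudinal(f_hom, energy_ev, i0, qs, alpha, tau_par):
    """Eq. (1): Z_par^th = 2*Qs*E / (f_HOM * alpha * tau_par * I0), in ohms."""
    return 2.0 * qs * energy_ev / (f_hom * alpha * tau_par * i0)

def z_threshold_transverse(f0, energy_ev, i0, beta_perp, tau_perp):
    """Eq. (2): Z_perp^th = 2*E / (f0 * beta_perp * tau_perp * I0), in ohm/m."""
    return 2.0 * energy_ev / (f0 * beta_perp * tau_perp * i0)

# Placeholder parameters (NOT the actual HALF lattice values)
z_par = z_threshold_longitudinal(f_hom=1.0e9, energy_ev=2.2e9, i0=0.35,
                                 qs=0.005, alpha=1.0e-4, tau_par=10e-3)
z_perp = z_threshold_transverse(f0=0.625e6, energy_ev=2.2e9, i0=0.35,
                                beta_perp=10.0, tau_perp=20e-3)
print(f"Z_par threshold  ~ {z_par / 1e3:.1f} kOhm")
print(f"Z_perp threshold ~ {z_perp / 1e3:.1f} kOhm/m")
```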
2.2.2
Coupler quantity and location
To achieve strong coupling, the plane of the hook should be as perpendicular as possible to the magnetic field lines and close to the strong-field region. As shown in Fig. 3, the magnetic field of transverse HOMs such as TE112 tends to localize around the cavity cells, while longitudinal HOMs such as TM015 have a strong field around the center of the CBP. We therefore expected that two kinds of couplers would be needed: T-type couplers, with hooks perpendicular to the beam axis and close to the cells, as shown in Fig. 3a, and L-type couplers, with hooks parallel to the beam axis and mounted around the center of the CBP, as shown in Fig. 3b, handling transverse and longitudinal HOMs, respectively.
Figure
3.
HOMs magnetic field distribution: (a) TE112 distribution, (b) TM015 distribution.
To estimate the quantity of HOM couplers and their locations, the couplers were first represented as simple hooks added to the cavity model; otherwise, the large mesh count and unacceptable calculation time would have made the optimization impossible. A model with 4 T-type and 2 L-type couplers is shown in Fig. 4; the L-type couplers are located around the center, and the T-type couplers are located near the cavity cells. Couplers with different orientations were included to absorb transverse HOMs with different polarizations.
Figure
4.
Cavity model with the coupling hooks, (a) the hooks, (b) the port setting.
We used the CST wakefield solver to optimize the model. During the optimization, the number of couplers and the distance between the couplers and the cavity were varied to obtain a smaller impedance. All coupler ports were set as waveguide ports with the TEM mode, and the two SBP ports were set as waveguide ports with five modes. The beam was offset 1 mm from the axis, and a long simulation wakelength of 1000 m was used to ensure that all the HOMs were extracted. A bunch length of 34 mm was chosen to calculate the impedance accurately. Finally, 4 T-type couplers were symmetrically located 137.5 mm from the cells, rotated by 90 degrees on one side, and at least 2 L-type couplers were needed around the center, 362.5 mm from the cells, as shown in Fig. 4.
The wake impedance results for the optimized model are shown in Fig. 5. The red lines are the impedance thresholds calculated by Eqs. (1) and (2) with the HALF parameters. The longitudinal impedance was well suppressed, and the maximum transverse impedance remained approximately 20 kΩ/m below the threshold. In this phase, a negative effect on the transverse impedance caused by the asymmetry of the L-type couplers was discovered but was considered acceptable for reducing the number of couplers.
Figure
5.
Wake impedance of the optimized model: (a) longitudinal and (b) transverse.
A coaxial line does not have the cutoff property of a waveguide, so coaxial HOM couplers need a filter structure at the fundamental mode frequency, 499.8 MHz in our case, to prevent coupling of the fundamental mode. Filter structures can be realized by folding coaxial lines or adding extra rods or plates[14, 15]. The optimized equivalent circuits and models of our L-type and T-type couplers are shown in Fig. 6.
Figure
6.
Equivalent circuits and the optimized model, (a) L-type coupler, (b) T-type coupler.
Both couplers have one filter structure: a capacitor plate around the support rod for L-type couplers and a lengthened hook facing the outer conductor for T-type couplers, as shown in the red boxes of Fig. 6a and b, respectively. Since T-type couplers are close to the cavity cell, a filter structure around the coupler hook may block the fundamental mode from entering the coupler, reducing extra heat loss. In the L-type coupler case, the fundamental mode fields were weak around the coupler, and the lengthened hook did not work as a filter after 90 degrees of rotation, so the hook was shortened, and a capacitor plate was added around the support rod.
The transmission properties of these couplers were analyzed with beampipe-coupler models in CST Microwave (MW) Studio. As shown in the left part of Fig. 7, the TE11 or TM01 mode was launched from the side of the beam pipe, which was set as a waveguide port named port 1 or port 3; port 2 was set to the TEM mode. The S21 results are shown in the right part of Fig. 7. Since the fundamental TM010 mode excites the TM01 mode at the beam pipe port, the TM01-launched S21 deserves special attention. After optimization of the coupler geometry, both couplers exhibited good rejection at 499.8 MHz: S21 below −80 dB for the L-type coupler and below −75 dB for the T-type coupler.
Figure
7.
Beampipe-coupler models and S21 results for the (a) L-type coupler and (b) T-type coupler.
Longitudinal HOMs are extracted by the L-type couplers. As shown by the blue line in Fig. 2, the longitudinal HOMs are located between 0.80 GHz and 1.26 GHz, with the strongest occurring at approximately 1.00 GHz. The TM01-launched S21 of the L-type coupler in this range was mostly above −20 dB, which predicts good transmission. The transverse HOMs, on the other hand, are located between 0.55 GHz and 1.10 GHz, and the T-type couplers also exhibited good transmission properties in both polarization directions.
2.4
Multipacting analysis of HOM couplers
Multipacting (MP) is a resonant electron phenomenon driven by electromagnetic fields. When primary electrons gain energy from the field and collide with a conductor, secondary electrons may be generated. If the secondary electrons behave like the primary electrons and generate further secondaries, the number of electrons grows exponentially and MP occurs. MP results in extra heat loss, quench of the superconducting surface, or even breakdown of the HOM couplers.
The particle-in-cell (PIC) solver in CST Particle Studio was used for the MP analysis because it offers several advantages, such as convenient particle settings and more efficient memory handling than the tracking solver, while maintaining comparable accuracy[16]. The electromagnetic field was calculated for a model including the cavity and HOM couplers using the eigenmode solver of CST MW Studio and then imported into the PIC solver. The model and electromagnetic field are presented in Fig. 8. Since the fundamental mode is below the cutoff frequency of the CBP, its fields are localized inside the cavity cells. The brown parts in Fig. 8 are alumina ceramics.
Figure
8.
Electromagnetic field model, (a) cavity model with couplers, (b) fundamental mode electric field.
When the cavity is in operation, the electromagnetic field in the HOM couplers can be as strong as the field in the cavity cells. Thus, niobium was chosen as the material for the inner conductors of the HOM couplers so that they can become superconducting, as illustrated by the blue parts in Fig. 9a.
Figure
9.
MP analysis model and true SEY used: (a) model and (b) SEY curve.
In the PIC solver, the Furman emission model was used to simulate the MP effect; backscattered electrons, rediffused electrons, and true secondary electrons were all considered. A niobium shell must be added outside the cavity to make the cavity boundary reflective, as shown in Fig. 9a. The secondary emission yields (SEYs) of niobium and copper were taken from the presets in CST Particle Studio, as shown in Fig. 9b.
2.4.1
T-type coupler multipacting analysis
Being close to the cavity, the T-type couplers were analyzed first. All inner conductor surfaces were set as particle sources, as shown in Fig. 10a. Over 2000 primary electron launch points were uniformly distributed on the surface. The electron launch angle was randomized between 0° and 89.9°, with an initial energy of 2 eV. In the time domain, the launch current was a 2 ns rectangular pulse covering the first period of the fundamental mode resonance. Each calculation was run for 15 ns, and the space-charge effect was neglected.
Figure
10.
MP analysis for T-type couplers: (a) particle source area, (b) particle monitoring where MPs occurred, and (c) <SEY> results.
The time-averaged secondary emission yield (<SEY>) was used to indicate whether MP occurred, and it could be calculated by Eq. (3):
$$\langle\mathrm{SEY}\rangle=\frac{\langle I_{\mathrm{e}}\rangle}{\langle I_{\mathrm{c}}\rangle},$$
(3)
where <Ie> and <Ic> represent the time-averaged emission and collision currents, respectively, computed from the electrons emitted from or colliding with the entire conductor surface and averaged over the simulation time from 4 ns to 15 ns. When <SEY> is greater than 1, the emission current exceeds the collision current, which indicates a high probability that MP may occur.
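The bookkeeping behind Eq. (3) reduces to averaging the emission and collision currents over the 4–15 ns window and taking their ratio. A sketch with synthetic current samples (not actual solver output):

```python
def time_averaged_sey(times_ns, i_emit, i_coll, t_start=4.0, t_end=15.0):
    """<SEY> = <Ie>/<Ic>, with both currents averaged over [t_start, t_end] ns (Eq. 3)."""
    window = [k for k, t in enumerate(times_ns) if t_start <= t <= t_end]
    ie = sum(i_emit[k] for k in window) / len(window)
    ic = sum(i_coll[k] for k in window) / len(window)
    return ie / ic

# Synthetic samples: emission slightly above collision -> <SEY> > 1, MP suspected
t = [float(k) for k in range(16)]  # 0..15 ns
i_e = [1.0] * 16
i_c = [0.8] * 16
sey = time_averaged_sey(t, i_e, i_c)
print(f"<SEY> = {sey:.2f}, MP {'likely' if sey > 1 else 'unlikely'}")
```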
Initially, the materials of the upper coaxial line and the hook were set as copper and niobium, respectively, but MP occurred at an acceleration gradient of approximately 4 MV/m. The particle monitor revealed that the MP was located at the capacitive part, as shown in the upper panel of Fig. 10b. After the upper part was changed to niobium, MP was well suppressed between acceleration gradients of 2 MV/m and 16 MV/m.
2.4.2
L-type coupler multipacting analysis
Contrary to our prediction, the L-type couplers have a weaker field inside but are more likely to experience MP. As with the T-type couplers, the entire inner conductor surface was set as a particle source, as shown in Fig. 11a. All primary electron settings and the <SEY> calculations were consistent with the T-type coupler analysis.
Figure
11.
MP analysis for L-type couplers: (a) particle source area, (b) particle monitoring where MPs occurred, and (c) <SEY> results.
After optimization, MP still occurred above 10 MV/m at the capacitor plate facing the outer conductor, as shown in the upper panel of Fig. 11b. A capacitor plate on the inner conductor might also lead to MP at a higher acceleration gradient. For a storage ring cavity, the acceleration gradient is usually set between 2 MV/m and 6 MV/m, rarely exceeding 8 MV/m. In this range, the optimized L-type couplers have a low probability of exhibiting MP. HALF requires a 1.5 MV RF voltage, which corresponds to an acceleration gradient of 2.48 MV/m per cell for a 499.8 MHz SOLEIL-type cavity. It can therefore be predicted that no multipacting will occur at the operating acceleration gradient in either the L-type or T-type HOM couplers.
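The 2.48 MV/m figure can be roughly cross-checked by assuming an active length of half an RF wavelength per cell; this is a common convention rather than the cavity's exact effective length, so the result lands close to, not exactly at, the quoted value:

```python
C = 2.998e8  # speed of light, m/s

def gradient_per_cell(v_total, n_cells, f_rf):
    """Accelerating gradient assuming an active length of lambda/2 per cell."""
    l_active = C / (2.0 * f_rf)          # ~0.30 m at 499.8 MHz
    return v_total / n_cells / l_active  # V/m

g = gradient_per_cell(1.5e6, 2, 499.8e6)
print(f"{g / 1e6:.2f} MV/m per cell")  # close to the 2.48 MV/m quoted in the text
```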
3.
Results and discussion
3.1
Wake impedance recalculation
After the RF and MP analyses, the structure of the HOM couplers was finalized. To ensure that these couplers can damp the HOMs efficiently, the wake impedance was recalculated with a model including the cavity and the detailed HOM couplers, as shown in Fig. 12a.
Figure
12.
The cavity model and its impedances: (a) model with 4 T-type and 2 L-type couplers, (b) longitudinal impedance, and (c) transverse impedance.
In this model, the coupling hooks were replaced by detailed models of the 4 T-type and 2 L-type couplers. The bunch length was set to 50 mm as a tradeoff between accuracy and calculation time, and the beam was offset 1 mm from the beam axis. To ensure that all HOMs with frequencies below 1.3 GHz could be extracted, the two SBP ports were set as waveguide ports with five modes, and all HOM coupler ports were set as waveguide ports with three modes. The impedance results are presented in Fig. 12b and c: the longitudinal impedances were well damped, but, unfortunately, the transverse impedance greatly exceeded the threshold.
To bring the transverse impedance below the threshold, the asymmetry effect of the L-type couplers was considered. Two extra L-type couplers were added opposite the original ones, as shown in Fig. 13a, all located 362.5 mm from the cavity. All beam parameter settings were inherited from the previous model. The addition of the two L-type couplers significantly reduced both the longitudinal and transverse impedances, as shown in Fig. 13b and c.
Figure
13.
The cavity model and its impedances: (a) model with 4 T-type and 4 L-type couplers, (b) longitudinal impedance, and (c) transverse impedance.
Finally, a preliminary thermal analysis of the HOM couplers was performed. A bidirectional electromagnetic-thermal calculation was carried out using CST Microwave Studio and Multiphysics Studio. The model is presented in Fig. 14a. Since the cells of the SOLEIL-type cavity are completely immersed in 4.2 K liquid helium while the CBP part is in vacuum, the cavity cells were assumed to be well cooled and were simplified as two 4.2 K thermal anchors on both sides of the CBP[3, 4]. The T-type couplers were also assumed to be cooled by 4.2 K liquid helium because they are close to the cavity cells. Additionally, a 5 K thermal anchor was added at the top of each HOM coupler for the thermal calculations. The 4.2 K and 5 K thermal anchors are shown in blue and red, respectively, in Fig. 14a. The thickness of the CBP niobium wall was 4 mm. The temperature-dependent thermal and electrical conductivities of niobium were obtained from Fermilab documents[17]. The surface resistance of the superconducting niobium was set to 20 nΩ at 4.2 K and 1 GHz, with a residual resistance of 10 nΩ; this value varies with both temperature and frequency according to the following formula[2]:
Figure
14.
Thermal analysis: (a) boundary conditions, (b) temperature distribution under a 10 MV/m acceleration gradient, and (c) temperature distribution under a 3 MV/m acceleration gradient and 2 kW HOM power.
$$R_{\mathrm{s}}=A\,\frac{f^{2}}{T}\exp\!\left(-\frac{\Delta}{kT}\right)+R_{0},$$
(4)
where A is a constant factor, T is the temperature, f is the frequency, Δ is the half-energy gap of niobium, k is the Boltzmann constant, and R0 is the residual resistance. Material properties of the other parts, such as the ceramics, were taken as defaults from the CST library.
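The temperature and frequency scaling of the surface resistance can be sketched by anchoring the BCS term to the stated 20 nΩ at 4.2 K and 1 GHz (10 nΩ of which is residual). The half-gap value Δ/k ≈ 17.67 K is a commonly used approximation for niobium, not a number from the text:

```python
import math

K_DELTA = 17.67  # Delta/k for niobium in kelvin (common approximation, assumed here)

# Calibrate A so that R_s(4.2 K, 1 GHz) = 20 nOhm, of which 10 nOhm is residual
R0 = 10e-9                       # residual resistance, ohm
T_REF, F_REF = 4.2, 1.0e9
R_BCS_REF = 20e-9 - R0
A = R_BCS_REF * T_REF / (F_REF**2 * math.exp(-K_DELTA / T_REF))

def surface_resistance(T, f):
    """R_s = A*f^2/T*exp(-Delta/(k*T)) + R0, anchored at 20 nOhm (4.2 K, 1 GHz)."""
    return A * f**2 / T * math.exp(-K_DELTA / T) + R0

print(f"{surface_resistance(4.2, 1.0e9) * 1e9:.1f} nOhm")  # 20.0 nOhm by construction
print(f"{surface_resistance(5.0, 1.0e9) * 1e9:.1f} nOhm")  # warmer -> higher resistance
```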
In this model, TM01-mode waveguide ports on both sides of the CBP represented the power launched by the TM010 mode in the cavity cells, as shown in Fig. 14a. An initial temperature distribution was calculated without RF power and imported into Microwave Studio. A new temperature distribution was then computed from the RF losses derived from the previous one, and this iterative process was repeated until stable results were achieved. Two simulation conditions are presented below.
Since MP may not appear until a 10 MV/m acceleration gradient is reached, the electromagnetic fields were first scaled to 10 MV/m at 499.8 MHz for the thermal analysis. As shown in Fig. 14b, the hook parts of all the HOM couplers remained superconducting. The maximum temperature was 8.3 K, at the ceramic part of the T-type couplers. With such a small temperature increase, there should be no thermal breakdown before MP occurs.
The second condition simulated the temperature distribution under an acceleration gradient of 3 MV/m with the HOM power taken into account, assuming the module was in operation. The HOM power can be calculated from the loss factors:
$$P_{\mathrm{HOM}}=(k_{\parallel}-k_{\mathrm{FM}})\,\frac{I_{\mathrm{b}}^{2}}{Nf_{\mathrm{rev}}},$$
(5)
where k|| is the total loss factor, kFM is the loss factor of the fundamental mode, Ib is the beam current, N is the number of bunches, and frev is the revolution frequency. The fundamental mode loss factor kFM can be calculated by Eq. (6):
$$k_{\mathrm{FM}}=\frac{\omega_{\mathrm{FM}}}{4}\left(\frac{R}{Q}\right)_{\mathrm{FM}}\exp\!\left(-\frac{\omega_{\mathrm{FM}}^{2}\sigma_{z}^{2}}{c^{2}}\right),$$
(6)
where ωFM and (R/Q)FM are the angular frequency and the R/Q value of the fundamental mode, σz is the longitudinal bunch length, and c is the speed of light.
Using the model shown in Fig. 1, with a 5 mm bunch length, the total loss factor and the fundamental mode loss factor were computed to be 0.6 V/pC and 0.14 V/pC, respectively. With the HALF parameters, the HOM power is predicted to be less than 1 kW. However, the tapers and beam pipe HOM absorbers outside the model have been shown to be important contributors to losses at short bunch lengths[18]. Consequently, we assumed a total HOM power of 2 kW. On each side of the CBP, the 499.8 MHz and 1 GHz fields were scaled to match a 3 MV/m gradient and 1 kW of HOM power, respectively. The simulation results are shown in Fig. 14c. The maximum temperature was 5.2 K. The temperature distribution of the L-type couplers was nearly identical to that under the 10 MV/m gradient, while the T-type couplers were cooler owing to the lower acceleration gradient. In summary, the HOM power has a minor influence on the temperature distribution, and there is no risk of thermal breakdown under this condition.
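Eqs. (5) and (6) can be combined into a quick estimate. The loss factors are those computed in the text; the (R/Q), beam current, bunch number, and revolution frequency below are placeholders chosen for illustration, not HALF design values:

```python
import math

def loss_factor_fm(f_fm, r_over_q, sigma_z, c=2.998e8):
    """Fundamental-mode loss factor, Eq. (6): k = (w/4)*(R/Q)*exp(-(w*sigma_z/c)^2)."""
    w = 2.0 * math.pi * f_fm
    return w / 4.0 * r_over_q * math.exp(-((w * sigma_z / c) ** 2))

def hom_power(k_total, k_fm, i_beam, n_bunches, f_rev):
    """HOM power, Eq. (5): P = (k_total - k_fm) * I_b^2 / (N * f_rev)."""
    return (k_total - k_fm) * i_beam**2 / (n_bunches * f_rev)

# A placeholder (R/Q) of 180 ohm happens to reproduce the 0.14 V/pC quoted in the text
k_fm = loss_factor_fm(499.8e6, 180.0, 0.005)
print(f"k_FM ~ {k_fm / 1e12:.2f} V/pC")

# Loss factors from the text (V/pC -> V/C); fill-pattern values are placeholders
p = hom_power(0.6e12, 0.14e12, i_beam=0.35, n_bunches=400, f_rev=0.625e6)
print(f"P_HOM ~ {p:.0f} W")
```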
4.
Conclusions
SOLEIL-type superconducting cavities have two fundamental power couplers, which may readily provide a higher power level. Two kinds of coaxial HOM couplers were designed for a 499.8 MHz SOLEIL-type superconducting cavity, and their geometry, quantity, and locations were optimized. The RF transmission characteristics and MP properties remained good up to a 10 MV/m acceleration gradient. The HOM damping requirement of HALF can be fully satisfied using 4 L-type and 4 T-type couplers. A preliminary thermal analysis showed that there should be no thermal breakdown before MP occurs. This work provides a preliminary design study for a 499.8 MHz SOLEIL-type superconducting cavity.
Conflict of interest
The authors declare that they have no conflict of interest.
[32]
Duan K, Bai S, Xie L, et al. CenterNet: keypoint triplets for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea: IEEE, 2020: 6568–6577.
[33]
Nabati R, Qi H. CenterFusion: center-based radar and camera fusion for 3D object detection. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE, 2021: 1526–1535.
[34]
Kim Y, Kim S, Choi J W, et al. CRAFT: Camera-radar 3D object detection with spatio-contextual fusion transformer. arXiv: 2209.06535, 2022.
Figure 1. Comparison between the two alignment methods. (a) Radar fusion methods relying on first-stage proposals: after the initial proposals are generated, they must be associated with their corresponding radar regions, so objects missed in the first stage are ignored. (b) Our adaptive radar fusion view: instead of aligning first-stage proposals, features are aligned directly in BEV, so prediction is guided by multi-modality features.
Figure 2. Overall architecture of the framework. Our model uses separate backbones to extract the image BEV features and the radar BEV features. Each BSF (bidirectional spatial fusion) block applies, in sequence, a shared bidirectional cross-attention that exchanges information between the two modalities, followed by spatial alignment that localizes the radar and camera BEV features. After all blocks, both outputs are passed through a deconvolution module to reduce the channel dimension.
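As a rough illustration of the bidirectional cross-attention exchange described in the Figure 2 caption, the sketch below lets each modality's BEV features attend to the other's and adds the exchanged context back residually. It is a minimal, hypothetical sketch (learned projections, multi-head attention, and the spatial-alignment convolutions are omitted), not the authors' implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query row (one modality's BEV
    cells) attends over the other modality's rows. Learned query/key/value
    projections are omitted for brevity."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys_values))
                    for j in range(d)])
    return out

def bidirectional_fusion(cam_bev, radar_bev):
    """Each modality attends to the other; the exchanged context is added
    back residually, mirroring the shared cross-attention step."""
    cam_ctx = cross_attention(cam_bev, radar_bev)
    radar_ctx = cross_attention(radar_bev, cam_bev)
    cam_out = [[a + b for a, b in zip(row, ctx)]
               for row, ctx in zip(cam_bev, cam_ctx)]
    radar_out = [[a + b for a, b in zip(row, ctx)]
                 for row, ctx in zip(radar_bev, radar_ctx)]
    return cam_out, radar_out

# Toy example: two camera BEV cells and two radar BEV cells, 2 channels each.
cam = [[1.0, 0.0], [0.0, 1.0]]
radar = [[0.5, 0.5], [1.0, -1.0]]
cam_fused, radar_fused = bidirectional_fusion(cam, radar)
```

Because the attention is shared symmetrically, neither modality depends on the other's detection results, which is the property that distinguishes this design from proposal-based fusion.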
Figure 3. In the first row, the base prediction and the fusion prediction show the camera-only model and the radar-fusion model on BEV, respectively. Ground truth is plotted as blue boxes and predictions as yellow boxes, with lighter colors indicating higher confidence scores. The bottom row visualizes the camera views for this frame, with the corresponding regions of interest marked by dashed boxes of the same color.
Figure 4. Qualitative analysis of detection results. 3D bounding box predictions are projected onto images from six different views and onto the BEV, respectively. Boxes from different categories are marked with different colors; no ground truth is shown in the image views. In the BEV visualization, yellow boxes are predictions and blue boxes are ground truth, with LiDAR points visualized as background.
References
[1]
Vora S, Lang A H, Helou B, et al. PointPainting: Sequential fusion for 3D object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020: 4603–4611.
[2]
Long Y, Morris D, Liu X, et al. Radar-camera pixel depth association for depth completion. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021: 12502–12511.
[3]
Nobis F, Geisslinger M, Weber M, et al. A deep learning-based radar and camera sensor fusion architecture for object detection. In: 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF). Bonn, Germany: IEEE, 2019: 1–7.
[4]
Liu Z, Tang H, Amini A, et al. BEVFusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. arXiv: 2205.13542, 2022.
[5]
Huang J, Huang G, Zhu Z, et al. BEVDet: High-performance multi-camera 3D object detection in bird-eye-view. arXiv: 2112.11790, 2021.
[6]
Xu B, Chen Z. Multi-level fusion based 3D object detection from monocular images. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 2345–2353.
[7]
Kundu A, Li Y, Rehg J M. 3D-RCNN: Instance-level 3D object reconstruction via render-and-compare. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 3559–3568.
[8]
You Y, Wang Y, Chao W L, et al. Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving. In: Eighth International Conference on Learning Representations, 2020.
[9]
Wang Y, Chao W L, Garg D, et al. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2020: 8437–8445.
[10]
Roddick T, Kendall A, Cipolla R. Orthographic feature transform for monocular 3D object detection. arXiv: 1811.08188, 2018.
[11]
Wang T, Zhu X, Pang J, et al. FCOS3D: Fully convolutional one-stage monocular 3D object detection. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Montreal, Canada: IEEE, 2021: 913–922.
[12]
Li Z, Wang W, Li H, et al. BEVFormer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In: Avidan S, Brostow G, Cissé M, et al. editors. Computer Vision–ECCV 2022. Cham: Springer, 2022: 1–18.
[13]
Wang Y, Guizilini V, Zhang T, et al. DETR3D: 3D object detection from multi-view images via 3D-to-2D queries. In: 5th Conference on Robot Learning (CoRL 2021). London, UK: CoRL, 2021: 1–12.
[14]
Chen X, Ma H, Wan J, et al. Multi-view 3D object detection network for autonomous driving. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017: 6526–6534.
[15]
Qi C R, Liu W, Wu C, et al. Frustum PointNets for 3D object detection from RGB-D data. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 918–927.
[16]
Qian K, Zhu S, Zhang X, et al. Robust multimodal vehicle detection in foggy weather using complementary LiDAR and radar signals. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021: 444–453.
[17]
Chadwick S, Maddern W, Newman P. Distant vehicle detection using radar and vision. In: 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE, 2019: 8311–8317.
[18]
Cheng Y, Xu H, Liu Y. Robust small object detection on the water surface through fusion of camera and millimeter wave radar. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2022: 15243–15252.
[19]
Bai X, Hu Z, Zhu X, et al. TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022: 1080–1089.
[20]
Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021: 9992–10002.
[21]
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc., 2017: 6000–6010.
[22]
Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers. In: Vedaldi A, Bischof H, Brox T, et al. editors. Computer Vision–ECCV 2020. Cham: Springer, 2020: 213–229.
[23]
Zhang R, Qiu H, Wang T, et al. MonoDETR: Depth-guided transformer for monocular 3D object detection. arXiv: 2203.13310, 2022.
[24]
Zhu X, Su W, Lu L, et al. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv: 2010.04159, 2020.
[25]
Lang A H, Vora S, Caesar H, et al. PointPillars: Fast encoders for object detection from point clouds. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019: 12689–12697.
[26]
Kuhn H W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 1955, 2: 83–97. DOI: https://doi.org/10.1002/nav.3800020109
[27]
MMDetection3D Contributors. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d, 2020. Accessed December 1, 2022.
[28]
Xie E, Yu Z, Zhou D, et al. M2BEV: Multi-camera joint 3D detection and segmentation with unified birds-eye view representation. arXiv: 2204.05088, 2022.
[29]
Yan Y, Mao Y, Li B. SECOND: Sparsely embedded convolutional detection. Sensors, 2018, 18 (10): 3337. DOI: 10.3390/s18103337
[30]
Yin T, Zhou X, Krähenbühl P. Center-based 3D object detection and tracking. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021: 11779–11788.
[31]
Wang T, Zhu X, Pang J, et al. Probabilistic and geometric depth: Detecting objects in perspective. In: Proceedings of the 5th Conference on Robot Learning. PMLR, 2022, 164: 1475–1485.
[32]
Duan K, Bai S, Xie L, et al. CenterNet: Keypoint triplets for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea: IEEE, 2020: 6568–6577.
[33]
Nabati R, Qi H. CenterFusion: Center-based radar and camera fusion for 3D object detection. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE, 2021: 1526–1535.
[34]
Kim Y, Kim S, Choi J W, et al. CRAFT: Camera-radar 3D object detection with spatio-contextual fusion transformer. arXiv: 2209.06535, 2022.