DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction [arxiv]
Zhen Yang, Heng Wang, Yanpeng Dong
Beijing Mechanical Equipment Institute, Beijing, China
This is the official implementation of DAOcc. DAOcc is a novel multi-modal occupancy prediction framework that leverages 3D object detection to assist in achieving superior performance while using a deployment-friendly image encoder and practical input image resolution.
- 2025-09-09: DAOcc has been accepted to TCSVT. Cue the confetti! 🎉
- 2025-07-20: We have open-sourced the TensorRT inference code for DAOcc, achieving 54.25 mIoU at 104.9 FPS. Check it out here.
- 2025-07-11: DAOcc achieved 54.33 mIoU on Occ3D-nuScenes without EMA.
- 2025-04-24: Following SparseBEV, we optimized the 2D-to-3D image feature transformation process, achieving substantial reductions in GPU memory consumption while slightly reducing training time. Check the config file.
- 2025-01-31: Released the model weights and the first version of the code.
- 2024-10-01: Our preprint is available on arXiv.
3D Semantic Occupancy Prediction on Occ3D-nuScenes
Method | Camera Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | √ | R50 | 256×704 | 54.33 | config | model | log |
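As a reading aid for the "Camera Mask" column: when the mask is enabled, mIoU is computed only over voxels that are visible to the cameras. The snippet below is a minimal sketch of that idea on flattened label lists, not the benchmark's official evaluation script. The function name `masked_miou`, the toy labels, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration. The official protocol may also treat specific classes (such as free space) differently.

```python
def masked_miou(pred, gt, mask, num_classes):
    """Mean IoU over the voxels where mask is truthy (hypothetical helper)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g, m in zip(pred, gt, mask):
        if not m:
            continue  # voxel not visible to any camera: ignored
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            # A mismatch adds to the union of both classes involved.
            union[p] += 1
            union[g] += 1
    # Assumption: classes absent from both pred and gt are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example with 3 classes and 6 voxels; the last two are masked out.
pred = [0, 1, 1, 2, 2, 0]
gt = [0, 1, 2, 2, 2, 1]
camera_mask = [1, 1, 1, 1, 0, 0]

miou_masked = masked_miou(pred, gt, camera_mask, num_classes=3)
miou_all = masked_miou(pred, gt, [1] * len(gt), num_classes=3)
```

In this toy case the masked score and the unmasked score differ, which is why results with and without the camera mask are listed in separate tables and should not be compared directly.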
Method | Camera Mask | Image Backbone | Image Resolution | RayIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | × | R50 | 256×704 | 48.4 | config | model | log |
Deprecated results (archived)
Method | Camera Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | √ | R50 | 256×704 | 53.82 | config | model | log |
DAOcc* | √ | R50 | 256×704 | 54.19 | - | model | - |
Method | Camera Mask | Image Backbone | Image Resolution | RayIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | × | R50 | 256×704 | 48.2 | config | model | log |
3D Semantic Occupancy Prediction on SurroundOcc
Method | Image Backbone | Image Resolution | IoU | mIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | R50 | 256×704 | 45.0 | 30.5 | config | model | log |
3D Semantic Occupancy Prediction on OpenOccupancy
Method | Image Backbone | Image Resolution | IoU | mIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|
DAOcc | R18 | 256×704 | 32.2 | 24.1 | config | model | log |
3D Semantic Occupancy Prediction on Occ3D-Waymo
Method | Camera Mask | Infov Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
---|---|---|---|---|---|---|---|---|
DAOcc | √ | √ | R50 | 256×704 | 44.69 | config | - | log |
DAOcc* | √ | √ | R50 | 256×704 | 45.13 | - | - | - |
- The * denotes results obtained with an exponential moving average (EMA) hook.
- For Occ3D-Waymo, we use only 20% of the training data.
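For readers unfamiliar with EMA: the idea is to keep a second, smoothed copy of the model weights that is updated after each training step and used for evaluation. The snippet below is a minimal sketch of the standard update rule only. The function name `ema_update`, the dictionary-of-floats weights, and the decay value are assumptions for illustration. The hook used in this repository comes from its training framework and may differ in details such as warm-up or update interval.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Blend the current weights into the running average, in place.

    Hypothetical helper: ema = decay * ema + (1 - decay) * current.
    """
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy run with a single scalar "weight" and an exaggerated decay of 0.5.
model = {"w": 0.0}
ema = dict(model)  # the EMA copy starts from the initial weights
for step in range(1, 4):
    model["w"] = float(step)  # stand-in for an optimizer step
    ema_update(ema, model, decay=0.5)

print(ema)  # {'w': 2.125}
```

The averaged weights lag behind the raw weights (2.125 versus 3.0 here), which smooths out step-to-step noise. That smoothing is the usual reason EMA checkpoints score slightly higher, as in the DAOcc* rows.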
@article{yang2024daocc,
title={DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction},
author={Yang, Zhen and Dong, Yanpeng and Wang, Heng},
journal={arXiv preprint arXiv:2409.19972},
year={2024}
}
Many thanks to these excellent open-source projects: