PETRv1 and PETRv2 are designed for the multi-view 3D object detection task. PETR stands for "position embedding transformation"; it encodes 3D position information into image features and thus produces position-aware features. For more detail, please refer to:
- PETRv1: [ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection arxiv
- PETRv2: [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images arxiv
In this demo, we will use PETR-vov-p4-800x320 for PETRv1 and PETRv2-vov-p4-800x320 for PETRv2 as our deployment targets.
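The core idea can be sketched in a few lines. This is an illustrative toy (shapes, names, and the pre-computed embedding are placeholders, not the exported graph): the backbone's image features are summed with a 3D position embedding derived from each camera's frustum, so every feature vector carries 3D position information that a DETR-style head can query directly.

```python
import numpy as np

def position_aware_features(img_feats, coords_pe):
    """img_feats: (N_cam, C, H, W) features from the image backbone.
    coords_pe:  (N_cam, C, H, W) 3D position embedding precomputed from
                each camera's projection matrix. PETR simply adds the two,
    producing position-aware features for the transformer head."""
    return img_feats + coords_pe

# toy shapes: 6 cameras, 16 channels, 20x50 feature map
feats = np.zeros((6, 16, 20, 50), dtype=np.float32)
pe = np.ones((6, 16, 20, 50), dtype=np.float32)
out = position_aware_features(feats, pe)
```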
| Method | Backbone precision | Head precision | Framework | mAP | Latency (ms) |
|---|---|---|---|---|---|
| PETR-vov-p4-800x320 | fp16 | fp32 | PyTorch | 0.3778 | - |
| PETR-vov-p4-800x320 | fp16 | fp16 | TensorRT | 0.3774 | 52.37 (on Orin) |
| PETRv2-vov-p4-800x320 | fp16 | fp32 | PyTorch | 0.4106 | - |
| PETRv2-vov-p4-800x320 | fp16 | fp16 | TensorRT | 0.4101 | 55.77 (on Orin) |
```bash
cd /workspace
git clone https://github.com/NVIDIA/DL4AGX.git
cd DL4AGX
git submodule update --init --recursive
```
```bash
cd /workspace
git clone https://github.com/megvii-research/PETR
cd PETR
git apply /workspace/DL4AGX/AV-Solutions/petr-trt/patch.diff
git clone https://github.com/open-mmlab/mmdetection3d.git -b v0.17.1
```
Please follow the instructions in the official repo (install.md, prepare-dataset.md) to set up the environment for PyTorch inference first. Since there are some API changes in newer mmcv/mmdet, we adjusted the original configs. You may find those minor changes in patch.diff.
Then download PETR-vov-p4-800x320_e24.pth from https://drive.google.com/file/d/1-afU8MhAf92dneOIbhoVxl_b72IAWOEJ/view?usp=sharing
and PETRv2-vov-p4-800x320_e24.pth from https://drive.google.com/file/d/1tv_D8Ahp9tz5n4pFp4a64k-IrUZPu5Im/view?usp=sharing
to the folder /workspace/PETR/ckpts. Note: both files are originally named epoch_24.pth, so don't forget to rename them after downloading.
After the setup, your PETR folder should look like:
```
├── data/
│   └── nuscenes/
│       ├── v1.0-trainval/
│       ├── samples/
│       ├── sweeps/
│       ├── nuscenes_infos_train.pkl
│       ├── nuscenes_infos_val.pkl
│       ├── mmdet3d_nuscenes_30f_infos_train.pkl
│       └── mmdet3d_nuscenes_30f_infos_val.pkl
├── mmdetection3d/
├── projects/
├── tools/
├── install.md
├── requirements.txt
├── LICENSE
└── README.md
```
You may verify your installation with:
```bash
cd /workspace/PETR
CUDA_VISIBLE_DEVICES=0 python tools/test.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
```
This command is expected to output the benchmark results. This environment for PyTorch inference and benchmark will be referred to as the torch container.
NOTE
- For the best user experience, we highly recommend torch >= 1.14. You may also build the docker image with the given ./dockerfile. Here is an example command line; you may change the arguments for volume mapping according to your setup.
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt
docker build --network=host -f dockerfile . -t petr-trt
docker run --name=petr-trt -d -it --rm --shm-size=4096m --privileged --gpus all --network=host \
    -v /workspace:/workspace -v <path to nuscenes>:/data \
    petr-trt /bin/bash
```
To set up the deployment environment, you may run the following commands. Please note that we will export the ONNX files inside petr-trt.
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
ln -s /workspace/PETR/data data                    # create a soft-link to the data folder
ln -s /workspace/PETR/mmdetection3d mmdetection3d  # create a soft-link to the mmdetection3d folder
export PYTHONPATH=.:/workspace/PETR/:/workspace/DL4AGX/AV-Solutions/petr-trt/export_eval/
```

To export the ONNX of PETRv1:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v1/v1_export_to_onnx.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
```
This script will create PETRv1.extract_feat.onnx and PETRv1.pts_bbox_head.forward.onnx inside onnx_files.
As PETRv2 is a temporal model, its inference behavior is slightly different from PETRv1's.
Originally, the backbone extracts features from two input frames, i.e. the current and the previous frame.
However, the features extracted from the previous frame can be reused to improve efficiency.
So we modify the behavior of the function extract_feat when exporting the model:
it takes the cached feature map as input instead of recomputing it.
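The caching behavior above can be sketched as follows. This is an illustrative toy, not the exported graph; the class and the stand-in backbone are hypothetical:

```python
import numpy as np

class TemporalFeatureCache:
    """Run the backbone only on the current frame; serve the previous
    frame's features from a cache instead of recomputing them."""
    def __init__(self):
        self.prev_feat = None

    def step(self, cur_img, backbone):
        cur_feat = backbone(cur_img)          # backbone runs once per frame
        # the first frame has no history, so fall back to current features
        prev_feat = cur_feat if self.prev_feat is None else self.prev_feat
        self.prev_feat = cur_feat             # cache for the next call
        return np.stack([prev_feat, cur_feat])

backbone = lambda img: img * 0.5              # stand-in for the real backbone
cache = TemporalFeatureCache()
frame0, frame1 = np.ones((2, 3)), np.full((2, 3), 4.0)
_ = cache.step(frame0, backbone)
feats = cache.step(frame1, backbone)          # prev slot reuses frame0's features
```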
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v2/v2_export_to_onnx.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
This script will create PETRv2.extract_feat.onnx and PETRv2.pts_bbox_head.forward.onnx inside onnx_files.
NOTE: As `coords_position_embeding` solely depends on `lidar2img` and `img_shape` from `img_metas`, we move this part outside of the ONNX. The same `coords_position_embeding` tensor can be reused as long as `lidar2img` and `img_shape` remain unchanged.
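A tiny cache illustrates why the embedding can be hoisted out of the ONNX: it only needs recomputing when either input changes. `compute_coords_pe` is a hypothetical placeholder for the real computation:

```python
import numpy as np

_pe_cache = {}

def compute_coords_pe(lidar2img, img_shape):
    h, w = img_shape
    return np.full((h, w), lidar2img.sum(), dtype=np.float32)  # placeholder math

def get_coords_pe(lidar2img, img_shape):
    # key on the only two inputs the embedding depends on
    key = (lidar2img.tobytes(), tuple(img_shape))
    if key not in _pe_cache:                  # recompute only when inputs change
        _pe_cache[key] = compute_coords_pe(lidar2img, img_shape)
    return _pe_cache[key]

m = np.eye(4, dtype=np.float32)
pe_a = get_coords_pe(m, (320, 800))
pe_b = get_coords_pe(m, (320, 800))           # identical inputs -> cached tensor
```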
We provide v1/v1_evaluate_trt.py and v2/v2_evaluate_trt.py to run the benchmark with TensorRT. They produce results similar to the original PyTorch benchmark.
- Prepare dependencies for the benchmark:
```bash
pip install <TensorRT Root>/python/tensorrt-<version>-cp38-none-linux_aarch64.whl
```
- Build TensorRT engines
We provide a script that loads the four simplified ONNX files and creates the corresponding engine files.
```bash
export TRT_ROOT=<path to your tensorrt dir>
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
bash onnx2trt.sh
```
The above script builds TensorRT engines in FP16 precision as an example.
- Run benchmark with TensorRT
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
# benchmark PETRv1
python v1/v1_evaluate_trt.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
# benchmark PETRv2
python v2/v2_evaluate_trt.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
As we replace the backend from PyTorch to TensorRT while keeping other parts like data loading and evaluation unchanged, you are expected to see outputs similar to the PyTorch benchmark.
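The backend swap can be sketched as a pluggable callable; the interfaces below are illustrative stand-ins, not the actual script internals. Because the evaluation loop only sees a function, the PyTorch forward pass and the TensorRT engine execution are interchangeable while data loading and metrics stay identical:

```python
def evaluate(dataset, infer_fn):
    """Run inference over a dataset and collect results; this loop is
    identical no matter which backend infer_fn wraps."""
    return [infer_fn(sample) for sample in dataset]

def torch_like_backend(sample):   # stand-in for model(img, img_metas)
    return sample * 2.0

def trt_like_backend(sample):     # stand-in for TensorRT engine execution
    return sample * 2.0           # same math, different runtime

data = [1.0, 2.0, 3.0]
torch_results = evaluate(data, torch_like_backend)
trt_results = evaluate(data, trt_like_backend)
```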
This model is to be deployed on NVIDIA DRIVE Orin with TensorRT 10.8.0.32.
We will use the NVIDIA DRIVE docker image drive-agx-orin-linux-aarch64-pdk-build-x86:6.5.1.0-latest as the cross-compile environment; this container will be referred to as the build container.
To launch the docker on the host x86 machine, you may run:
```bash
docker run --gpus all -it --network=host --rm \
    -v /workspace:/workspace \
    nvcr.io/drive/driveos-sdk/drive-agx-orin-linux-aarch64-pdk-build-x86:6.5.1.0-latest
```
To gain access to the docker image and the corresponding TensorRT, please join the DRIVE AGX SDK Developer Program. You can find more details on the NVIDIA DRIVE site.
Similar to what we did when building plugins, you may run the following commands inside the build container.
```bash
# inside the cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/petr-trt/app
bash setup_dep.sh  # download dependencies (stb, cuOSD)
mkdir -p build+orin && cd build+orin
cmake -DTARGET=aarch64 -DTRT_ROOT=<path to your aarch64 tensorrt dir> .. && make
```
We expect to see petr_v1 and petr_v2 under petr-trt/app/build+orin/.
In this demo run, we will set everything up under the folder petr-trt/app/demo/.
- Copy the cross-compiled applications to the demo folder
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/
cp app/build+orin/petr* app/demo/
```
- Prepare input data for inference
In the torch container environment on x86, run:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v1/v1_save_data.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
python v2/v2_save_data.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
This will dump the necessary data files to petr-trt/export_eval/demo/data/. Please be aware that v2_save_data.py only generates the data files needed on top of v1_save_data.py's output, so make sure you call v1/v1_save_data.py first and then v2/v2_save_data.py.
We can then move them by:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/
cp -r export_eval/demo/data/ app/demo/
cp -r export_eval/onnx_files/*.onnx app/demo/onnx_files/
```
Now the petr-trt/app/demo folder should be organized as:
```
├── data/
│   ├── cams/
│   ├── imgs/
│   ├── lidar2imgs/
│   ├── v1_coords_pe.bin
│   ├── v2_coords_pe.bin
│   └── v2_mean_time_stamp.bin
├── engines/
├── onnx_files/
│   ├── PETRv1.extract_feat.onnx
│   ├── PETRv1.pts_bbox_head.forward.onnx
│   ├── PETRv2.extract_feat.onnx
│   └── PETRv2.pts_bbox_head.forward.onnx
├── viz_v1/
├── viz_v2/
├── onnx2trt.sh
├── simhei.ttf
├── v1_config.json
└── v2_config.json
```
Now you may copy or mount everything under DL4AGX/AV-Solutions/petr-trt/app/demo to the /demo folder on NVIDIA DRIVE Orin.
You may use trtexec to build the engines from the ONNX files on NVIDIA DRIVE Orin. We provide a bash script that wraps the trtexec commands.
```bash
export TRT_ROOT=<path to tensorrt on NVIDIA DRIVE Orin>
cd /demo
bash onnx2trt.sh
```
This script will load all four ONNX files under /demo/onnx_files and generate the corresponding engine files under /demo/engines/.
You may explore the script and modify options like precision according to your needs.
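As a hypothetical sketch of what such a wrapper issues, each network gets one trtexec invocation with an ONNX input, an engine output, and a precision flag; the real script's flags may differ:

```python
def trtexec_cmd(onnx_path, engine_path, fp16=True):
    """Compose one trtexec command line for a single network."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")      # allow FP16 kernels when building
    return cmd

networks = ["PETRv1.extract_feat", "PETRv1.pts_bbox_head.forward",
            "PETRv2.extract_feat", "PETRv2.pts_bbox_head.forward"]
cmds = [trtexec_cmd(f"onnx_files/{n}.onnx", f"engines/{n}.engine")
        for n in networks]
# on the target, each could be run with subprocess.run(cmd, check=True)
```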
To run the demo app, simply call:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /demo
# to run petr v1
./petr_v1 ./v1_config.json
# to run petr v2
./petr_v2 ./v2_config.json
```
You may then find the visualization results under /demo/viz_v1 and /demo/viz_v2 in jpg format.
Example (Left PETRv1, Right PETRv2):

- PETRv1&v2 and their related code are licensed under Apache-2.0
- cuOSD and its related code are licensed under MIT