
## Quick start

<details open>
<summary>Setup</summary>

```shell
pip install -r requirements.txt
```

The following table lists the corresponding `torch` and `torchvision` versions.

| `rtdetr` | `torch` | `torchvision` |
|---|---|---|
| `-` | `2.2` | `0.17` |
| `-` | `2.1` | `0.16` |
| `-` | `2.0` | `0.15` |

</details>
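As a quick sanity check, the version table above can be encoded as a small helper (hypothetical, not part of the repo) that maps an installed `torch` version to the matching `torchvision` series:

```python
# Hypothetical helper, not part of the repo: pick the torchvision series
# that pairs with an installed torch version, per the table above.
TORCH_TO_TORCHVISION = {"2.2": "0.17", "2.1": "0.16", "2.0": "0.15"}

def matching_torchvision(torch_version: str) -> str:
    """Return the torchvision series matching a torch version string."""
    series = ".".join(torch_version.split(".")[:2])  # "2.2.1" -> "2.2"
    if series not in TORCH_TO_TORCHVISION:
        raise ValueError(f"untested torch version: {torch_version}")
    return TORCH_TO_TORCHVISION[series]

print(matching_torchvision("2.2.1"))  # -> 0.17
```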


## Model Zoo

### Base models

| Model | Dataset | Input Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | #Params (M) | FPS | Config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **RT-DETRv2-S** | COCO | 640 | **47.9** <font color=green>(+1.4)</font> | **64.9** | 20 | 217 | [config](./configs/rtdetrv2/rtdetrv2_r18vd_120e_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_120e_coco.pth) |
| **RT-DETRv2-M** | COCO | 640 | **49.9** <font color=green>(+1.0)</font> | **67.5** | 31 | 161 | [config](./configs/rtdetrv2/rtdetrv2_r34vd_120e_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r34vd_120e_coco_ema.pth) |
| **RT-DETRv2-M**<sup>*</sup> | COCO | 640 | **51.9** <font color=green>(+0.6)</font> | **69.9** | 36 | 145 | [config](./configs/rtdetrv2/rtdetrv2_r50vd_m_7x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_7x_coco_ema.pth) |
| **RT-DETRv2-L** | COCO | 640 | **53.4** <font color=green>(+0.3)</font> | **71.6** | 42 | 108 | [config](./configs/rtdetrv2/rtdetrv2_r50vd_6x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_6x_coco_ema.pth) |
| **RT-DETRv2-X** | COCO | 640 | **54.3** | **72.8** <font color=green>(+0.1)</font> | 76 | 74 | [config](./configs/rtdetrv2/rtdetrv2_r101vd_6x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r101vd_6x_coco_from_paddle.pth) |
<!-- rtdetrv2_hgnetv2_l | COCO | 640 | 52.9 | 71.5 | 32 | 114 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_l_6x_coco_from_paddle.pth)
rtdetrv2_hgnetv2_x | COCO | 640 | 54.7 | 72.9 | 67 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_x_6x_coco_from_paddle.pth)
rtdetrv2_hgnetv2_h | COCO | 640 | 56.3 | 74.8 | 123 | 40 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_hgnetv2_h_6x_coco_from_paddle.pth)
rtdetrv2_18vd | COCO+Objects365 | 640 | 49.0 | 66.5 | 20 | 217 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_5x_coco_objects365_from_paddle.pth)
rtdetrv2_r50vd | COCO+Objects365 | 640 | 55.2 | 73.4 | 42 | 108 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_2x_coco_objects365_from_paddle.pth)
rtdetrv2_r101vd | COCO+Objects365 | 640 | 56.2 | 74.5 | 76 | 74 | [url<sup>*</sup>](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r101vd_2x_coco_objects365_from_paddle.pth)
-->

**Notes:**
- `AP` is evaluated on the *MSCOCO val2017* dataset.
- `FPS` is measured on a single T4 GPU with $batch\\_size = 1$, $fp16$, and $TensorRT \geq 8.5.1$.
- `COCO + Objects365` means the model was fine-tuned on `COCO` starting from weights pretrained on `Objects365`.


### Models with discrete sampling

| Model | Sampling Method | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | Config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| **RT-DETRv2-S_dsp** | discrete_sampling | 47.4 | 64.8 <font color=red>(-0.1)</font> | [config](./configs/rtdetrv2/rtdetrv2_r18vd_dsp_3x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp_3x_coco.pth) |
| **RT-DETRv2-M_dsp** | discrete_sampling | 49.2 | 67.1 <font color=red>(-0.4)</font> | [config](./configs/rtdetrv2/rtdetrv2_r34vd_dsp_1x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rrtdetrv2_r34vd_dsp_1x_coco.pth) |
| **RT-DETRv2-M**<sup>*</sup>**_dsp** | discrete_sampling | 51.4 | 69.7 <font color=red>(-0.2)</font> | [config](./configs/rtdetrv2/rtdetrv2_r50vd_m_dsp_3x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_dsp_3x_coco.pth) |
| **RT-DETRv2-L_dsp** | discrete_sampling | 52.9 | 71.3 <font color=red>(-0.3)</font> | [config](./configs/rtdetrv2/rtdetrv2_r50vd_dsp_1x_coco.yml) | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_dsp_1x_coco.pth) |


<!-- **rtdetrv2_r18vd_dsp1** | discrete_sampling | 21600 | 46.3 | 63.9 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp1_1x_coco.pth) -->

<!-- rtdetrv2_r18vd_dsp1 | discrete_sampling | 21600 | 45.5 | 63.0 | 4.34 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_dsp1_120e_coco.pth) -->
<!-- 4.3 -->

**Notes:**
- The impact on inference speed depends on the specific device and software stack.
- `*_dsp*` models inherit the knowledge of the corresponding `*_sp*` models and adapt to the `discrete_sampling` strategy. **You can use TensorRT 8.4 (or even older versions) to run inference with these models.**
<!-- - `grid_sampling` uses `grid_sample` to sample the attention map; `discrete_sampling` uses `index_select`. -->
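The difference between the two sampling strategies can be illustrated with a toy 1-D example; this is a numpy stand-in for `grid_sample` versus `index_select`, with made-up feature values:

```python
import numpy as np

feat = np.array([0.0, 10.0, 20.0, 30.0])  # toy 1-D feature row
loc = 1.25                                # fractional sampling location

# grid_sampling: bilinear interpolation between neighboring features
lo = int(np.floor(loc))
frac = loc - np.floor(loc)
bilinear = (1 - frac) * feat[lo] + frac * feat[lo + 1]

# discrete_sampling: round to the nearest index and gather directly,
# avoiding the grid_sample op that older TensorRT versions handle poorly
discrete = feat[int(round(loc))]

print(bilinear, discrete)  # 12.5 10.0
```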


### Ablation on sampling points

<!-- Flexible sampling strategy in the cross-attention layer for devices that do **not** optimize (or do not support) `grid_sampling` well. You can choose models based on specific scenarios and the trade-off between speed and accuracy. -->

| Model | Sampling Method | #Points | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| **rtdetrv2_r18vd_sp1** | grid_sampling | 21,600 | 47.3 | 64.3 <font color=red>(-0.6)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp1_120e_coco.pth) |
| **rtdetrv2_r18vd_sp2** | grid_sampling | 43,200 | 47.7 | 64.7 <font color=red>(-0.2)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp2_120e_coco.pth) |
| **rtdetrv2_r18vd_sp3** | grid_sampling | 64,800 | 47.8 | 64.8 <font color=red>(-0.1)</font> | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_sp3_120e_coco.pth) |
| rtdetrv2_r18vd (_sp4) | grid_sampling | 86,400 | 47.9 | 64.9 | [url](https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r18vd_120e_coco.pth) |

**Notes:**
- The impact on inference speed depends on the specific device and software stack.
- `#Points` is the total number of sampling points in the decoder for a single image at inference.
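The `#Points` column is consistent with a simple count, assuming the r18 decoder uses 300 queries, 3 decoder layers, 8 attention heads, and 3 feature levels (assumed values; verify against the config files):

```python
def total_sampling_points(points_per_level: int,
                          queries: int = 300,      # assumed object queries
                          decoder_layers: int = 3, # assumed for the r18 variant
                          heads: int = 8,          # assumed attention heads
                          levels: int = 3) -> int: # assumed feature levels
    """Total decoder sampling points for one image."""
    return queries * decoder_layers * heads * levels * points_per_level

for p in (1, 2, 3, 4):
    print(p, total_sampling_points(p))  # 21600, 43200, 64800, 86400
```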


## Usage
<details>
<summary> details </summary>

<!-- <summary>1. Training </summary> -->
1. Training
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config --use-amp --seed=0 &> log.txt 2>&1 &
```

<!-- <summary>2. Testing </summary> -->
2. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -r path/to/checkpoint --test-only
```

<!-- <summary>3. Tuning </summary> -->
3. Tuning
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -t path/to/checkpoint --use-amp --seed=0 &> log.txt 2>&1 &
```

<!-- <summary>4. Export onnx </summary> -->
4. Export ONNX
```shell
python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check
```

<!-- <summary>5. Inference </summary> -->
5. Inference

Inference is supported with `torch`, `onnxruntime`, `tensorrt`, and `openvino`; see *references/deploy* for details.
```shell
python references/deploy/rtdetrv2_onnx.py --onnx-file=model.onnx --im-file=xxxx
python references/deploy/rtdetrv2_tensorrt.py --trt-file=model.trt --im-file=xxxx
python references/deploy/rtdetrv2_torch.py -c path/to/config -r path/to/checkpoint --im-file=xxx --device=cuda:0
```
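For custom deployment code, the input blob shape follows from the 640 input size in the tables above. A minimal numpy sketch of building an NCHW blob (the scripts in *references/deploy* remain the source of truth for the exact preprocessing):

```python
import numpy as np

def to_blob(img_hwc: np.ndarray) -> np.ndarray:
    """HWC uint8 image -> NCHW float32 blob in [0, 1]."""
    x = img_hwc.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))[None, ...]  # (H, W, C) -> (1, C, H, W)
    return np.ascontiguousarray(x)

# A 640x640 RGB input yields the (1, 3, 640, 640) blob the models expect.
blob = to_blob(np.zeros((640, 640, 3), dtype=np.uint8))
print(blob.shape, blob.dtype)  # (1, 3, 640, 640) float32
```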
</details>


## Citation
If you use `RTDETR` or `RTDETRv2` in your work, please use the following BibTeX entries:

<details>
<summary> bibtex </summary>

```bibtex
@misc{lv2023detrs,
  title={DETRs Beat YOLOs on Real-time Object Detection},
  author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
  year={2023},
  eprint={2304.08069},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
  title={RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
  author={Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
  year={2024},
  eprint={2407.17140},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.17140}
}
```
</details>