Commit 1504b2a

Add rtdetr support (#44)
* Add rt-detr v1 and v2 support
* Add docs for rt-detr v1 and v2
* Implement feedback
1 parent b425f75 commit 1504b2a

File tree

12 files changed

+380
-3
lines changed

README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ and time-consuming. Labelformat aims to solve this pain.
2424
- [Labelbox](https://labelformat.com/formats/object-detection/labelbox/) (input only)
2525
- [Lightly](https://labelformat.com/formats/object-detection/lightly/)
2626
- [PascalVOC](https://labelformat.com/formats/object-detection/pascalvoc/)
27+
- [RT-DETR](https://labelformat.com/formats/object-detection/rtdetr/)
28+
- [RT-DETRv2](https://labelformat.com/formats/object-detection/rtdetrv2/)
2729
- [YOLOv5](https://labelformat.com/formats/object-detection/yolov5/)
2830
- [YOLOv6](https://labelformat.com/formats/object-detection/yolov6/)
2931
- [YOLOv7](https://labelformat.com/formats/object-detection/yolov7/)

docs/features.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,8 @@ Labelformat offers a robust set of features tailored to meet the diverse needs o
3636
- **[Labelbox](formats/object-detection/labelbox.md)** (input only)
3737
- **[Lightly](formats/object-detection/lightly.md)**
3838
- **[PascalVOC](formats/object-detection/pascalvoc.md)**
39+
- **[RT-DETR](formats/object-detection/rtdetr.md)**
40+
- **[RT-DETRv2](formats/object-detection/rtdetrv2.md)**
3941
- **[YOLOv5](formats/object-detection/yolov5.md)**
4042
- **[YOLOv6](formats/object-detection/yolov6.md)**
4143
- **[YOLOv7](formats/object-detection/yolov7.md)**

docs/formats/index.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@
66
- [Labelbox](./object-detection/labelbox.md)
77
- [Lightly](./object-detection/lightly.md)
88
- [PascalVOC](./object-detection/pascalvoc.md)
9+
- [RT-DETR](./object-detection/rtdetr.md)
10+
- [RT-DETRv2](./object-detection/rtdetrv2.md)
911
- [YOLOv5](./object-detection/yolov5.md)
1012
- [YOLOv6](./object-detection/yolov6.md)
1113
- [YOLOv7](./object-detection/yolov7.md)

docs/formats/object-detection/index.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,8 @@ Labelformat supports converting between major object detection annotation format
1515
- [Labelbox](./labelbox.md)
1616
- [Lightly](./lightly.md)
1717
- [PascalVOC](./pascalvoc.md)
18+
- [RT-DETR](./rtdetr.md)
19+
- [RT-DETRv2](./rtdetrv2.md)
1820
- [YOLOv5](./yolov5.md)
1921
- [YOLOv6](./yolov6.md)
2022
- [YOLOv7](./yolov7.md)
Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,157 @@
1+
# RT-DETR Object Detection Format
2+
3+
## Overview
4+
5+
**RT-DETR (Real-Time DEtection TRansformer)** is an end-to-end object detection framework introduced in the paper [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069). The paper presents it as the first real-time end-to-end object detector that competes with YOLO detectors in real-time applications. Unlike detectors that require Non-Maximum Suppression (NMS) post-processing, RT-DETR removes NMS entirely, and the paper reports better speed and accuracy than comparable YOLO models.
6+
7+
> **Info:** RT-DETR was introduced through the academic paper "DETRs Beat YOLOs on Real-time Object Detection" published in 2023.
8+
For the full paper, see: [arXiv:2304.08069](https://arxiv.org/abs/2304.08069)
9+
For implementation details and code, see: [GitHub Repository: lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)
10+
11+
> **Availability:** RT-DETR is now available in multiple frameworks:
12+
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/model_doc/rt_detr)
13+
- [Ultralytics](https://docs.ultralytics.com/models/rtdetr/)
14+
15+
## Key RT-DETR Model Features
16+
17+
RT-DETR uses the standard **COCO annotation format**. The model itself introduces several architectural changes for real-time detection:
18+
19+
- **End-to-End Architecture:** First real-time detector to completely eliminate NMS post-processing, providing more stable and predictable inference times.
20+
- **Efficient Hybrid Encoder:** Novel encoder design that decouples intra-scale interaction and cross-scale fusion to significantly reduce computational overhead.
21+
- **Uncertainty-Minimal Query Selection:** Advanced query initialization scheme that optimizes both classification and localization confidence for improved detection quality.
22+
- **Flexible Speed Tuning:** Supports adjustable inference speed by modifying the number of decoder layers without retraining.
23+
- **Superior Performance:** Achieves state-of-the-art results (e.g., RT-DETR-R50 reaches 53.1% mAP @ 108 FPS on T4 GPU, outperforming YOLOv8-L in both speed and accuracy).
24+
- **Multiple Model Scales:** Available in various scales (R18, R34, R50, R101) to accommodate different computational requirements.
25+
26+
These architectural innovations are handled internally by the model design and training pipeline, requiring no changes to the standard COCO annotation format described below.
27+
28+
## Specification of RT-DETR Detection Format
29+
30+
RT-DETR uses the standard **COCO format** for annotations, ensuring seamless integration with existing COCO datasets and tools. The format consists of a single JSON file containing three main components:
31+
32+
### `images`
33+
Defines metadata for each image in the dataset:
34+
```json
35+
{
36+
"id": 0, // Unique image ID
37+
"file_name": "image1.jpg", // Image filename
38+
"width": 640, // Image width in pixels
39+
"height": 416 // Image height in pixels
40+
}
41+
```
42+
43+
### `categories`
44+
Defines the object classes:
45+
```json
46+
{
47+
"id": 0, // Unique category ID
48+
"name": "cat" // Category name
49+
}
50+
```
51+
52+
### `annotations`
53+
Defines object instances:
54+
```json
55+
{
56+
"image_id": 0, // Reference to image
57+
"category_id": 2, // Reference to category
58+
"bbox": [540.0, 295.0, 23.0, 18.0] // [x, y, width, height] in absolute pixels
59+
}
60+
```
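
The `bbox` convention above (`[x, y, width, height]` in absolute pixels, with `x` and `y` at the top-left corner) can be sketched in Python as a conversion to corner coordinates. This is a minimal illustration, not part of Labelformat; the helper name `coco_bbox_to_corners` is hypothetical.

```python
from typing import List, Tuple


def coco_bbox_to_corners(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max).

    Hypothetical helper for illustration only; not a Labelformat API.
    """
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


# The example annotation shown above.
corners = coco_bbox_to_corners([540.0, 295.0, 23.0, 18.0])
print(corners)
```

Tools that expect corner-style boxes need this conversion; tools that read COCO directly can use the `bbox` values unchanged.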
61+
62+
## Directory Structure of RT-DETR Dataset
63+
64+
```
65+
dataset/
66+
├── images/ # Image files
67+
│ ├── image1.jpg
68+
│ └── image2.jpg
69+
└── annotations.json # Single JSON file containing all annotations
70+
```
71+
72+
## Benefits of RT-DETR Format
73+
74+
- **Standard Compatibility:** Uses the widely-adopted COCO format, ensuring compatibility with existing tools and frameworks.
75+
- **Flexibility:** Supports adjustable inference speeds without retraining, making it adaptable to various real-time scenarios.
76+
- **Superior Accuracy:** Achieves better accuracy than comparable YOLO detectors while maintaining competitive speed.
77+
78+
## Converting Annotations to RT-DETR Format with Labelformat
79+
80+
Since RT-DETR uses the standard COCO format, converting annotations to RT-DETR format is equivalent to converting to COCO format.
81+
82+
### Installation
83+
84+
First, ensure that Labelformat is installed:
85+
86+
```shell
87+
pip install labelformat
88+
```
89+
90+
### Conversion Example: YOLOv8 to RT-DETR
91+
92+
Assume you have annotations in YOLOv8 format and wish to convert them to RT-DETR. Here's how you can achieve this using Labelformat.
93+
94+
**Step 1: Prepare Your Dataset**
95+
96+
Ensure your dataset follows the standard YOLOv8 structure with `data.yaml` and label files.
97+
98+
**Step 2: Run the Conversion Command**
99+
100+
Use the Labelformat CLI to convert YOLOv8 annotations to RT-DETR (COCO format):
101+
```bash
102+
labelformat convert \
103+
--task object-detection \
104+
--input-format yolov8 \
105+
--input-file dataset/data.yaml \
106+
--input-split train \
107+
--output-format rtdetr \
108+
--output-file dataset/rtdetr_annotations.json
109+
```
110+
111+
**Step 3: Verify the Converted Annotations**
112+
113+
After conversion, your dataset structure will be:
114+
```
115+
dataset/
116+
├── images/
117+
│ ├── image1.jpg
118+
│ ├── image2.jpg
119+
│ └── ...
120+
└── rtdetr_annotations.json # COCO format annotations for RT-DETR
121+
```
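
One way to check the converted file is to load it with Python's standard `json` module and count its entries. The sketch below assumes the output path used in the command above and the three top-level COCO keys (`images`, `categories`, `annotations`); the function name `summarize_coco_file` is hypothetical and not part of Labelformat.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def summarize_coco_file(path: Path) -> Dict[str, Any]:
    """Load a COCO-style JSON file and return basic counts for a sanity check.

    Hypothetical helper for illustration only; not a Labelformat API.
    """
    with path.open("r", encoding="utf-8") as f:
        coco = json.load(f)

    # Map category IDs to names so the per-category counts are readable.
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(
        id_to_name.get(a["category_id"], "<unknown>") for a in coco["annotations"]
    )
    return {
        "num_images": len(coco["images"]),
        "num_categories": len(coco["categories"]),
        "num_annotations": len(coco["annotations"]),
        "annotations_per_category": dict(per_category),
    }


# Path taken from the conversion command; adjust it to your dataset.
output_file = Path("dataset/rtdetr_annotations.json")
if output_file.exists():
    print(summarize_coco_file(output_file))
```

If the counts differ from what you expect for the chosen split, check the `--input-split` value and the paths in `data.yaml`.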
122+
123+
### Python API Example
124+
125+
```python
126+
from pathlib import Path
127+
from labelformat.formats import YOLOv8ObjectDetectionInput, RTDETRObjectDetectionOutput
128+
129+
# Load YOLOv8 format
130+
label_input = YOLOv8ObjectDetectionInput(
131+
input_file=Path("dataset/data.yaml"),
132+
input_split="train"
133+
)
134+
135+
# Convert to RT-DETR format
136+
RTDETRObjectDetectionOutput(
137+
output_file=Path("dataset/rtdetr_annotations.json")
138+
).save(label_input=label_input)
139+
```
140+
141+
## Error Handling in Labelformat
142+
143+
Since RT-DETR uses the COCO format, the same validation and error handling apply:
144+
145+
- **Invalid JSON Structure:** Proper error reporting for malformed JSON files
146+
- **Missing Required Fields:** Validation ensures all required COCO fields are present
147+
- **Reference Integrity:** Checks that image_id and category_id references are valid
148+
- **Bounding Box Validation:** Ensures bounding boxes are within image boundaries
149+
150+
Example of a properly formatted annotation:
151+
```json
152+
{
153+
"images": [{"id": 0, "file_name": "image1.jpg", "width": 640, "height": 480}],
154+
"categories": [{"id": 1, "name": "person"}],
155+
"annotations": [{"image_id": 0, "category_id": 1, "bbox": [100, 120, 50, 80]}]
156+
}
157+
```
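
The reference-integrity and bounding-box checks listed above can be sketched in plain Python. This is only an illustration of the kind of checks described, under the assumption that the data is already loaded into a dictionary; it is not Labelformat's actual validation code, and `find_annotation_problems` is a hypothetical name.

```python
from typing import Any, Dict, List


def find_annotation_problems(coco: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a COCO-style dictionary.

    Hypothetical helper for illustration only; not a Labelformat API.
    """
    images = {image["id"]: image for image in coco.get("images", [])}
    category_ids = {category["id"] for category in coco.get("categories", [])}
    problems: List[str] = []

    for index, annotation in enumerate(coco.get("annotations", [])):
        # Reference integrity: image_id must point at a known image.
        image = images.get(annotation.get("image_id"))
        if image is None:
            problems.append(f"annotation {index}: unknown image_id")
            continue
        # Reference integrity: category_id must point at a known category.
        if annotation.get("category_id") not in category_ids:
            problems.append(f"annotation {index}: unknown category_id")
        # Bounding box validation: [x, y, width, height] must fit in the image.
        x, y, width, height = annotation["bbox"]
        if x < 0 or y < 0 or x + width > image["width"] or y + height > image["height"]:
            problems.append(f"annotation {index}: bbox outside image boundaries")

    return problems


example = {
    "images": [{"id": 0, "file_name": "image1.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [{"image_id": 0, "category_id": 1, "bbox": [100, 120, 50, 80]}],
}
print(find_annotation_problems(example))
```

An empty list means none of these checks failed for the given dictionary.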
Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
# RT-DETRv2 Object Detection Format
2+
3+
## Overview
4+
5+
**RT-DETRv2** is an enhanced version of the Real-Time DEtection TRansformer ([RT-DETR](https://arxiv.org/abs/2304.08069)), introduced in the paper [RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer](https://arxiv.org/abs/2407.17140). It builds on the end-to-end object detection framework of the original RT-DETR and, like it, needs no Non-Maximum Suppression (NMS) post-processing, while adding improvements in accuracy and deployment flexibility for real-time object detection.
6+
7+
> **Info:** RT-DETRv2 was introduced through the technical report "RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer" published in 2024.
8+
For the full paper, see: [arXiv:2407.17140](https://arxiv.org/abs/2407.17140)
9+
For RT-DETR foundation, see: [RT-DETR Paper (arXiv:2304.08069)](https://arxiv.org/abs/2304.08069)
10+
For implementation details and code, see: [GitHub Repository: lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)
11+
12+
> **Availability:** RT-DETRv2 is now available in multiple frameworks:
13+
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/model_doc/rt_detr_v2)
14+
- [Ultralytics](https://docs.ultralytics.com/models/rtdetr/)
15+
16+
## Key RT-DETRv2 Model Features
17+
18+
RT-DETRv2 maintains compatibility with the standard **COCO annotation format** while introducing specific technical improvements over RT-DETR:
19+
20+
- **Distinct Sampling Points for Different Scales:** Introduces flexible multi-scale feature extraction by setting different numbers of sampling points for features at different scales in the deformable attention module, rather than using the same number across all scales.
21+
- **Discrete Sampling Operator:** Provides an optional discrete sampling operator to replace the grid_sample operator, removing deployment constraints typically associated with DETRs and improving practical applicability across different deployment platforms.
22+
- **Dynamic Data Augmentation:** Implements adaptive data augmentation strategy that applies stronger augmentation in early training periods and reduces it in later stages to improve model robustness and target domain adaptation.
23+
- **Scale-Adaptive Hyperparameters:** Customizes optimizer hyperparameters based on model scale, using higher learning rates for lighter models (e.g., ResNet18) and lower rates for larger models (e.g., ResNet101) to achieve optimal performance.
24+
- **Bag-of-Freebies Approach:** Incorporates multiple training improvements that enhance performance without increasing inference cost or model complexity.
25+
- **Consistent Performance Gains:** Achieves improved accuracy across all model scales (S: +1.4 mAP, M: +1.0 mAP, L: +0.3 mAP) while maintaining the same inference speed as RT-DETR.
26+
27+
These enhancements are handled internally by the model design and training pipeline, requiring no changes to the standard COCO annotation format described below.
28+
29+
## Specification of RT-DETRv2 Detection Format
30+
31+
RT-DETRv2 uses the standard **COCO format** for annotations, ensuring complete compatibility with existing COCO datasets and tools. The format specification is identical to the original COCO format:
32+
33+
### `images`
34+
Defines metadata for each image in the dataset:
35+
```json
36+
{
37+
"id": 0, // Unique image ID
38+
"file_name": "image1.jpg", // Image filename
39+
"width": 640, // Image width in pixels
40+
"height": 416 // Image height in pixels
41+
}
42+
```
43+
44+
### `categories`
45+
Defines the object classes:
46+
```json
47+
{
48+
"id": 0, // Unique category ID
49+
"name": "cat" // Category name
50+
}
51+
```
52+
53+
### `annotations`
54+
Defines object instances:
55+
```json
56+
{
57+
"image_id": 0, // Reference to image
58+
"category_id": 2, // Reference to category
59+
"bbox": [540.0, 295.0, 23.0, 18.0] // [x, y, width, height] in absolute pixels
60+
}
61+
```
62+
63+
## Directory Structure of RT-DETRv2 Dataset
64+
65+
```
66+
dataset/
67+
├── images/ # Image files
68+
│ ├── image1.jpg
69+
│ └── image2.jpg
70+
└── annotations.json # Single JSON file containing all annotations
71+
```
72+
73+
## Benefits of RT-DETRv2 Format
74+
75+
- **Standard Compatibility:** Uses the widely-adopted COCO format, ensuring compatibility with existing tools and frameworks.
76+
- **End-to-End Processing:** Maintains the NMS-free architecture for stable and predictable inference performance.
77+
- **Enhanced Performance:** Improved accuracy and efficiency compared to the original RT-DETR.
78+
79+
## Converting Annotations to RT-DETRv2 Format with Labelformat
80+
81+
Since RT-DETRv2 uses the standard COCO format, converting annotations to RT-DETRv2 format is equivalent to converting to COCO format.
82+
83+
### Installation
84+
85+
First, ensure that Labelformat is installed:
86+
87+
```shell
88+
pip install labelformat
89+
```
90+
91+
### Conversion Example: YOLOv8 to RT-DETRv2
92+
93+
**Step 1: Prepare Your Dataset**
94+
95+
Ensure your dataset follows the standard YOLOv8 structure with `data.yaml` and label files.
96+
97+
**Step 2: Run the Conversion Command**
98+
99+
Use the Labelformat CLI to convert YOLOv8 annotations to RT-DETRv2 (COCO format):
100+
```bash
101+
labelformat convert \
102+
--task object-detection \
103+
--input-format yolov8 \
104+
--input-file dataset/data.yaml \
105+
--input-split train \
106+
--output-format rtdetrv2 \
107+
--output-file dataset/rtdetrv2_annotations.json
108+
```
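
The coordinate change behind this conversion can be sketched as follows. YOLO label files store normalized `<x_center> <y_center> <width> <height>` values, while COCO stores `[x, y, width, height]` in absolute pixels with `x` and `y` at the top-left corner. The function below is a hypothetical illustration of that arithmetic, not Labelformat's internal implementation.

```python
from typing import List


def yolo_to_coco_bbox(
    x_center: float,
    y_center: float,
    width: float,
    height: float,
    image_width: int,
    image_height: int,
) -> List[float]:
    """Convert a normalized YOLO box to a COCO [x, y, width, height] box in pixels.

    Hypothetical helper for illustration only; not a Labelformat API.
    """
    # Scale the normalized size to pixels.
    abs_width = width * image_width
    abs_height = height * image_height
    # Move from the box center to the top-left corner.
    x_min = x_center * image_width - abs_width / 2
    y_min = y_center * image_height - abs_height / 2
    return [x_min, y_min, abs_width, abs_height]


# A box centered in a 640x480 image, a quarter as wide and half as tall as the image.
print(yolo_to_coco_bbox(0.5, 0.5, 0.25, 0.5, image_width=640, image_height=480))
```

Because the conversion depends on each image's width and height, the image files must be readable at the paths referenced by `data.yaml`.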
109+
110+
**Step 3: Verify the Converted Annotations**
111+
112+
After conversion, your dataset structure will be:
113+
```
114+
dataset/
115+
├── images/
116+
│ ├── image1.jpg
117+
│ ├── image2.jpg
118+
│ └── ...
119+
└── rtdetrv2_annotations.json # COCO format annotations for RT-DETRv2
120+
```
121+
122+
### Python API Example
123+
124+
```python
125+
from pathlib import Path
126+
from labelformat.formats import YOLOv8ObjectDetectionInput, RTDETRv2ObjectDetectionOutput
127+
128+
# Load YOLOv8 format
129+
label_input = YOLOv8ObjectDetectionInput(
130+
input_file=Path("dataset/data.yaml"),
131+
input_split="train"
132+
)
133+
134+
# Convert to RT-DETRv2 format
135+
RTDETRv2ObjectDetectionOutput(
136+
output_file=Path("dataset/rtdetrv2_annotations.json")
137+
).save(label_input=label_input)
138+
```
139+
140+
## RT-DETRv2 vs RT-DETR
141+
142+
RT-DETRv2 builds upon the foundation of RT-DETR with several key improvements:
143+
144+
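- **Enhanced Architecture:** Scale-specific numbers of sampling points in the deformable attention module, plus an optional discrete sampling operator in place of `grid_sample`
145+
- **Improved Training:** Dynamic data augmentation and scale-adaptive optimizer hyperparameters
146+
- **Better Accuracy:** Higher reported accuracy across model scales at the same inference speed as RT-DETR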
147+
148+
## Error Handling in Labelformat
149+
150+
Since RT-DETRv2 uses the COCO format, the same validation and error handling apply:
151+
154+
- **Invalid JSON Structure:** Proper error reporting for malformed JSON files.
155+
- **Missing Required Fields:** Validation ensures all required COCO fields are present.
156+
- **Reference Integrity:** Checks that image_id and category_id references are valid.
157+
- **Bounding Box Validation:** Ensures bounding boxes are within image boundaries.

Example of a properly formatted annotation:
158+
```json
159+
{
160+
"images": [{"id": 0, "file_name": "image1.jpg", "width": 640, "height": 480}],
161+
"categories": [{"id": 1, "name": "person"}],
162+
"annotations": [{"image_id": 0, "category_id": 1, "bbox": [100, 120, 50, 80]}]
163+
}
164+
```

docs/formats/object-detection/yolov12.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,8 +40,8 @@ The **YOLOv12 detection format** remains consistent with previous versions (v5-v
4040

4141
- **Object Representation:**
4242
Each line in the text file represents a single object detected within the image, following the format: `<class_id> <x_center> <y_center> <width> <height>`
43-
- **`<class_id>` (Integer):** An integer representing the object's class.
44-
- **`<x_center>` and `<y_center>` (Float):** The normalized coordinates of the object's center relative to the image's width and height.
43+
- **`<class_id>` (Integer):** An integer representing the object's class.
44+
- **`<x_center>` and `<y_center>` (Float):** The normalized coordinates of the object's center relative to the image's width and height.
4545
- **`<width>` and `<height>` (Float):** The normalized width and height of the bounding box encompassing the object.
4646

4747
- **Normalization of Values:**

mkdocs.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,8 @@ nav:
3535
- Labelbox Format: formats/object-detection/labelbox.md
3636
- Lightly Format: formats/object-detection/lightly.md
3737
- PascalVOC Format: formats/object-detection/pascalvoc.md
38+
- RT-DETR Format: formats/object-detection/rtdetr.md
39+
- RT-DETRv2 Format: formats/object-detection/rtdetrv2.md
3840
- YOLOv5 Format: formats/object-detection/yolov5.md
3941
- YOLOv6 Format: formats/object-detection/yolov6.md
4042
- YOLOv7 Format: formats/object-detection/yolov7.md

src/labelformat/cli/cli.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ def main() -> None:
2222
2323
Supported label formats for object detection:
2424
- YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLOv11, YOLOv12, YOLOv26
25-
- COCO
25+
- COCO, RT-DETR, RT-DETRv2
2626
- VOC
2727
- Labelbox
2828
- and many more
