# RT-DETRv2 Object Detection Format

## Overview

**RT-DETRv2** is an enhanced version of the Real-Time DEtection TRansformer ([RT-DETR](https://arxiv.org/abs/2304.08069)), introduced in the paper [RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer](https://arxiv.org/abs/2407.17140). Building on the end-to-end detection framework of the original RT-DETR, RT-DETRv2 keeps the NMS-free design (no Non-Maximum Suppression post-processing) while adding further improvements in accuracy and efficiency for real-time object detection.

> **Info:** RT-DETRv2 was introduced through the technical report "RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer", published in 2024.
>
> - For the full paper, see: [arXiv:2407.17140](https://arxiv.org/abs/2407.17140)
> - For the RT-DETR foundation, see: [RT-DETR Paper (arXiv:2304.08069)](https://arxiv.org/abs/2304.08069)
> - For implementation details and code, see: [GitHub Repository: lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)

> **Availability:** RT-DETRv2 is available in multiple frameworks:
>
> - [Hugging Face Transformers](https://huggingface.co/docs/transformers/model_doc/rt_detr_v2)
> - [Ultralytics](https://docs.ultralytics.com/models/rtdetr/)
## Key RT-DETRv2 Model Features

RT-DETRv2 maintains compatibility with the standard **COCO annotation format** while introducing specific technical improvements over RT-DETR:

- **Distinct Sampling Points for Different Scales:** Introduces flexible multi-scale feature extraction by setting a different number of sampling points for features at each scale in the deformable attention module, rather than using the same number across all scales.
- **Discrete Sampling Operator:** Provides an optional discrete sampling operator to replace the `grid_sample` operator, removing deployment constraints typically associated with DETRs and improving practical applicability across deployment platforms.
- **Dynamic Data Augmentation:** Implements an adaptive data augmentation strategy that applies stronger augmentation in early training epochs and reduces it in later stages to improve model robustness and target-domain adaptation (sketched below).
- **Scale-Adaptive Hyperparameters:** Customizes optimizer hyperparameters based on model scale, using higher learning rates for lighter models (e.g., ResNet18) and lower rates for larger models (e.g., ResNet101) to achieve optimal performance.
- **Bag-of-Freebies Approach:** Incorporates multiple training improvements that enhance performance without increasing inference cost or model complexity.
- **Consistent Performance Gains:** Achieves improved accuracy across model scales (S: +1.4 mAP, M: +1.0 mAP, L: +0.3 mAP) while maintaining the same inference speed as RT-DETR.

These enhancements are handled internally by the model design and training pipeline, requiring no changes to the standard COCO annotation format described below.
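
As a rough illustration of the dynamic data augmentation idea only (this is not the RT-DETRv2 training code; the epoch threshold and transform names are assumptions):

```python
# Illustrative sketch of a dynamic augmentation schedule: strong augmentation
# early in training, reduced augmentation near the end. Not RT-DETRv2 code.
def augmentations_for_epoch(epoch: int, total_epochs: int) -> list[str]:
    base = ["random_horizontal_flip", "resize"]
    strong = ["random_photometric_distort", "random_zoom_out", "random_iou_crop"]
    # Drop the strong augmentations for roughly the last 10% of training so the
    # model adapts to the clean target distribution before training ends.
    if epoch < int(0.9 * total_epochs):
        return strong + base
    return base
```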

## Specification of RT-DETRv2 Detection Format

RT-DETRv2 uses the standard **COCO format** for annotations, ensuring complete compatibility with existing COCO datasets and tools. The format specification is identical to the original COCO format:

### `images`
Defines metadata for each image in the dataset:
```json
{
  "id": 0,                   // Unique image ID
  "file_name": "image1.jpg", // Image filename
  "width": 640,              // Image width in pixels
  "height": 416              // Image height in pixels
}
```

### `categories`
Defines the object classes:
```json
{
  "id": 0,      // Unique category ID
  "name": "cat" // Category name
}
```

### `annotations`
Defines object instances:
```json
{
  "image_id": 0,                      // Reference to image
  "category_id": 2,                   // Reference to category
  "bbox": [540.0, 295.0, 23.0, 18.0]  // [x, y, width, height] in absolute pixels
}
```
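
Putting the three sections together, a complete annotation file is a single JSON object with `images`, `categories`, and `annotations` arrays. The following sketch builds such a file in Python (file names, IDs, and box coordinates are illustrative only):

```python
import json

# Minimal, hand-rolled example of a complete RT-DETRv2 (COCO-format)
# annotation file. All values below are illustrative.
coco = {
    "images": [
        {"id": 0, "file_name": "image1.jpg", "width": 640, "height": 416},
    ],
    "categories": [
        {"id": 0, "name": "cat"},
        {"id": 1, "name": "dog"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in absolute pixel coordinates.
        {"image_id": 0, "category_id": 0, "bbox": [540.0, 295.0, 23.0, 18.0]},
    ],
}

with open("dataset/annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```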

## Directory Structure of RT-DETRv2 Dataset

```
dataset/
├── images/              # Image files
│   ├── image1.jpg
│   └── image2.jpg
└── annotations.json     # Single JSON file containing all annotations
```

## Benefits of RT-DETRv2 Format

- **Standard Compatibility:** Uses the widely adopted COCO format, ensuring compatibility with existing tools and frameworks.
- **End-to-End Processing:** Maintains the NMS-free architecture for stable and predictable inference performance.
- **Enhanced Performance:** Improved accuracy and efficiency compared to the original RT-DETR.

## Converting Annotations to RT-DETRv2 Format with Labelformat

Since RT-DETRv2 uses the standard COCO format, converting annotations to RT-DETRv2 format is equivalent to converting to COCO format.

### Installation

First, ensure that Labelformat is installed:

```shell
pip install labelformat
```

### Conversion Example: YOLOv8 to RT-DETRv2

**Step 1: Prepare Your Dataset**

Ensure your dataset follows the standard YOLOv8 structure, with a `data.yaml` file and one label file per image (a sketch of this layout is shown below).
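
A typical YOLOv8 layout looks roughly like the following; the directory and file names are illustrative, and your `data.yaml` defines the actual split paths and class names:

```
dataset/
├── data.yaml            # Split paths and class names
├── images/
│   └── train/
│       ├── image1.jpg
│       └── image2.jpg
└── labels/
    └── train/
        ├── image1.txt   # One line per object: class x_center y_center width height (normalized)
        └── image2.txt
```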

**Step 2: Run the Conversion Command**

Use the Labelformat CLI to convert YOLOv8 annotations to RT-DETRv2 (COCO format):

```bash
labelformat convert \
    --task object-detection \
    --input-format yolov8 \
    --input-file dataset/data.yaml \
    --input-split train \
    --output-format rtdetrv2 \
    --output-file dataset/rtdetrv2_annotations.json
```

**Step 3: Verify the Converted Annotations**

After conversion, your dataset structure will be:

```
dataset/
├── images/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── rtdetrv2_annotations.json   # COCO format annotations for RT-DETRv2
```
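
To sanity-check the converted file, you can load it and inspect a few basic counts. This is a minimal sketch, not part of the Labelformat API; it assumes the output path used in the command above:

```python
import json
from pathlib import Path

# Quick sanity check of the converted annotations (illustrative only).
annotations_path = Path("dataset/rtdetrv2_annotations.json")
with annotations_path.open() as f:
    coco = json.load(f)

print(f"images:      {len(coco['images'])}")
print(f"categories:  {len(coco['categories'])}")
print(f"annotations: {len(coco['annotations'])}")

# Optionally confirm that every referenced image file exists on disk.
images_dir = Path("dataset/images")
missing = [img["file_name"] for img in coco["images"]
           if not (images_dir / img["file_name"]).exists()]
print(f"missing image files: {len(missing)}")
```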

### Python API Example

```python
from pathlib import Path

from labelformat.formats import YOLOv8ObjectDetectionInput, RTDETRv2ObjectDetectionOutput

# Load YOLOv8 format
label_input = YOLOv8ObjectDetectionInput(
    input_file=Path("dataset/data.yaml"),
    input_split="train",
)

# Convert to RT-DETRv2 format
RTDETRv2ObjectDetectionOutput(
    output_file=Path("dataset/rtdetrv2_annotations.json")
).save(label_input=label_input)
```
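
If your `data.yaml` defines multiple splits, the same pattern can be repeated per split, writing one COCO-format file for each. A short sketch, assuming `train` and `val` splits exist in your dataset:

```python
from pathlib import Path

from labelformat.formats import YOLOv8ObjectDetectionInput, RTDETRv2ObjectDetectionOutput

# Convert each split to its own RT-DETRv2 (COCO-format) annotation file.
# Assumes data.yaml defines both a "train" and a "val" split.
for split in ["train", "val"]:
    label_input = YOLOv8ObjectDetectionInput(
        input_file=Path("dataset/data.yaml"),
        input_split=split,
    )
    RTDETRv2ObjectDetectionOutput(
        output_file=Path(f"dataset/rtdetrv2_annotations_{split}.json"),
    ).save(label_input=label_input)
```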

## RT-DETRv2 vs RT-DETR

RT-DETRv2 builds on RT-DETR with several key improvements:

- **Flexible Decoder Sampling:** A distinct number of sampling points per feature scale in the deformable attention module, plus an optional discrete sampling operator for easier deployment
- **Improved Training:** Bag-of-freebies strategies such as dynamic data augmentation and scale-adaptive optimizer hyperparameters
- **Better Accuracy:** Higher detection accuracy across model scales at the same inference speed as RT-DETR
| 147 | + |
| 148 | +## Error Handling in Labelformat |
| 149 | + |
| 150 | +Since RT-DETRv2 uses the COCO format, the same validation and error handling applies: |
| 151 | + |
| 152 | +- **Invalid JSON Structure:** Proper error reporting for malformed JSON files |
| 153 | +- **Missing Required Fields:** Validation ensures all required COCO fields are present |
| 154 | +- **Invalid JSON Structure:** Proper error reporting for malformed JSON files. |
| 155 | +- **Missing Required Fields:** Validation ensures all required COCO fields are present. |
| 156 | +- **Reference Integrity:** Checks that image_id and category_id references are valid. |
| 157 | +- **Bounding Box Validation:** Ensures bounding boxes are within image boundaries. |
| 158 | +```json |
| 159 | +{ |
| 160 | + "images": [{"id": 0, "file_name": "image1.jpg", "width": 640, "height": 480}], |
| 161 | + "categories": [{"id": 1, "name": "person"}], |
| 162 | + "annotations": [{"image_id": 0, "category_id": 1, "bbox": [100, 120, 50, 80]}] |
| 163 | +} |
| 164 | +``` |
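
As a rough illustration of the last two checks (not Labelformat's internal implementation), reference integrity and bounding-box bounds can be verified like this:

```python
import json

# Illustrative validation sketch over a COCO-format annotation file.
with open("dataset/rtdetrv2_annotations.json") as f:
    coco = json.load(f)

images = {img["id"]: img for img in coco["images"]}
category_ids = {cat["id"] for cat in coco["categories"]}

for ann in coco["annotations"]:
    # Reference integrity: every annotation must point at an existing image and category.
    assert ann["image_id"] in images, f"unknown image_id {ann['image_id']}"
    assert ann["category_id"] in category_ids, f"unknown category_id {ann['category_id']}"

    # Bounding box validation: [x, y, width, height] must stay inside the image.
    x, y, w, h = ann["bbox"]
    img = images[ann["image_id"]]
    assert 0 <= x and 0 <= y and x + w <= img["width"] and y + h <= img["height"], \
        f"bbox {ann['bbox']} outside {img['file_name']}"
```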