This project enhances the performance of the real-time object detector RT-DETR on the TACO (Trash Annotations in Context) dataset by leveraging Knowledge Distillation. We use a powerful but slower teacher model, Conditional DETR, to guide the training of a faster student model, RT-DETR-L, significantly improving its accuracy without compromising its real-time inference speed.
The entire training and benchmarking pipeline is designed to be executed within a Kaggle Notebook environment.
- Overview
- Key Features
- Performance Benchmark
- Methodology
- Project Structure
- Usage and Reproduction
- License
- Acknowledgements
The primary goal is to improve the accuracy of the RT-DETR model for trash detection on the TACO dataset. Standard fine-tuning can be limited, so we employ a teacher-student knowledge distillation strategy.
- Student Model: RT-DETR-L, a fast and efficient real-time object detector.
- Teacher Model: Conditional DETR (`microsoft/conditional-detr-resnet-50`), a larger, more accurate, but slower detector.
- Core Idea: The student model learns not only from ground-truth labels but also from the rich knowledge provided by the teacher model, including intermediate feature representations and final prediction distributions.
The final distilled model is benchmarked against a standard fine-tuned RT-DETR and a powerful YOLOv11l baseline to demonstrate the effectiveness of our approach.
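As a rough illustration of the core idea, the sketch below combines a feature-matching term with a soft-label term, the two kinds of supervision described above. It is a minimal, self-contained example, not the project's actual loss: the tensor shapes, the linear projection, the temperature `T`, and the weight `alpha` are all assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(feat_s, feat_t, logits_s, logits_t, proj, T=2.0, alpha=0.5):
    """Combine feature-level (MSE) and prediction-level (softened KL) distillation.

    feat_s/feat_t: student/teacher intermediate features; proj maps the student's
    feature dimension onto the teacher's. T is the softmax temperature.
    """
    # Feature-level distillation: match projected student features to teacher features.
    loss_feat = F.mse_loss(proj(feat_s), feat_t)
    # Prediction-level distillation: KL divergence between softened class distributions,
    # scaled by T^2 as is conventional to keep gradient magnitudes comparable.
    loss_pred = F.kl_div(
        F.log_softmax(logits_s / T, dim=-1),
        F.softmax(logits_t / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * loss_feat + (1 - alpha) * loss_pred

# Example with random tensors (4 queries, 256-d features, 60 classes)
proj = torch.nn.Linear(256, 256)
loss = distillation_loss(
    torch.randn(4, 256), torch.randn(4, 256),
    torch.randn(4, 60), torch.randn(4, 60),
    proj,
)
```

In practice the feature and prediction terms would be added to the standard detection loss rather than replacing it.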
- Knowledge Distillation Pipeline: Implements both feature-level and prediction-level distillation.
- Automated Training Orchestration: A master script (`train.py`) manages the entire workflow, from data preparation to distillation and final fine-tuning.
- Environment-Aware Configuration: A central `config.py` automatically detects local vs. Kaggle environments.
- Multi-GPU Support: Leverages `torchrun` for efficient, distributed training.
- Comprehensive Benchmarking: A dedicated script within our Kaggle notebook compares models across accuracy (mAP), complexity (parameters, FLOPs), and inference speed.
All models were evaluated on the TACO validation set. The benchmark was conducted on a Tesla T4 GPU. The full analysis, including the generation of these results, can be found in the analysis notebook mentioned in the "How to Reproduce" section.
| Model | mAP@.50-.95 | mAP@.50 | Speed (ms) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| RT-DETR (Distilled) | 0.2610 | 0.3100 | 59.06 | 40.92 | 136.06 |
| RT-DETR (Baseline) | 0.2390 | 0.3000 | 59.86 | 40.92 | 136.06 |
| YOLOv11l (Baseline) | 0.2566 | 0.2960 | 28.74 | 25.36 | 87.53 |
- Knowledge Distillation is Highly Effective: The Distilled RT-DETR significantly outperforms the Baseline RT-DETR, achieving a 9.2% relative improvement in mAP@.50-.95. This accuracy gain comes at no extra cost in inference speed or model complexity.
- Comparison with a Strong Baseline (YOLO): While YOLOv11l is significantly faster and more efficient, the Distilled RT-DETR achieves higher accuracy, demonstrating its superior capability in learning complex representations after being guided by a powerful teacher.
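The relative improvement quoted above follows directly from the benchmark table:

```python
# Relative mAP@.50-.95 improvement of the distilled model over the baseline,
# using the numbers from the benchmark table above.
baseline_map = 0.2390   # RT-DETR (Baseline)
distilled_map = 0.2610  # RT-DETR (Distilled)

relative_gain = (distilled_map - baseline_map) / baseline_map * 100
print(f"{relative_gain:.1f}%")  # → 9.2%
```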
Conclusion: Knowledge distillation is a powerful technique for enhancing a real-time detector like RT-DETR, improving its accuracy beyond standard fine-tuning at no additional inference cost.
The project follows a three-stage pipeline: Data Preparation, Knowledge Distillation, and Comparative Fine-tuning.

The codebase is organized into a modular and reusable structure.
```
.
├── config.py            # Central configuration for all paths and settings
├── train.py             # Master script to orchestrate the entire training pipeline
├── requirements.txt     # Project dependencies
├── benchmark/
│   └── aftertrain-analysis-rt-codetr.ipynb  # Kaggle notebook for final analysis
├── scripts/             # Helper scripts for data prep and config generation
├── src/
│   ├── distillation/    # Core logic for knowledge distillation
│   └── finetune/        # Scripts for fine-tuning baselines
├── rtdetr/              # Submodule with the RT-DETR source code
├── templates/           # Template files for experiment configs
└── output/              # Default directory for all generated outputs
```
There are two ways to engage with this project:
- Option 1 (Recommended): Reproduce the final benchmark results quickly using our prepared Kaggle Notebook.
- Option 2 (Advanced): Run the entire training pipeline from scratch on your own machine.
This is the easiest way to verify our findings. The entire analysis is encapsulated in a single, self-contained Kaggle Notebook.
The notebook containing the complete benchmark code is located in the repository at:
`./benchmark/aftertrain-analysis-rt-codetr.ipynb`
This notebook is the single source of truth for reproducing the results. All generated benchmark outputs (.csv, .png) can also be found in this folder.
When you open the notebook, attach the following Kaggle datasets as input:
- TACO Dataset: `/kaggle/input/dsp-pre-final`
- Pre-trained Models: `/kaggle/input/rt-co-detr-trained`
Execute all cells in the notebook sequentially. The notebook is designed to be fully automated:
- It installs all necessary dependencies.
- It clones the required `RT-CO-DETR` repository.
- It runs the `final_benchmark.py` script, which performs all evaluation tasks.
- Finally, it generates and displays the summary table and comparison plot, which are saved to `/kaggle/working/benchmark_output/`.
Follow these steps if you want to run the entire training process from scratch on your own machine.
- Python 3.11+
- PyTorch 2.0+ and `torchvision`
- NumPy < 2.0
- Git
- An NVIDIA GPU with CUDA for training
- All other dependencies are listed in `requirements.txt`.
```bash
# Clone this repository
git clone https://github.com/nam-htran/RT-CO-DETR.git
cd RT-CO-DETR

# Install required packages
pip install -r requirements.txt
```

The `train.py` script will automatically clone the required RT-DETR submodule if it is not found.
Place your `processed_taco_coco` dataset according to the structure defined in `config.py`. For a local setup, the expected structure is:
```
.
├── data_input/
│   └── processed_taco_coco/
│       ├── train2017/
│       ├── val2017/
│       └── annotations/
└── ... (rest of the project files)
```
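Before launching training, it can save time to verify that the dataset sits where the pipeline expects it. The helper below is a convenience sketch, not part of the repository; the default path and directory names simply mirror the layout above.

```python
from pathlib import Path

def check_taco_layout(root="data_input/processed_taco_coco"):
    """Return the list of expected COCO-style subdirectories that are missing."""
    root = Path(root)
    expected = ["train2017", "val2017", "annotations"]
    return [name for name in expected if not (root / name).is_dir()]

missing = check_taco_layout()
if missing:
    print("Missing directories:", missing)
```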
The `train.py` script orchestrates the entire process.
Run the Full Pipeline (Recommended): This command executes data preparation, knowledge distillation, and all fine-tuning experiments sequentially.
```bash
python train.py --all
```

Run Steps Individually: This is useful for debugging or re-running specific parts of the pipeline.
```bash
# 1. Prepare data formats (creates YOLO data)
python train.py --prepare-data

# 2. Run knowledge distillation (GPU-intensive)
python train.py --distill

# 3. Run fine-tuning experiments for all three models
python train.py --finetune
```

This project is licensed under the MIT License. See the LICENSE file for details.
- This project is built upon the official implementation of RT-DETR.
- The teacher model, Conditional-DETR, is provided by Microsoft via the Hugging Face Hub.
- The baseline model, YOLO, is provided by Ultralytics.
- The TACO dataset is the foundation for this trash detection task.