This repository contains code for YOLO detectors and multi-object trackers, evaluated on two datasets: VisDrone2019 and UAVDT.
- Yolov8
- Yolov9
- Yolov10
- Yolo11
- ByteTrack
- BoT-SORT/BoT-SORT-ReID
- SORT
- DeepSORT
- OC-SORT
| Algorithm | Association time ↓ (ms) |
|---|---|
| SORT | 4.5 |
| DeepSORT(*) | 24.4(**) |
| OC-SORT | 6.5 |
| ByteTrack | 3.5 |
| BoT-SORT | 51.0 |
| BoT-SORT-ReID(*) | 68.6(**) |
(*): ReID model OSNET_x0_25 with input size 128×128 and a 128-dimensional output feature.
(**): Run on a Tesla T4 with TensorRT FP16.
- Detectors from ultralytics
- Tracking evaluation with TrackEval
- Tracking algorithms: SORT, DeepSORT, OC-SORT, ByteTrack, BoT-SORT
Tracking results on VisDrone-MOT-val with the YOLO11x detector, run on a Tesla T4 with TensorRT FP16:
| Algorithm | FPS ↑ | HOTA ↑ | MOTA ↑ | IDF1 ↑ | GPU (GB) |
|---|---|---|---|---|---|
| SORT | 42 | 51.67 | 48.09 | 62.05 | 0.605 |
| OC-SORT | 46 | 51.78 | 48.67 | 61.77 | 0.605 |
| ByteTrack | 56 | 51.63 | 44.92 | 62.87 | 0.605 |
| BoT-SORT | 13 | 57.25 | 47.16 | 70.78 | 0.605 |
| BoT-SORT-ReID | 11 | 57.28 | 47.13 | 70.92 | 0.739 |
| DeepSORT | 22 | 49.17 | 48.55 | 58.60 | 0.739 |
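The MOTA column in the table follows the standard CLEAR-MOT definition: one minus the ratio of all errors (misses, false positives, identity switches) to ground-truth boxes. A minimal illustration (the counts below are made up for the example, not taken from this benchmark):

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt

# e.g. 400 misses, 300 false positives, 50 ID switches over 1500 GT boxes
print(round(mota(400, 300, 50, 1500), 3))  # 0.5
```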
```shell
git clone https://github.com/haminhtien99/YoloSeries-Tracking
cd YoloSeries-Tracking
pip install -r requirements.txt
```
Organize your dataset in the following format:

```
dataset_det/
├── train_set/
│   ├── images/   # *.jpg ...
│   └── labels/   # *.txt ... (label file name matches the image name)
├── val_set/
└── test_set/     # optional
```
Label format (YOLO), with all coordinates normalized to the range [0, 1]:

```
<object-class> <x_center> <y_center> <width> <height>
```
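Since YOLO labels store normalized center/size values, converting back to pixel corner coordinates needs the image dimensions. A minimal sketch (the function name is illustrative, not part of this repo):

```python
def yolo_to_pixel(xc, yc, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height)
    to pixel corner coordinates (x1, y1, x2, y2)."""
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return x1, y1, x2, y2

# A centered box covering half the image in each dimension of a 640x480 frame
print(yolo_to_pixel(0.5, 0.5, 0.5, 0.5, 640, 480))  # (160.0, 120.0, 480.0, 360.0)
```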
Example `.yaml` config:

```yaml
train: path/to/train_set
val: path/to/val_set
test: path/to/test_set
nc: 1  # number of classes (must match the length of names)
names: ['class_1', 'class_2', ...]
```
Organize the dataset in the following format:

```
dataset_mot/
├── train_set/
│   ├── video_name/
│   │   ├── img1/              # *.jpg frames
│   │   ├── gt_mot_challenge/
│   │   │   └── gt.txt
│   │   └── seqinfo.ini
```
`gt.txt` format (one object per line, comma-separated):

| Field | Description |
|---|---|
| `<frame>` | Frame index of the video frame |
| `<id>` | Identity of the target |
| `<x>` | x coordinate of the top-left corner of the bounding box |
| `<y>` | y coordinate of the top-left corner of the bounding box |
| `<w>` | Width of the bounding box, in pixels |
| `<h>` | Height of the bounding box, in pixels |
| `<conf>` | Confidence of the bounding box, set to 1 |
| `<class_id>` | Class ID of the object |
| `<x3D>` | x coordinate in 3D space, set to -1 |
| `<y3D>` | y coordinate in 3D space, set to -1 |
| `<z3D>` | z coordinate in 3D space, set to -1 |
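A minimal parser for the 11 comma-separated fields above (a sketch; the repo's own loaders may differ):

```python
def parse_gt_line(line):
    """Parse one MOT-challenge gt.txt row into a dict.
    Assumes the 11-field layout: frame,id,x,y,w,h,conf,class_id,x3D,y3D,z3D."""
    fields = line.strip().split(',')
    return {
        'frame': int(fields[0]),
        'id': int(fields[1]),
        'bbox': (float(fields[2]), float(fields[3]),
                 float(fields[4]), float(fields[5])),  # x, y, w, h
        'conf': float(fields[6]),
        'class_id': int(fields[7]),
        # fields[8:11] are the 3D placeholders, all -1
    }

print(parse_gt_line("1,5,100,200,50,80,1,3,-1,-1,-1"))
```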
The `seqinfo.ini` file looks as follows:

```ini
[Sequence]
name = M0101
imDir = img1
frameRate = 30
seqLength = ...
imWidth = ...
imHeight = ...
imext = .jpg
```

`seqLength` is the number of frames in that sequence's `img1` folder.
Customized for transport-focused detection/tracking:
| Dataset | Description |
|---|---|
| UAVDT-2024-DET | Added stationary vehicles and "van" class |
| UAVDT-2024-MOT | MOT format with additional annotations |
| VisDrone2019-DET | Filtered to 4 classes: car, truck, bus, van |
| VisDrone2019-MOT | Formatted for tracking |
An experimental knowledge-distillation training method is also provided for yolo11n.
Example training config (`yolo11n.yaml`):

```yaml
# Configuration for training a YOLO model
model: detector/pretrained_weights/yolo11n.pt
data: cfg/datasets/visdrone.yaml  # file in cfg/datasets folder or absolute path to the dataset configuration
epochs: 3  # number of epochs
batch: 32
imgsz: 320
device: cpu
resume: false
project: detector/train_results
name: yolo11n/exp
patience: 50  # epochs to wait for no observable improvement before early stopping

# Knowledge-distillation training.
# Path to the teacher model when training with distillation; otherwise set teacher to null.
# teacher: null
teacher:
  path: your/path/to/trained/teacher/best.pt
  temperature: 10.
  lambda_factor: 0.5
```

Train:

```shell
python -m detector.train --cfg yolo11n.yaml
```

Validate:

```shell
python -m detector.val \
    --detectors-path path/to/detectors \
    --sub-path dataset_name \
    --model-name yolov8l \
    --data visdrone.yaml \
    --device 0 \
    --batch 1 \
    --project val_results \
    --format pytorch
```
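The repository's exact distillation loss lives in its training code; the `temperature` and `lambda_factor` settings suggest a Hinton-style soft-target blend, sketched here in plain Python (names and formula are an assumption, not the repo's implementation):

```python
import math

def softmax(logits, t):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_loss,
                      temperature=10.0, lambda_factor=0.5):
    """Blend the task loss with a soft-target KL term:
    loss = (1 - lambda) * hard_loss + lambda * T^2 * KL(teacher || student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return (1 - lambda_factor) * hard_loss + lambda_factor * temperature ** 2 * kl
```

When student and teacher logits agree, the KL term vanishes and the loss reduces to `(1 - lambda_factor) * hard_loss`.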
ReID models trained: OSNET_x0_25, ResNet18/34/50, and deepsort_reid, exported to TensorRT FP16 (~1 ms/image on a Tesla T4).
See: trackers/reid_models/README.md
Tracker configs are in `cfg/trackers/`.

Track a custom video:

```shell
python track_sample.py --tracker sort.yaml --video path/to/your/video --model path/to/your/model
```
Track using a dataset & config:

```shell
python track.py --cfg track.yaml
```

Prepare ground truth:

```shell
python trackeval/prepare_gt_trackeval.py \
    --BENCHMARK UAVDT \
    --mot_path path/to/UAVDT-2024-MOT
```

Example:

```shell
python trackeval/prepare_gt_trackeval.py --BENCHMARK UAVDT --mot_path UAVDT-2024-MOT
```

Directory structure for `results/`:
```
results/
├── data/
│   ├── gt/
│   │   └── UAVDT/
│   └── trackers/
│       └── UAVDT/
│           └── val/
│               └── deepsort/
│                   └── data/
│                       └── track_result.txt
...
```
Run eval:

```shell
python eval.py \
    --GT_FOLDER results/data/gt/VisDrone \
    --TRACKERS_FOLDER results/data/trackers/VisDrone \
    --TRACKERS_TO_EVAL deepsort \
    --SEQ_INFO uav0000117_02622_v
```
- https://github.com/ultralytics/ultralytics
- https://github.com/JonathonLuiten/TrackEval
- https://github.com/abewley/sort
- https://github.com/mikel-brostrom/Yolov3_DeepSort_Pytorch
- https://github.com/nwojke/deep_sort
- https://github.com/noahcao/OC_SORT