Tri-Axial Scaling in Aerial Object Detection: Model Size, Dataset Size and Quality, and Test-Time Inference in the CADOT Challenge
By Team Double J (Jie): Yi Jie WONG & Jing Jie TAN et al.
Our team ranked 1st globally in the IEEE Big Data Cup 2024 (BEGC2024) challenge! 🏅🎉🥳 Our approach is simple: scale everything! We propose a systematic Tri-Axial Scaling approach to aerial object detection via:
- Model Size
- Dataset Size & Quality
- Test-Time Inference
We achieve this Tri-Axial Scaling by:
- Scaling model size
- Diffusion Augmentation & Balanced Data Sampling
- Test-Time Inference = Test-Time Augmentation + Ensemble Models
From our experiments, we observe that:
- A larger model can learn more effectively from a noisy and imbalanced dataset compared to a smaller model.
- A larger model benefits more from dataset size scaling.
- A smaller model can also achieve performance comparable to a larger model through balanced data sampling.
- A larger model tends to overfit when using a balanced data sampling strategy, but this can be mitigated by increasing the amount of data (hence, data scaling).
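To make the balanced data sampling idea concrete, here is a minimal, self-contained sketch (not our actual training code, which lives in `train_balanced.py`): each sample is drawn with probability inversely proportional to its class frequency, so rare classes appear about as often as common ones. The class names and counts below are made-up toy data.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each sample inversely to its class frequency,
    so rare classes are drawn as often as common ones."""
    counts = Counter(labels)
    return [1.0 / counts[c] for c in labels]

# Hypothetical toy labels: "car" heavily outnumbers "helipad"
labels = ["car"] * 90 + ["helipad"] * 10
weights = inverse_frequency_weights(labels)

random.seed(0)
drawn = random.choices(labels, weights=weights, k=10_000)
counts = Counter(drawn)
# With these weights, both classes are drawn roughly equally often,
# even though "car" dominates the raw dataset.
```

Because the total weight per class is equal (90 × 1/90 = 10 × 1/10), the sampler sees a balanced class distribution without duplicating or discarding any images.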
⬆️ Our diffusion augmentation pipeline converts annotations into synthetic images.
This figure is adapted from the method we proposed in another competition.
We modified the pipeline to support bbox → segmentation mask → image generation.
A more up-to-date figure will be added here soon!
To avoid overcomplicating this repo, the code for diffusion augmentation lives in a separate repo.
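To give a rough idea of the bbox → segmentation mask step, here is a minimal, self-contained sketch of rasterizing bounding boxes into a per-pixel class-id mask that could serve as conditioning for an image generator. The function name, box format, and class ids are our own illustrative choices, not the actual pipeline code.

```python
def bboxes_to_mask(width, height, boxes):
    """Rasterize bounding boxes into a per-pixel class-id mask.

    boxes: list of (class_id, x_min, y_min, x_max, y_max) in pixel coords.
    Pixels outside every box stay 0 (background); later boxes
    overwrite earlier ones where they overlap.
    """
    mask = [[0] * width for _ in range(height)]
    for class_id, x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = class_id
    return mask

# Two overlapping boxes on a tiny 8x8 canvas (class ids 1 and 2)
mask = bboxes_to_mask(8, 8, [(1, 0, 0, 4, 4), (2, 3, 3, 8, 8)])
```

A real pipeline would additionally encode the mask as an image (one channel or color per class) before feeding it to the diffusion model, but the core bbox-to-mask mapping is this simple.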
⬆️ Scaling Model Size vs Scaling Data Size vs Scaling Test-Time Inference
A larger model is more effective at learning from an imbalanced dataset.
A larger model also benefits from data size scaling, even in the presence of class imbalance.
⬆️ Scaling Model Size vs Scaling Data Quality vs Scaling Test-Time Inference
Smaller models benefit more from balanced sampling than larger models do.
However, we still see evidence that a larger model (YOLO12s) outperforms a smaller model (YOLO12n).
We hypothesize that a bigger dataset is required to unlock the full potential of YOLO12x.
⬆️ Finally, we unleash the full potential of test-time scaling using an ensemble of models and TTA.
We apply Test-Time Augmentation to all models in our ensemble to increase the detection rate.
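As a rough sketch of how horizontal-flip TTA works (this is an illustration, not the Ultralytics implementation): boxes predicted on the flipped image are mapped back to original-image coordinates, then pooled with the predictions from the unflipped image before fusion. The box format and helper names below are our own.

```python
def hflip_box(box, img_width):
    """Map a box predicted on a horizontally flipped image back to
    original-image coordinates. box = (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return (img_width - x1, y0, img_width - x0, y1)

def tta_merge(preds_original, preds_flipped, img_width):
    """Pool predictions from the original and flipped views.
    Each prediction is (box, score); flipped boxes are un-flipped first."""
    mapped = [(hflip_box(b, img_width), s) for b, s in preds_flipped]
    return preds_original + mapped

# The same object detected in both views of a 640-px-wide image
orig = [((10, 20, 50, 60), 0.9)]
flipped = [((590, 20, 630, 60), 0.8)]  # coordinates in the flipped frame
merged = tta_merge(orig, flipped, img_width=640)
```

After un-flipping, both detections land on the same region, so a downstream fusion step (e.g. weighted box fusion) can merge them into one higher-confidence box.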
A detailed elaboration of our solution can be found in our preprint.
- Step 1: Setup Repo
- Step 2: Setup Dataset
- Step 3 (shortcut): Download Our Trained Models
- Step 3: Training
- Step 4: Inference
👆 Please refer to our Colab link to try out our code seamlessly! You might need Colab Pro to train the larger YOLO variants.
Conda environment

```bash
conda create --name yolo python=3.10.12 -y
conda activate yolo
```

Clone this repo

```bash
# clone this repo
git clone https://github.com/yjwong1999/Double_J_CADOT_Challenge.git
cd Double_J_CADOT_Challenge
```

Install dependencies
```bash
# Please adjust the torch version according to your OS
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

# install Jupyter Notebook
pip install jupyter notebook==7.1.0

# install this version of ultralytics (for its dependencies)
pip install ultralytics==8.3.111

# uninstall the default ultralytics and install our ultralytics fork that supports Timm pretrained models
pip uninstall ultralytics -y
pip install git+https://github.com/DoubleY-BEGC2024/ultralytics-timm.git

# install this to use weighted box fusion
pip install ensemble-boxes

# remaining dependencies
pip install pycocotools
pip install requests==2.32.3
pip install click==8.1.7
```

Convert the CADOT dataset from the default COCO format to the YOLO annotation format:

```bash
python setup_data.py
```

🚅 To avoid training all our models, which is time-consuming, you can download our trained models using the provided bash script. Alternatively, you can manually download our models from Dropbox (in case the .sh file does not work on a Windows machine).

```bash
bash download_our_model.sh
```

❗Note that due to time constraints, we did not run all possible experiments. In general, our hyperparameters were chosen as follows:
- If trained with balanced sampling: batch size = 8, image size = 960, epochs = 100 for the smallest YOLO12n, 50 for YOLO12s, and 30 for YOLO12x
- If trained without balanced sampling: batch size = 16, image size = 640, epochs = 100
Ideally, we would have set all image sizes to 960, but we only considered this at a later stage. Moreover, a higher image size increases GPU memory requirements, so we had to lower the batch size. As for epochs, we set them all to 100 when training without balanced sampling. When training with balanced sampling, we found that larger models tend to overfit, so we reduced the number of epochs.
```bash
# train ResNext101-YOLO12 naively without tricks
python3 train.py --model-name "../cfg/yolo12-resnext101-timm.yaml" --epoch 100 --batch 16 --imgsz 640

# train yolo12n using balanced sampling
python3 train_balanced.py --model-name "yolo12n.pt" --epoch 100 --batch 8 --imgsz 960

# train yolo12s using balanced sampling
python3 train_balanced.py --model-name "yolo12s.pt" --epoch 50 --batch 8 --imgsz 960

# setup our synthetic dataset (generated via diffusion augmentation)
python setup_synthetic_data.py

# train yolo12x with synthetic data only
python3 train_balanced.py --model-name "yolo12x.pt" --epoch 100 --batch 16 --imgsz 640

# train yolo12x using balanced sampling and synthetic data
python3 train_balanced.py --model-name "yolo12x.pt" --epoch 100 --batch 8 --imgsz 960
```

Move all 5 trained models into the Double_J_CADOT_Challenge/models directory for ensemble model inference:

```bash
python3 move_models.py
```

Ensemble model + Test-Time Augmentation:

```bash
# run the inference code
python3 infer.py --tta all
```

❗Note that:
- Even when using the exact same dependencies (torch/numpy/ultralytics/etc.), you might not obtain the same results.
- This is because different machines, different CUDA versions, and different random seeds can also contribute to variations in results.
- For instance, we tested training the exact same model with the same hyperparameter configuration on the same A100 GPU, but on Google Colab and on Lightning AI.
- The performance discrepancies between the two models trained on different platforms were still noticeable.
- Hence, you might not be able to reproduce our exact results.
- Nevertheless, we believe our results on tri-axial scaling are valuable to the community 🤗
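For readers curious how weighted box fusion combines the ensemble's detections, here is a deliberately simplified, pure-Python sketch of the core idea: boxes that overlap strongly are merged into a single box whose coordinates are score-weighted means. Our actual pipeline uses the ensemble-boxes package, whose implementation differs in detail (normalized coordinates, per-model weights, score rescaling); all names below are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse_boxes(preds, iou_thr=0.55):
    """Greedy score-weighted fusion: predictions (box, score) that overlap
    an existing cluster (IoU > iou_thr) join it; each cluster is then
    collapsed to a score-weighted mean box with an averaged score."""
    preds = sorted(preds, key=lambda p: -p[1])  # highest score first
    clusters = []
    for box, score in preds:
        for cluster in clusters:
            if iou(box, cluster[0][0]) > iou_thr:
                cluster.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    out = []
    for cluster in clusters:
        total = sum(s for _, s in cluster)
        coords = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
        out.append((coords, total / len(cluster)))
    return out

# Two models agree on one object (slightly shifted boxes); a third
# detection elsewhere stays separate.
fused = fuse_boxes([((10, 10, 50, 50), 0.9),
                    ((12, 12, 52, 52), 0.9),
                    ((200, 200, 240, 240), 0.6)])
```

Unlike NMS, which keeps only the single highest-scoring box and discards the rest, this fusion uses every overlapping prediction to refine the final coordinates, which is why it pairs well with TTA and multi-model ensembles.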
We would like to express our gratitude to the CADOT organizers for hosting this exciting challenge!
Our solution has been invited to IEEE ICIP 2025! Please cite our paper if this repo helps your research. The preprint is available here.
```bibtex
@InProceedings{Wong2024,
  title     = {Tri-Axial Scaling in Aerial Object Detection: Model Size, Dataset Size and Quality, and Test-Time Inference in the CADOT Challenge},
  author    = {Yi Jie Wong and Jing Jie Tan and Mau-Luen Tham and Ban-Hoe Kwan and Yan Chai Hum},
  booktitle = {2025 IEEE International Conference on Image Processing (ICIP)},
  year      = {2025}
}
```