If you like tiny code, Pytorch and Yolo, then you'll like TinyYolo.
-
This repo uses the new "Tiny-Oriented-Programming" paradigm invented by TinyGrad to implement a set of popular Yolo models and sample assignment algorithms.
-
No YAML files, just (hopefully) good, modular, readable, minimal yet complete code.
-
This is a library, not a framework.
-
This library is for developers more so than for users.
- models.py: contains all the Yolo models, which automatically calculate the loss when targets are passed to the forward function.
- assigner.cpp: contains various sample assignment algorithms written in pure Pytorch C++.
- test.py: tests the models with pretrained weights from the darknet, ultralytics, Yolov6 and Yolov7 official repos.
- train_coco.py: an example training script that trains on COCO using lightning.
- Yolov3
- Yolov3-spp
- Yolov3-tiny
- Yolov4
- Yolov4-tiny
- Yolov5(n,s,m,l,x)
- Yolov6(n,s,m)
- Yolov7
- Yolov8(n,s,m,l,x)
- Yolov10(n,s,m,b,l,x)
- Yolo11(n,s,m,l,x)
- Yolo12(n,s,m,l,x)
- Yolo26(n,s,m,l,x)
import torch

nc = ... # number of classes, e.g. 80 for COCO
net = Yolov3(nc, spp=True).eval()
# net = Yolov4(nc).eval()
# net = Yolov5('n', nc).eval()
# net = Yolov6('n', nc).eval()
# net = Yolov7(nc).eval()
# net = Yolov8('n', nc).eval()
# net = Yolov10('n', nc).eval()
# net = Yolov11('n', nc).eval()
# net = Yolov12('n', nc).eval()
# net = Yolov26('n', nc).eval()
# Inference only
B = ... # Batch size
H = ... # Height
W = ... # Width
x = torch.randn(B, 3, H, W) # image-like
preds = net(x) # preds.shape == [B, N, nc+5] or nc+4 if there's no objectness feature e.g V5, V6, V8, V10
# Train
D = ... # max number of target detections per batch
C = 5 # target features [x1,y1,x2,y2,cls] where x1,x2 ∈ [0,W], y1,y2 ∈ [0,H], cls ∈ [-1,nc) and -1 is used for padding
targets = torch.randn(B, D, C) # placeholder with the right shape; real targets must follow the layout above
preds, loss_dict = net(x, targets)
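The target layout above (one [x1,y1,x2,y2,cls] row per object, padded with cls = -1) can be built from per-image box lists roughly as follows. This is a minimal sketch in plain Python so it runs without dependencies; `pad_targets` and `PAD_CLASS` are hypothetical names, not part of the library, and filling the padded box coordinates with zeros is an assumption.

```python
from typing import List, Sequence

PAD_CLASS = -1  # the target layout uses cls == -1 to mark padding rows


def pad_targets(boxes_per_image: Sequence[Sequence[Sequence[float]]],
                max_det: int) -> List[List[List[float]]]:
    """Pad per-image [x1, y1, x2, y2, cls] rows to a fixed [B, D, 5] layout.

    Hypothetical helper for illustration only.
    """
    padded = []
    for boxes in boxes_per_image:
        if len(boxes) > max_det:
            raise ValueError(f"image has {len(boxes)} boxes but D is {max_det}")
        rows = [[float(v) for v in box] for box in boxes]
        # Fill the remaining slots with zero boxes whose class is -1 (assumed fill value).
        rows += [[0.0, 0.0, 0.0, 0.0, float(PAD_CLASS)]
                 for _ in range(max_det - len(rows))]
        padded.append(rows)
    return padded


batch = [
    [[10, 20, 110, 220, 0], [50, 60, 90, 100, 17]],  # image 0: two objects
    [],                                               # image 1: no objects
]
targets = pad_targets(batch, max_det=3)  # nested lists with shape [2, 3, 5]
```

Wrapping the result in `torch.tensor(targets)` should give a float tensor of shape [B, D, 5] matching the layout described above.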
Export it...
net = Yolov3(80, spp=True).eval()
x = torch.randn(4, 3, 640, 640)
_ = net(x) # compile all the einops kernels. Required before ONNX export
torch.onnx.export(net, (x,), '/tmp/model.onnx',
input_names=['img'], output_names=['preds'],
dynamic_axes={'img' : {0: 'B', 2: 'H', 3: 'W'},
'preds' : {0: 'B', 1: 'N'}})
Run it...
Install dependencies:
pip install numpy onnxruntime
import onnxruntime as ort
import numpy as np
net = ort.InferenceSession('/tmp/model.onnx', providers=['CPUExecutionProvider'])
x = np.random.randn(1, 3, 576, 768).astype(np.float32) # randn takes the dims as separate ints; the exported model expects float32
preds, = net.run(None, {'img': x})
Compile it...
Download onnx-mlir and use the onnx-mlir.py script.
python3 onnx-mlir.py --EmitObj -O3 /tmp/model.onnx -o model
Convert it...
Install dependencies:
pip install onnx2tf tensorflow tf_keras onnx_graphsurgeon sng4onnx onnxsim
onnx2tf -i /tmp/model.onnx -ois "img:1,3,640,640" -o /tmp/model
-
I advise using lightning or accelerate to write training scripts. They take care of everything including distributed training, FP16, checkpointing, etc.
-
The sample assignment algorithms are written in Pytorch C++ for simplicity. Indeed, when I first wrote them in Python, 90% of the complexity was in crazy tensor indexing and masking. This was distracting and annoying. In C++ you can use for-loops and if-statements without loss of performance. The algorithms are much more readable now. The only drawback is that you have to move the tensors to the CPU, run the algorithm, then move the returned tensors (target boxes, scores and classes) back onto the target device, usually the GPU.
-
Pretty much all official models use eps=0.001 and momentum=0.03 in nn.BatchNorm2d. Those aren't the Pytorch defaults. Where do those numbers come from?
-
From what I can tell, the main innovation in Yolov6 is the distillation loss in bounding box regression: there are two bounding box branches, one with DFL and one without. AFAIK, only the DFL one gets used in the forward pass. During training, both get a CIOU loss.
-
onnx-mlir is very slow and runs on 1 thread only. So onnxruntime is better for inference.
-
Yolov12's area attention is a bit misleading. It sounds like tiled attention, but no, more like rows.
-
Yolov12 doesn't have as much attention as you think. Only 1 extra layer of attention compared to Yolov11. The head has no attention. Look at the args. So what's the big deal?
-
In my opinion there is little innovation in the new Yolo models. It's just tweaks.
-
Research should go into training recipes rather than chasing mAP scores. Come up with a way to train a model on COCO in under 5 epochs.
-
Very little of the model architecture changes between Yolo11 and Yolo26. SPPF has a shortcut and there is a sprinkle more attention.
-
My understanding of end2end training (Yolov10 and Yolo26) is that you have two sets of normal V8 regression losses, each with its own TAL assigner: one for one2many with topk set to 10 and another for one2one with topk set to 1. You minimise both losses with weights of 0.8 for one2many and 0.2 for one2one. These weights are then decayed and inversely decayed respectively. The idea is that the one2many branch gives lots of positive signal to the gradients, which helps with training and helps refine the one2one branch. I think that's it.
-
A good implementation of NMS in say C++ can be very fast. Not convinced removing NMS gives that much of a boost in performance as the ultralytics authors claim. Not convinced end2end training is necessary.
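The end2end loss weighting described in the notes above (0.8 for one2many and 0.2 for one2one, then decayed and inversely decayed) could look something like this. It is only a sketch of my reading of that description: the linear schedule, the final one2many weight of 0.1, the weights summing to 1, and the function names are all assumptions, not taken from any official repo.

```python
from typing import Tuple


def end2end_weights(epoch: int, total_epochs: int,
                    o2m_start: float = 0.8,
                    o2m_end: float = 0.1) -> Tuple[float, float]:
    """Return (one2many_weight, one2one_weight) for the given epoch.

    Assumptions: the decay is linear, it ends at `o2m_end` (hypothetical
    value), and the two weights always sum to 1.
    """
    if total_epochs < 2:
        return o2m_start, 1.0 - o2m_start
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    o2m = o2m_start + (o2m_end - o2m_start) * progress
    return o2m, 1.0 - o2m


def end2end_loss(loss_one2many: float, loss_one2one: float,
                 epoch: int, total_epochs: int) -> float:
    """Weighted sum of the two branch losses for the given epoch."""
    w_o2m, w_o2o = end2end_weights(epoch, total_epochs)
    return w_o2m * loss_one2many + w_o2o * loss_one2one
```

With these assumed defaults, the one2many branch dominates early in training and the one2one branch takes over towards the end. The same arithmetic applies unchanged if the two losses are tensors rather than floats.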
- Train everything (probably going to need some cloud compute (help))
- Train with mixed precision
This project is licensed under the MIT License. See the LICENSE file for details.
Pretrained weights downloaded by helper scripts are subject to their own licenses. See THIRD_PARTY_NOTICES.md for details.