
Commit 6317d66

Refactor & Enhance: Inference Pipeline, ONNX Export, and Core Modules (#19)
* train successfully
* update exporter
* add sample inference for image
* working inference
* working video det
* add cuda and trt ep
* working webcam inference
* update
* use cuda and trt for image inference
* add video inference
* add webcam specs
* update readme
* update gradio demo
* update readme
* update command
* update readme
* update
* update
* update
* update
* Refine README instructions for live inference commands, ensuring consistent formatting and clarifying input parameters for video and image inference.
* update
* udpate
* update
* Update README.md
* update
* update
* Update README.md
* Update README.md
* update quickstart
* add credit
* update
* update
* update
* update
* update
* update readme
* add dash
* update key features
* updte
* add openvino export
1 parent d6f6d3f commit 6317d66

20 files changed (+1687 -882 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ outputs/
 trt_cache/
 # Dataset
 dataset_collections/
+checkpoints/
 
 # Byte-compiled / optimized / DLL files
 __pycache__/

README.md

Lines changed: 165 additions & 50 deletions
@@ -12,6 +12,44 @@
 <p>DEIMKit is a Python wrapper for <a href="https://github.com/ShihuaHuang95/DEIM">DEIM: DETR with Improved Matching for Fast Convergence</a>. Check out the original repo for more details.</p>
 </div>
 
+
+
+<!-- Add HTML Table of Contents -->
+<div align="center">
+  <br />
+  <table>
+    <tr>
+      <td align="center">
+        <a href="#-why-deimkit">🤔 Why DEIMKit?</a>
+      </td>
+      <td align="center">
+        <a href="#-key-features">🌟 Key Features</a>
+      </td>
+      <td align="center">
+        <a href="#-installation">📦 Installation</a>
+      </td>
+      <td align="center">
+        <a href="#-usage">🚀 Usage</a>
+      </td>
+    </tr>
+    <tr>
+      <td align="center">
+        <a href="#-inference">💡 Inference</a>
+      </td>
+      <td align="center">
+        <a href="#-training">🏋️ Training</a>
+      </td>
+      <td align="center">
+        <a href="#-export">💾 Export</a>
+      </td>
+      <td align="center">
+        <a href="#-disclaimer">⚠️ Disclaimer</a>
+      </td>
+    </tr>
+  </table>
+</div>
+
+<br />
 <div align="center">
 <a href="https://colab.research.google.com/github/dnth/DEIMKit/blob/main/nbs/colab-quickstart.ipynb">
 <img src="https://img.shields.io/badge/Open%20In-Colab-blue?style=for-the-badge&logo=google-colab" alt="Open In Colab"/>
@@ -22,29 +60,35 @@
 </div>
 </div>
 
-
-## Why DEIMKit?
+## 🤔 Why DEIMKit?
 
 - **Pure Python Configuration** - No complicated YAML files, just clean Python code
 - **Cross-Platform Simplicity** - Single command installation on Linux, macOS, and Windows
 - **Intuitive API** - Load, train, predict, export in just a few lines of code
 
-## Supported Features
-
-- [x] Inference
-- [x] Training
-- [x] Export
-
-
-## Installation
-
-### Using pip
-Install [torch](https://pytorch.org/get-started/locally/) and torchvision as a pre-requisite.
-
-## Installation
-
-### Using pip
-Install [torch](https://pytorch.org/get-started/locally/) and torchvision as a pre-requisite.
+## 🌟 Key Features
+
+* **💡 Inference**
+    * [x] Single Image & Batch Prediction
+    * [x] Load Pretrained & Custom Models
+    * [x] Built-in Result Visualization
+    * [x] Live ONNX Inference (Webcam, Video, Image)
+* **🏋️ Training**
+    * [x] Single & Multi-GPU Training
+    * [x] Custom Dataset Support (COCO Format)
+    * [x] Flexible Configuration via Pure Python
+* **💾 Export**
+    * [x] Export Trained Models to ONNX
+    * [x] ONNX Model with Integrated Preprocessing
+* **🛠️ Utilities & Demos**
+    * [x] Cross-Platform Support (Linux, macOS, Windows)
+    * [x] Pixi Environment Management Integration
+    * [x] Interactive Gradio Demo Script
+
+## 📦 Installation
+
+### 📥 Using pip
+If you're installing using pip, install [torch](https://pytorch.org/get-started/locally/) and torchvision as a pre-requisite.
 
 Next, install the package.
 Bleeding edge version
@@ -57,7 +101,7 @@ Stable version
 pip install git+https://github.com/dnth/DEIM.git@v0.1.1
 ```
 
-### Using Pixi
+### 🔌 Using Pixi
 
 > [!TIP]
 > I recommend using [Pixi](https://pixi.sh) to run this package. Pixi makes it easy to install the right version of Python and the dependencies to run this package on any platform!
@@ -85,7 +129,7 @@ This will download a toy dataset with 8 images, and train a model on it for 3 ep
 
 If this runs without any issues, you've got a working Python environment with all the dependencies installed. This also installs DEIMKit in editable mode for development. See the [pixi cheatsheet](#-pixi-cheat-sheet) below for more.
 
-## Usage
+## 🚀 Usage
 
 List models supported by DEIMKit

@@ -103,7 +147,7 @@ list_models()
 'deim_hgnetv2_x']
 ```
 
-### Inference
+### 💡 Inference
 
 Load a pretrained model by the original authors

@@ -157,7 +201,7 @@ Stomata Dataset
 
 See the [demo notebook on using pretrained models](nbs/pretrained-model-inference.ipynb) and [custom model inference](nbs/custom-model-inference.ipynb) for more details.
 
-### Training
+### 🏋️ Training
 
 DEIMKit provides a simple interface for training your own models.

@@ -225,7 +269,8 @@ Navigate to the http://localhost:6006/ in your browser to view the training prog
 
 ![alt text](assets/tensorboard.png)
 
-### Export
+### 💾 Export
+Currently, the export function only exports the model to ONNX, which can then be run with ONNXRuntime (see [Live Inference](#-live-inference) for more details). I think one could get pretty far with this even on a low-resource machine. Drop an issue if you think this should be extended to other formats.
 
 ```python
 from deimkit.exporter import Exporter
@@ -240,58 +285,128 @@ output_path = exporter.to_onnx(
 )
 ```
 
-### Gradio App
-Run a Gradio app to interact with your model.
+> [!NOTE]
+> The exported model will accept raw BGR images of any size. It will also handle the preprocessing internally. Credit to [PINTO0309](https://github.com/PINTO0309/DEIM) for the implementation.
+>
+> ![onnx model](assets/exported_onnx.png)
+
+> [!TIP]
+> If you want to export to OpenVINO, you can do so directly from the ONNX model.
+>
+> ```python
+> import onnx
+>
+> model = onnx.load("best.onnx")
+>
+> # Change the mode attribute of the GridSample nodes to bilinear,
+> # as the 'linear' mode is not supported in OpenVINO
+> for node in model.graph.node:
+>     if node.op_type == 'GridSample':
+>         for i, attr in enumerate(node.attribute):
+>             if attr.name == 'mode' and attr.s == b'linear':
+>                 # Replace 'linear' with 'bilinear'
+>                 node.attribute[i].s = b'bilinear'
+>
+> # Save the modified model
+> onnx.save(model, "best_prep_openvino.onnx")
+> ```
+> You can then use the live inference script to run inference on the OpenVINO model.
+
+### 🖥️ Gradio App
+Run a Gradio app to interact with your model. The app will accept raw BGR images of any size and handle the preprocessing internally using the exported ONNX model.
 
 ```bash
-python scripts/gradio_demo.py
+python scripts/gradio_demo.py \
+    --model "best.onnx" \
+    --classes "classes.txt" \
+    --examples "Rock Paper Scissors SXSW.v14i.coco/test"
 ```
 ![alt text](assets/gradio_demo.png)

-### Live Inference
+> [!NOTE]
+> The demo app uses the ONNX model and onnxruntime for inference. Additionally, I have made the ONNX model accept any input size, even though the original model was trained on 640x640 images.
+> This means you can use any image size you want. Play around with the input size slider to see what works best for your model.
+> Some objects are visible even at lower input sizes, which means you can use a lower input size to speed up inference.
+
+### 🎥 Live Inference
 Run live inference on a video, image or webcam using ONNXRuntime. This runs on CPU by default.
-If you would like to use the CUDA backend, you can install the `onnxruntime-gpu` package and uninstall the `onnxruntime` package.
+If you would like to use the CUDA backend, install the `onnxruntime-gpu` package and uninstall the `onnxruntime` package.
 
-For video inference, specify the path to the video file as the input. Output video will be saved as `onnx_result.mp4` in the current directory.
+For running inference on a webcam, set the `--webcam` flag.
 
 ```bash
 python scripts/live_inference.py
-    --onnx model.onnx # Path to the ONNX model file
-    --input video.mp4 # Path to the input video file
-    --class-names classes.txt # Path to the classes file with each name on a new row
-    --input-size 320 # Input size for the model
+    --model model.onnx # Path to the ONNX model file
+    --webcam # Use webcam as input source
+    --classes classes.txt # Path to the classes file with each name on a new row
+    --video-width 720 # Webcam capture width
+    --provider tensorrt # Execution provider (cpu/cuda/tensorrt)
+    --threshold 0.3 # Detection confidence threshold
 ```
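As a side note, the `--classes` file used above is plain text with one class name per line. A minimal, dependency-free sketch of how such a file can be parsed into an id-to-name mapping (the helper name is my own illustration, not a DEIMKit API):

```python
import tempfile
from pathlib import Path

def load_class_names(path):
    """Parse a classes file (one class name per line) into {class_id: name}."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Class ids follow line order; blank lines are skipped defensively
    return {i: name.strip() for i, name in enumerate(l for l in lines if l.strip())}

# Round-trip a tiny classes file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("rock\npaper\nscissors\n")

print(load_class_names(f.name))  # {0: 'rock', 1: 'paper', 2: 'scissors'}
```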

-The following is a demo of video inference after training for about 50 epochs on the vehicles dataset with image size 320x320.
+Because the preprocessing is handled internally in the ONNX model, the input size is not limited to the original 640x640; you can use any input size for inference, even though the model was trained on 640x640 images. Integrating the preprocessing into the ONNX model also lets us run inference at very high FPS, as it uses more efficient ONNX operators.
 
-https://github.com/user-attachments/assets/5066768f-c97e-4999-af81-ffd29d88f529
+The following is a model I trained on a custom dataset using the deim_hgnetv2_s model and exported to ONNX. Here are some examples of inference on a webcam at different video resolutions.
 
+Webcam video at 1920x1080 pixels (1080p):
 
-You can also run live inference on a webcam by setting the `webcam` flag.
+https://github.com/user-attachments/assets/bd98eb1e-feff-4b53-9fa9-d4aff6a724e0
+
+Webcam video at 1280x720 pixels (720p):
+
+https://github.com/user-attachments/assets/31a8644e-e0c6-4bba-9d4f-857a3d0b53e1
+
+Webcam video at 848x480 pixels (480p):
+
+https://github.com/user-attachments/assets/aa267f05-5dbd-4824-973c-62f3b8f59c80
+
+Webcam video at 640x480 pixels (480p):
+
+https://github.com/user-attachments/assets/3d0c04c0-645a-4d54-86c0-991930491113
+
+Webcam video at 320x240 pixels (240p):
+
+https://github.com/user-attachments/assets/f4afff9c-3e6d-4965-ab86-0d4de7ce1a44
+
+
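The resolution ladder above trades detail for speed. The arithmetic for keeping the aspect ratio when only a capture width is given (the role the `--video-width` flag plays) is a simple proportional scale; a dependency-free sketch, with the helper name being my own illustration rather than DEIMKit code:

```python
def height_for_width(src_w, src_h, target_w):
    """Height that preserves the source aspect ratio when the width is target_w."""
    return max(1, round(src_h * target_w / src_w))

# A 1920x1080 webcam feed rescaled to some of the widths shown above
for w in (1280, 640, 320):
    print(w, height_for_width(1920, 1080, w))
```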
+
+
+For video inference, specify the path to the video file as the input. The output video will be saved as `onnx_result.mp4` in the current directory.
 
 ```bash
 python scripts/live_inference.py
-    --onnx model.onnx # Path to the ONNX model file
-    --webcam # Use webcam as input source
-    --class-names classes.txt # Path to the classes file. Each class name should be on a new line.
-    --input-size 320 # Input size for the model
+    --model model.onnx # Path to the ONNX model file
+    --video video.mp4 # Path to the input video file
+    --classes classes.txt # Path to the classes file with each name on a new row
+    --video-width 320 # Width at which the video is processed
+    --provider cpu # Execution provider (cpu/cuda/tensorrt)
+    --threshold 0.3 # Detection confidence threshold
 ```
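When a video is processed at a reduced width like the 320 above but the results are drawn on the full-resolution frames, detections have to be mapped back to the original coordinates. A dependency-free sketch of that rescaling (the box format and helper are my own illustration, not the script's internals):

```python
def rescale_box(box, src, dst):
    """Map an (x1, y1, x2, y2) box from src=(w, h) coordinates to dst=(w, h)."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A box detected on a 320x320 inference frame, drawn on the 1280x720 source video
print(rescale_box((32, 32, 64, 64), (320, 320), (1280, 720)))  # (128.0, 72.0, 256.0, 144.0)
```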
-The following is a demo of webcam inference after training on the rock paper scissors dataset 640x640 resolution image.
+https://github.com/user-attachments/assets/6bc1dc6a-a223-4220-954d-2dab5c75b4a8
+
+The following is an inference run using the pre-trained model `deim_hgnetv2_x`, trained on COCO. See how I exported the pre-trained model to ONNX in [this notebook](nbs/export.ipynb).
 
-https://github.com/user-attachments/assets/6e5dbb15-4e3a-45a3-997e-157bb9370146
+https://github.com/user-attachments/assets/77070ea4-8407-4648-ade3-01cacd77b51b
 
 
 For image inference, specify the path to the image file as the input.
+
 ```bash
 python scripts/live_inference.py
-    --onnx model.onnx # Path to the ONNX model file
-    --input image.jpg # Path to the input image file
-    --class-names classes.txt # Path to the classes file. Each class name should be on a new line.
-    --input-size 320 # Input size for the model
+    --model model.onnx # Path to the ONNX model file
+    --image image.jpg # Path to the input image file
+    --classes classes.txt # Path to the classes file with each name on a new row
+    --provider cpu # Execution provider (cpu/cuda/tensorrt)
+    --threshold 0.3 # Detection confidence threshold
 ```
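The `--threshold` flag drops low-confidence detections before they are reported. A minimal dependency-free sketch of that filtering step (the data and helper are illustrative, not DEIMKit code):

```python
def filter_by_threshold(detections, threshold=0.3):
    """Keep (label, score, box) detections whose score meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

dets = [
    ("rock", 0.92, (10, 10, 50, 50)),
    ("paper", 0.28, (60, 12, 90, 44)),   # below threshold, dropped
    ("scissors", 0.41, (5, 70, 40, 99)),
]
print([label for label, _, _ in filter_by_threshold(dets)])  # ['rock', 'scissors']
```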
+
+
+
+
 The following is a demo of image inference
 
-![image](assets/sample_result_image.jpg)
+![image](assets/sample_result_image_1.jpg)
 
 > [!TIP]
 > If you are using Pixi, you can run the live inference script with the following command, using the same arguments as above.
@@ -308,7 +423,7 @@ The following is a demo of image inference
 > If you want to use the CPU, replace `cuda` with `cpu` in the command above.
 
 
-## Pixi Cheat Sheet
+## 📝 Pixi Cheat Sheet
 Here are some useful tasks you can run with Pixi.
 
 Run a quickstart
@@ -352,7 +467,7 @@ pixi run -e cpu live-inference --onnx model.onnx --input video.mp4 --class-names
 
 Launch Gradio app
 ```bash
-pixi run -e cuda gradio-demo
+pixi run gradio-demo --model "best_prep.onnx" --classes "classes.txt" --examples "Rock Paper Scissors SXSW.v14i.coco/test"
 ```
 
 ```bash
@@ -366,5 +481,5 @@ pixi run export --config config.yml --checkpoint model.pth --output model.onnx
 
 
 
-## Disclaimer
+## ⚠️ Disclaimer
 I'm not affiliated with the original DEIM authors. I just found the model interesting and wanted to try it out. The changes made here are my own. Please cite and star the original repo if you find this useful.

assets/exported_onnx.png

53.5 KB

assets/gradio_demo.png

-4.56 KB

assets/sample_result_image_1.jpg

13.7 KB

0 commit comments
