Commit 1ff5a10

Add support for dynamic batch size
1 parent 6d015a4 commit 1ff5a10

File tree

8 files changed: +226 -141 lines changed


README.md

Lines changed: 32 additions & 13 deletions

````diff
@@ -89,7 +89,6 @@ See following sections for more details of conversions.
 | ------------------- | ----------: | ----------: | ----------: | ----------: | ----------: | ----------: |
 | DarkNet (YOLOv4 paper)| 0.471 | 0.710 | 0.510 | 0.278 | 0.525 | 0.636 |
 | Pytorch (TianXiaomo)| 0.466 | 0.704 | 0.505 | 0.267 | 0.524 | 0.629 |
-| ONNX | incoming | incoming | incoming | incoming | incoming | incoming |
 | TensorRT FP32 + BatchedNMSPlugin | 0.472| 0.708 | 0.511 | 0.273 | 0.530 | 0.637 |
 | TensorRT FP16 + BatchedNMSPlugin | 0.472| 0.708 | 0.511 | 0.273 | 0.530 | 0.636 |
 
@@ -99,7 +98,6 @@ See following sections for more details of conversions.
 | ------------------- | ----------: | ----------: | ----------: | ----------: | ----------: | ----------: |
 | DarkNet (YOLOv4 paper)| 0.412 | 0.628 | 0.443 | 0.204 | 0.444 | 0.560 |
 | Pytorch (TianXiaomo)| 0.404 | 0.615 | 0.436 | 0.196 | 0.438 | 0.552 |
-| ONNX | incoming | incoming | incoming | incoming | incoming | incoming |
 | TensorRT FP32 + BatchedNMSPlugin | 0.412| 0.625 | 0.445 | 0.200 | 0.446 | 0.564 |
 | TensorRT FP16 + BatchedNMSPlugin | 0.412| 0.625 | 0.445 | 0.200 | 0.446 | 0.563 |
 
@@ -163,10 +161,11 @@ Until now, still a small piece of post-processing including NMS is required. We
 python demo_darknet2onnx.py <cfgFile> <weightFile> <imageFile> <batchSize>
 ```
 
-This script will generate 2 ONNX models.
+## 3.1 Dynamic or static batch size
 
-- One is for running the demo (batch_size=1)
-- The other one is what you want to generate (batch_size=batchSize)
+- **A positive batch size generates an ONNX model with a static batch size; otherwise the batch size is dynamic**
+- A dynamic batch size generates only one ONNX model
+- A static batch size generates 2 ONNX models; the extra one is for running the demo (batch_size=1)
 
 # 4. Pytorch2ONNX (Evolving)
 
@@ -195,34 +194,54 @@ Until now, still a small piece of post-processing including NMS is required. We
 python demo_pytorch2onnx.py yolov4.pth dog.jpg 8 80 416 416
 ```
 
-This script will generate 2 ONNX models.
+## 4.1 Dynamic or static batch size
 
-- One is for running the demo (batch_size=1)
-- The other one is what you want to generate (batch_size=batch_size)
+- **A positive batch size generates an ONNX model with a static batch size; otherwise the batch size is dynamic**
+- A dynamic batch size generates only one ONNX model
+- A static batch size generates 2 ONNX models; the extra one is for running the demo (batch_size=1)
 
 
 # 5. ONNX2TensorRT (Evolving)
 
 - **TensorRT version Recommended: 7.0, 7.1**
 
+## 5.1 Convert from ONNX with a static batch size
+
 - **Run the following command to convert the YOLOv4 ONNX model into a TensorRT engine**
 
 ```sh
 trtexec --onnx=<onnx_file> --explicitBatch --saveEngine=<tensorRT_engine_file> --workspace=<size_in_megabytes> --fp16
 ```
 - Note: If you want to use int8 mode in conversion, extra int8 calibration is needed.
 
-- **Run the demo**
+## 5.2 Convert from ONNX with a dynamic batch size
+
+- **Run the following command to convert the YOLOv4 ONNX model into a TensorRT engine**
 
 ```sh
-python demo_trt.py <tensorRT_engine_file> <input_image> <input_H> <input_W>
+trtexec --onnx=<onnx_file> \
+    --minShapes=input:<shape_of_min_batch> --optShapes=input:<shape_of_opt_batch> --maxShapes=input:<shape_of_max_batch> \
+    --workspace=<size_in_megabytes> --saveEngine=<tensorRT_engine_file> --fp16
 ```
+- For example:
+
+```sh
+trtexec --onnx=yolov4_-1_3_320_512_dynamic.onnx \
+    --minShapes=input:1x3x320x512 --optShapes=input:4x3x320x512 --maxShapes=input:8x3x320x512 \
+    --workspace=2048 --saveEngine=yolov4_-1_3_320_512_dynamic.engine --fp16
+```
+
+## 5.3 Run the demo
+
+```sh
+python demo_trt.py <tensorRT_engine_file> <input_image> <input_H> <input_W>
+```
 
-- This demo here only works when batchSize=1, but you can update this demo a little for batched inputs.
+- This demo only works with batchSize=1 (for a dynamic engine, 1 must lie within the engine's dynamic range), but you can update it a little for other static or dynamic batch sizes.
 
-- Note1: input_H and input_W should agree with the input size in the original ONNX file.
+- Note1: input_H and input_W should agree with the input size in the original ONNX file.
 
-- Note2: extra NMS operations are needed for the tensorRT output. This demo uses python NMS code from `tool/utils.py`.
+- Note2: extra NMS operations are needed for the TensorRT output. This demo uses Python NMS code from `tool/utils.py`.
 
 
 # 6. ONNX2Tensorflow
````
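
Before building a TensorRT engine, it is worth confirming that an exported dynamic-batch ONNX model really accepts more than one batch size. A minimal onnxruntime sketch (not part of this commit; the file name simply follows the `yolov4_-1_3_<H>_<W>_dynamic.onnx` convention above, with H=320 and W=512):

```python
import numpy as np
import onnxruntime

# Dynamic-batch model exported by demo_darknet2onnx.py / demo_pytorch2onnx.py
session = onnxruntime.InferenceSession("yolov4_-1_3_320_512_dynamic.onnx")
input_name = session.get_inputs()[0].name  # "input"

for batch in (1, 4, 8):
    x = np.random.rand(batch, 3, 320, 512).astype(np.float32)
    boxes, confs = session.run(None, {input_name: x})
    print(batch, boxes.shape, confs.shape)  # leading dimension should equal batch
```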

demo_darknet2onnx.py

Lines changed: 7 additions & 4 deletions

```diff
@@ -12,10 +12,13 @@
 
 def main(cfg_file, weight_file, image_path, batch_size):
 
-    # Transform to onnx as specified batch size
-    transform_to_onnx(cfg_file, weight_file, batch_size)
-    # Transform to onnx for demo
-    onnx_path_demo = transform_to_onnx(cfg_file, weight_file, 1)
+    if batch_size <= 0:
+        onnx_path_demo = transform_to_onnx(cfg_file, weight_file, batch_size)
+    else:
+        # Transform to onnx as specified batch size
+        transform_to_onnx(cfg_file, weight_file, batch_size)
+        # Transform to onnx for demo
+        onnx_path_demo = transform_to_onnx(cfg_file, weight_file, 1)
 
     session = onnxruntime.InferenceSession(onnx_path_demo)
     # session = onnx.load(onnx_path)
```
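
In short, a non-positive batch size now short-circuits to a single dynamic export, while a positive one keeps the old two-model behavior. A small usage sketch (hypothetical cfg/weights paths; H and W come from the cfg file):

```python
from tool.darknet2onnx import transform_to_onnx

# batch_size <= 0: one dynamic model, named yolov4_-1_3_<H>_<W>_dynamic.onnx
dynamic_path = transform_to_onnx('yolov4.cfg', 'yolov4.weights', -1)

# batch_size > 0: a static model (yolov4_8_3_<H>_<W>_static.onnx) for
# deployment, plus a batch-1 static model that the demo actually runs
transform_to_onnx('yolov4.cfg', 'yolov4.weights', 8)
demo_path = transform_to_onnx('yolov4.cfg', 'yolov4.weights', 1)
```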

demo_pytorch2onnx.py

Lines changed: 48 additions & 21 deletions

```diff
@@ -19,32 +19,59 @@ def transform_to_onnx(weight_file, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W
     pretrained_dict = torch.load(weight_file, map_location=torch.device('cuda'))
     model.load_state_dict(pretrained_dict)
 
-    x = torch.randn((batch_size, 3, IN_IMAGE_H, IN_IMAGE_W), requires_grad=True) # .cuda()
-
-    onnx_file_name = "yolov4_{}_3_{}_{}.onnx".format(batch_size, IN_IMAGE_H, IN_IMAGE_W)
-
-    # Export the model
-    print('Export the onnx model ...')
-    torch.onnx.export(model,
-                      x,
-                      onnx_file_name,
-                      export_params=True,
-                      opset_version=11,
-                      do_constant_folding=True,
-                      input_names=['input'], output_names=['boxes', 'confs'],
-                      dynamic_axes=None)
-
-    print('Onnx model exporting done')
-    return onnx_file_name
+    input_names = ["input"]
+    output_names = ['boxes', 'confs']
+
+    dynamic = False
+    if batch_size <= 0:
+        dynamic = True
+
+    if dynamic:
+        x = torch.randn((1, 3, IN_IMAGE_H, IN_IMAGE_W), requires_grad=True)
+        onnx_file_name = "yolov4_-1_3_{}_{}_dynamic.onnx".format(IN_IMAGE_H, IN_IMAGE_W)
+        dynamic_axes = {"input": {0: "batch_size"}, "boxes": {0: "batch_size"}, "confs": {0: "batch_size"}}
+        # Export the model
+        print('Export the onnx model ...')
+        torch.onnx.export(model,
+                          x,
+                          onnx_file_name,
+                          export_params=True,
+                          opset_version=11,
+                          do_constant_folding=True,
+                          input_names=input_names, output_names=output_names,
+                          dynamic_axes=dynamic_axes)
+
+        print('Onnx model exporting done')
+        return onnx_file_name
+
+    else:
+        x = torch.randn((batch_size, 3, IN_IMAGE_H, IN_IMAGE_W), requires_grad=True)
+        onnx_file_name = "yolov4_{}_3_{}_{}_static.onnx".format(batch_size, IN_IMAGE_H, IN_IMAGE_W)
+        # Export the model
+        print('Export the onnx model ...')
+        torch.onnx.export(model,
+                          x,
+                          onnx_file_name,
+                          export_params=True,
+                          opset_version=11,
+                          do_constant_folding=True,
+                          input_names=input_names, output_names=output_names,
+                          dynamic_axes=None)
+
+        print('Onnx model exporting done')
+        return onnx_file_name
 
 
 
 def main(weight_file, image_path, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W):
 
-    # Transform to onnx as specified batch size
-    transform_to_onnx(weight_file, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W)
-    # Transform to onnx for demo
-    onnx_path_demo = transform_to_onnx(weight_file, 1, n_classes, IN_IMAGE_H, IN_IMAGE_W)
+    if batch_size <= 0:
+        onnx_path_demo = transform_to_onnx(weight_file, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W)
+    else:
+        # Transform to onnx as specified batch size
+        transform_to_onnx(weight_file, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W)
+        # Transform to onnx for demo
+        onnx_path_demo = transform_to_onnx(weight_file, 1, n_classes, IN_IMAGE_H, IN_IMAGE_W)
 
     session = onnxruntime.InferenceSession(onnx_path_demo)
     # session = onnx.load(onnx_path)
```
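
One way to check that `dynamic_axes` took effect is to inspect the exported graph: dimension 0 of the input should carry the symbolic name "batch_size" rather than a fixed integer. A short sketch (assuming a 416x416 export and the `onnx` package):

```python
import onnx

model = onnx.load("yolov4_-1_3_416_416_dynamic.onnx")
dim0 = model.graph.input[0].type.tensor_type.shape.dim[0]
print(dim0.dim_param or dim0.dim_value)  # expected: "batch_size"
```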

demo_trt.py

Lines changed: 13 additions & 3 deletions

```diff
@@ -73,13 +73,20 @@ def __repr__(self):
         return self.__str__()
 
 # Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
-def allocate_buffers(engine):
+def allocate_buffers(engine, batch_size):
     inputs = []
     outputs = []
     bindings = []
     stream = cuda.Stream()
     for binding in engine:
-        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
+
+        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
+        dims = engine.get_binding_shape(binding)
+
+        # in case batch dimension is -1 (dynamic)
+        if dims[0] < 0:
+            size *= -1
+
         dtype = trt.nptype(engine.get_binding_dtype(binding))
         # Allocate host and device buffers
         host_mem = cuda.pagelocked_empty(size, dtype)
```
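
The sign flip above is terse, so a note on the arithmetic: for a dynamic engine, `get_binding_shape` reports the batch dimension as -1, which makes `trt.volume` negative; multiplying by `batch_size` and then by -1 yields `batch_size` copies of the per-sample volume. The same logic as a self-contained sketch (illustration only, no engine required):

```python
def buffer_size(binding_shape, batch_size):
    # Mirrors allocate_buffers above: the volume is negative when dim 0 is -1.
    volume = 1
    for d in binding_shape:
        volume *= d
    size = volume * batch_size
    if binding_shape[0] < 0:  # dynamic batch dimension
        size *= -1
    return size

assert buffer_size((-1, 3, 320, 512), 4) == 4 * 3 * 320 * 512  # dynamic engine
assert buffer_size((8, 3, 320, 512), 1) == 8 * 3 * 320 * 512   # static engine
```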

```diff
@@ -112,7 +119,10 @@ def do_inference(context, bindings, inputs, outputs, stream):
 
 def main(engine_path, image_path, image_size):
     with get_engine(engine_path) as engine, engine.create_execution_context() as context:
-        buffers = allocate_buffers(engine)
+        buffers = allocate_buffers(engine, 1)
+        IN_IMAGE_H, IN_IMAGE_W = image_size
+        context.set_binding_shape(0, (1, 3, IN_IMAGE_H, IN_IMAGE_W))
+
         image_src = cv2.imread(image_path)
 
         num_classes = 80
```
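
With a dynamic engine, the concrete input shape must be set on the execution context before inference, which is what `set_binding_shape` does; the demo pins it to batch 1. A hedged sketch of how this could be generalized to larger batches (`allocate_buffers` is the helper from demo_trt.py above, and `n` must lie within the min/max range the engine was built with):

```python
def prepare_batch(engine, context, n, in_h, in_w):
    """Size the buffers and fix the input shape for a batch of n images."""
    buffers = allocate_buffers(engine, n)             # helper defined above
    context.set_binding_shape(0, (n, 3, in_h, in_w))  # binding 0 is "input"
    return buffers
```

The n preprocessed images would then be stacked into one contiguous float32 array before being copied into the pagelocked host buffer.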

models.py

Lines changed: 10 additions & 7 deletions

```diff
@@ -20,17 +20,20 @@ def __init__(self):
 
     def forward(self, x, target_size, inference=False):
         assert (x.data.dim() == 4)
-        _, _, tH, tW = target_size
+        # _, _, tH, tW = target_size
 
         if inference:
-            B = x.data.size(0)
-            C = x.data.size(1)
-            H = x.data.size(2)
-            W = x.data.size(3)
 
-            return x.view(B, C, H, 1, W, 1).expand(B, C, H, tH // H, W, tW // W).contiguous().view(B, C, tH, tW)
+            # B = x.data.size(0)
+            # C = x.data.size(1)
+            # H = x.data.size(2)
+            # W = x.data.size(3)
+
+            return x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\
+                    expand(x.size(0), x.size(1), x.size(2), target_size[2] // x.size(2), x.size(3), target_size[3] // x.size(3)).\
+                    contiguous().view(x.size(0), x.size(1), target_size[2], target_size[3])
         else:
-            return F.interpolate(x, size=(tH, tW), mode='nearest')
+            return F.interpolate(x, size=(target_size[2], target_size[3]), mode='nearest')
 
 
 class Conv_Bn_Activation(nn.Module):
```
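
Two things changed in this forward pass: the B/C/H/W locals read through `x.data.size(...)` are gone, and the target size comes straight from `target_size[2]` and `target_size[3]`. Keeping every shape access on `x.size(...)` (rather than on the detached `.data`) appears to be what lets the ONNX exporter keep the batch dimension symbolic. The view/expand chain itself is ordinary nearest-neighbor upsampling whenever the target is an integer multiple of the input, which a quick standalone check confirms (my own sketch, not repo code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 6)
tH, tW = 8, 12  # integer multiples of the input height and width

up = x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1) \
      .expand(x.size(0), x.size(1), x.size(2), tH // x.size(2),
              x.size(3), tW // x.size(3)) \
      .contiguous().view(x.size(0), x.size(1), tH, tW)

assert torch.equal(up, F.interpolate(x, size=(tH, tW), mode='nearest'))
```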

tool/darknet2onnx.py

Lines changed: 10 additions & 9 deletions

```diff
@@ -3,23 +3,23 @@
 from tool.darknet2pytorch import Darknet
 
 
-def transform_to_onnx(cfgfile, weightfile, batch_size=1, dynamic=False):
+def transform_to_onnx(cfgfile, weightfile, batch_size=1):
     model = Darknet(cfgfile)
 
     model.print_network()
     model.load_weights(weightfile)
     print('Loading weights from %s... Done!' % (weightfile))
 
-    # model.cuda()
+    dynamic = False
+    if batch_size <= 0:
+        dynamic = True
 
-    x = torch.randn((batch_size, 3, model.height, model.width), requires_grad=True) # .cuda()
+    input_names = ["input"]
+    output_names = ['boxes', 'confs']
 
     if dynamic:
-
-        onnx_file_name = "yolov4_{}_3_{}_{}_dyna.onnx".format(batch_size, model.height, model.width)
-        input_names = ["input"]
-        output_names = ['boxes', 'confs']
-
+        x = torch.randn((1, 3, model.height, model.width), requires_grad=True)
+        onnx_file_name = "yolov4_-1_3_{}_{}_dynamic.onnx".format(model.height, model.width)
         dynamic_axes = {"input": {0: "batch_size"}, "boxes": {0: "batch_size"}, "confs": {0: "batch_size"}}
         # Export the model
         print('Export the onnx model ...')
@@ -36,14 +36,15 @@ def transform_to_onnx(cfgfile, weightfile, batch_size=1, dynamic=False):
         return onnx_file_name
 
     else:
+        x = torch.randn((batch_size, 3, model.height, model.width), requires_grad=True)
         onnx_file_name = "yolov4_{}_3_{}_{}_static.onnx".format(batch_size, model.height, model.width)
         torch.onnx.export(model,
                           x,
                           onnx_file_name,
                           export_params=True,
                           opset_version=11,
                           do_constant_folding=True,
-                          input_names=['input'], output_names=['boxes', 'confs'],
+                          input_names=input_names, output_names=output_names,
                           dynamic_axes=None)
 
         print('Onnx model exporting done')
```
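
For reference, the `dynamic_axes` mapping is the only thing that distinguishes the two export paths: it names which axis of each input and output is allowed to vary at runtime. A minimal standalone illustration with a stand-in module (hypothetical names; any single-input network behaves the same way):

```python
import torch

net = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
dummy = torch.randn(1, 3, 32, 32)

torch.onnx.export(net, dummy, "conv_dynamic.onnx",
                  export_params=True, opset_version=11,
                  input_names=["input"], output_names=["out"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "out": {0: "batch_size"}})
```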

tool/darknet2pytorch.py

Lines changed: 13 additions & 21 deletions

```diff
@@ -55,15 +55,12 @@ def __init__(self, stride=2):
         self.stride = stride
 
     def forward(self, x):
-        stride = self.stride
         assert (x.data.dim() == 4)
-        B = x.data.size(0)
-        C = x.data.size(1)
-        H = x.data.size(2)
-        W = x.data.size(3)
-        ws = stride
-        hs = stride
-        x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H * stride, W * stride)
+
+        x = x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\
+                expand(x.size(0), x.size(1), x.size(2), self.stride, x.size(3), self.stride).contiguous().\
+                view(x.size(0), x.size(1), x.size(2) * self.stride, x.size(3) * self.stride)
+
         return x
 
 
@@ -73,14 +70,9 @@ def __init__(self, stride):
         self.stride = stride
 
     def forward(self, x):
-        x_numpy = x.cpu().detach().numpy()
-        H = x_numpy.shape[2]
-        W = x_numpy.shape[3]
-
-        H = H * self.stride
-        W = W * self.stride
+        assert (x.data.dim() == 4)
 
-        out = F.interpolate(x, size=(H, W), mode='nearest')
+        out = F.interpolate(x, size=(x.size(2) * self.stride, x.size(3) * self.stride), mode='nearest')
         return out
 
 
@@ -246,15 +238,15 @@ def create_network(self, blocks):
         conv_id = 0
         for block in blocks:
             if block['type'] == 'net':
-                prev_filters = int(float(block['channels']))
+                prev_filters = int(block['channels'])
                 continue
             elif block['type'] == 'convolutional':
                 conv_id = conv_id + 1
-                batch_normalize = int(float(block['batch_normalize']))
-                filters = int(float(block['filters']))
-                kernel_size = int(float(block['size']))
-                stride = int(float(block['stride']))
-                is_pad = int(float(block['pad']))
+                batch_normalize = int(block['batch_normalize'])
+                filters = int(block['filters'])
+                kernel_size = int(block['size'])
+                stride = int(block['stride'])
+                is_pad = int(block['pad'])
                 pad = (kernel_size - 1) // 2 if is_pad else 0
                 activation = block['activation']
                 model = nn.Sequential()
```
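
A note on the `Upsample_interpolate` rewrite above: the old `x.cpu().detach().numpy()` round-trip forced a device-to-host copy on every forward pass and hid the spatial sizes from the tracer, while `x.size(2)` and `x.size(3)` provide the same numbers without leaving the tensor API. A quick sketch (my own, under those assumptions) showing that a traced module of this form still accepts a different batch size, which is the property the dynamic-batch export relies on:

```python
import torch
import torch.nn.functional as F

class Up(torch.nn.Module):
    def __init__(self, stride=2):
        super().__init__()
        self.stride = stride

    def forward(self, x):
        return F.interpolate(x, size=(x.size(2) * self.stride,
                                      x.size(3) * self.stride), mode='nearest')

traced = torch.jit.trace(Up(), torch.randn(1, 3, 8, 8))
print(traced(torch.randn(4, 3, 8, 8)).shape)  # torch.Size([4, 3, 16, 16])
```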
