
Commit 386bf01

Same changes as #3812 chore: Add more models for benchmark and polish codes (#3822)
Parent: 6a345da

File tree: 7 files changed (+367 additions, -177 deletions)

py/torch_tensorrt/dynamo/_compiler.py

Lines changed: 149 additions & 94 deletions
Large diffs are not rendered by default.

py/torch_tensorrt/dynamo/conversion/impl/normalization/ops.py

Lines changed: 3 additions & 2 deletions
@@ -325,8 +325,9 @@ def native_group_norm(
 
     shape = [1, group] + [1] * (rank - 2)
 
-    weight_torch = torch.ones(shape)
-    bias_torch = torch.zeros(shape)
+    with unset_fake_temporarily():
+        weight_torch = torch.ones(shape)
+        bias_torch = torch.zeros(shape)
 
     weight_one = get_trt_tensor(ctx, weight_torch, f"{name}_weight_one", input.dtype)
     bias_zero = get_trt_tensor(ctx, bias_torch, f"{name}_bias_zero", input.dtype)
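Note on the `native_group_norm` change above: Dynamo conversion runs under a fake-tensor mode, where `torch.ones`/`torch.zeros` produce FakeTensors without real storage that `get_trt_tensor` cannot freeze into TensorRT constants. Below is a minimal sketch of the pattern, assuming the `torch._subclasses.fake_tensor` import path; it is illustrative only, not the converter code itself.

```python
# Minimal sketch of the unset_fake_temporarily pattern (assumed import path;
# illustrative only, not the torch_tensorrt converter implementation).
import torch
from torch._subclasses.fake_tensor import FakeTensorMode, unset_fake_temporarily

shape = [1, 4, 1, 1]  # mirrors [1, group] + [1] * (rank - 2)

with FakeTensorMode():
    # Inside a fake mode, factory functions return FakeTensors with no storage.
    fake = torch.ones(shape)

    # Temporarily leaving the fake mode yields real tensors that a converter
    # could hand to get_trt_tensor as constant weights.
    with unset_fake_temporarily():
        weight_torch = torch.ones(shape)
        bias_torch = torch.zeros(shape)

print(type(fake).__name__)          # FakeTensor
print(type(weight_torch).__name__)  # Tensor
```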

tools/perf/README.md

Lines changed: 7 additions & 6 deletions
@@ -6,7 +6,8 @@ This is a comprehensive Python benchmark suite to run perf runs using different
 2. Torch-TensorRT [Torchscript]
 3. Torch-TensorRT [Dynamo]
 4. Torch-TensorRT [torch_compile]
-5. TensorRT
+5. Torch Inductor
+6. ONNX-TensorRT
 
 
 ## Prerequisite
@@ -42,8 +43,8 @@ Benchmark scripts depends on following Python packages in addition to requiremen
 
 Here are the list of `CompileSpec` options that can be provided directly to compile the pytorch module
 
-* `--backends` : Comma separated string of backends. Eg: torch, torch_compile, dynamo, tensorrt
-* `--model` : Name of the model file (Can be a torchscript module or a tensorrt engine (ending in `.plan` extension)). If the backend is `dynamo` or `torch_compile`, the input should be a Pytorch module (instead of a torchscript module).
+* `--backends` : Comma separated string of backends. Eg: torch, ts_trt, dynamo, torch_compile, inductor, onnx_trt
+* `--model` : Name of the model file (Can be a torchscript module or a tensorrt engine (pairing with `--is_trt_engine`)). If the backend is `dynamo` or `torch_compile`, the input should be a Pytorch module (instead of a torchscript module).
 * `--model_torch` : Name of the PyTorch model file (optional, only necessary if `dynamo` or `torch_compile` is a chosen backend)
 * `--onnx` : ONNX model file which helps bypass the step of exporting ONNX from `model_torch`. If this argument is provided, the ONNX will be directly converted to TRT engine
 * `--inputs` : List of input shapes & dtypes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT
@@ -60,16 +61,16 @@ Eg:
 ```
 python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
                    --model_torch ${MODELS_DIR}/vgg16_torch.pt \
-                   --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
+                   --precision fp32,fp16 \
+                   --inputs "(1, 3, 224, 224)@fp32" \
                    --batch_size 1 \
-                   --backends torch,ts_trt,dynamo,torch_compile,tensorrt \
+                   --backends torch,ts_trt,dynamo,torch_compile,inductor,onnx_trt \
                    --report "vgg_perf_bs1.txt"
 ```
 
 Note:
 
 1. Please note that measuring INT8 performance is only supported via a `calibration cache` file or QAT mode for `torch_tensorrt` backend.
-2. TensorRT engine filename should end with `.plan` otherwise it will be treated as Torchscript module.
 
 ### Example models
 
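As a reading aid for the `--inputs` syntax documented above ("(shape)@dtype" entries separated by `;`), here is a small hypothetical parser sketch. It is not perf_run.py's actual parsing code; the helper name and behavior are invented purely to illustrate the format.

```python
# Hypothetical illustration of the "(1, 3, 224, 224)@fp32;(1, 128)@int32" format.
# This is NOT perf_run.py's parser; names here are invented for the example.
from typing import List, Tuple

def parse_inputs(spec: str) -> List[Tuple[Tuple[int, ...], str]]:
    parsed = []
    for entry in spec.split(";"):
        shape_part, _, dtype = entry.partition("@")
        shape = tuple(int(dim) for dim in shape_part.strip("() ").split(","))
        parsed.append((shape, dtype or "fp32"))
    return parsed

print(parse_inputs("(1, 3, 224, 224)@fp32"))          # [((1, 3, 224, 224), 'fp32')]
print(parse_inputs("(1, 128)@int32;(1, 128)@int32"))  # BERT-style, two int32 inputs
```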

tools/perf/benchmark.sh

Lines changed: 36 additions & 6 deletions
@@ -7,8 +7,8 @@ python hub.py
 
 batch_sizes=(1 2 4 8 16 32 64 128 256)
 large_model_batch_sizes=(1 2 4 8 16 32 64)
-backends=("torch" "ts_trt" "dynamo" "torch_compile" "inductor" "tensorrt")
-backends_no_torchscript=("torch" "dynamo" "torch_compile" "inductor" "tensorrt")
+backends=("torch" "ts_trt" "dynamo" "torch_compile" "inductor" "onnx_trt")
+backends_no_torchscript=("torch" "dynamo" "torch_compile" "inductor" "onnx_trt")
 
 
 # Benchmark VGG16 model
@@ -107,18 +107,48 @@ do
   done
 done
 
-# Benchmark Stable Diffusion UNet model
-echo "Benchmarking SD UNet model"
+# Benchmark Stable Diffusion v1.4 UNet model
+echo "Benchmarking SD-v1.4 UNet model"
 for bs in ${large_model_batch_sizes[@]}
 do
   for backend in ${backends_no_torchscript[@]}
   do
-    python perf_run.py --model_torch sd_unet \
+    python perf_run.py --model_torch sd1.4_unet \
                        --precision fp16 --inputs="(${bs}, 4, 64, 64);(${bs});(${bs}, 1, 768)" \
                        --batch_size ${bs} \
                        --truncate \
                        --backends ${backend} \
-                       --report "sd_unet_perf_bs${bs}_backend_${backend}.csv"
+                       --report "sd1.4_unet_perf_bs${bs}_backend_${backend}.csv"
+  done
+done
+
+# Benchmark Stable Diffusion v2.1 UNet model
+echo "Benchmarking SD-v2.1 UNet model"
+for bs in ${large_model_batch_sizes[@]}
+do
+  for backend in ${backends_no_torchscript[@]}
+  do
+    python perf_run.py --model_torch sd2.1_unet \
+                       --precision fp16 --inputs="(${bs}, 4, 64, 64);(${bs});(${bs}, 1, 1024)" \
+                       --batch_size ${bs} \
+                       --truncate \
+                       --backends ${backend} \
+                       --report "sd2.1_unet_perf_bs${bs}_backend_${backend}.csv"
+  done
+done
+
+# Benchmark Stable Diffusion v2.1 VAE decoder model
+echo "Benchmarking SD-v2.1 VAE decoder model"
+for bs in ${large_model_batch_sizes[@]}
+do
+  for backend in ${backends_no_torchscript[@]}
+  do
+    python perf_run.py --model_torch sd2.1_vae_decoder \
+                       --precision fp16 --inputs="(${bs}, 4, 64, 64)" \
+                       --batch_size ${bs} \
+                       --truncate \
+                       --backends ${backend} \
+                       --report "sd2.1_vae_decoder_perf_bs${bs}_backend_${backend}.csv"
   done
 done
 
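The three new Stable Diffusion loops differ only in the model name and the `--inputs` signature: the SD-1.4 UNet takes a (bs, 1, 768) text-embedding input, the SD-2.1 UNet takes (bs, 1, 1024), and the VAE decoder takes only the latent. A hedged way to confirm those widths from the model configs, assuming `diffusers` is installed, network access to the Hub, and that `CompVis/stable-diffusion-v1-4` and `stabilityai/stable-diffusion-2-1` are the checkpoints in use:

```python
# Sanity-check the cross-attention widths behind the --inputs strings above.
# Assumes diffusers is installed and the Hub is reachable; the repo ids are the
# commonly used checkpoints and may differ from what hub.py actually downloads.
from diffusers import UNet2DConditionModel

for repo, expected in [
    ("CompVis/stable-diffusion-v1-4", 768),     # SD-1.4 UNet -> (bs, 1, 768)
    ("stabilityai/stable-diffusion-2-1", 1024), # SD-2.1 UNet -> (bs, 1, 1024)
]:
    config = UNet2DConditionModel.load_config(repo, subfolder="unet")
    print(repo, config["cross_attention_dim"], "expected:", expected)
```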

tools/perf/custom_models.py

Lines changed: 30 additions & 3 deletions
@@ -26,7 +26,7 @@ def BertInputs():
     return [tokens_tensor, segments_tensors]
 
 
-def StableDiffusionUnet():
+def StableDiffusion1_4_Unet():
     from diffusers import DiffusionPipeline
 
     pipe = DiffusionPipeline.from_pretrained(
@@ -35,7 +35,25 @@ def StableDiffusionUnet():
     return pipe.unet
 
 
-def UNet():
+def StableDiffusion2_1_Unet():
+    from diffusers import StableDiffusionPipeline
+
+    pipe = StableDiffusionPipeline.from_pretrained(
+        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
+    )
+    return pipe.unet
+
+
+def StableDiffusion2_1_VaeDecoder():
+    from diffusers import StableDiffusionPipeline
+
+    pipe = StableDiffusionPipeline.from_pretrained(
+        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
+    )
+    return pipe.vae.decoder
+
+
+def MonaiUNet():
     from monai.networks.nets import UNet
 
     model = UNet(
@@ -46,4 +64,13 @@ def UNet():
         strides=(2, 2),
         num_res_units=2,
     )
-    return model.eval().cuda()
+    return model
+
+
+def GoogleViTForImageClassification():
+    from transformers import ViTForImageClassification
+
+    model = ViTForImageClassification.from_pretrained(
+        "google/vit-base-patch16-224", torch_dtype=torch.float16
+    )
+    return model
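The factories in `custom_models.py` now return plain modules (note that `MonaiUNet` no longer calls `.eval().cuda()`), leaving device placement and precision to the caller. Below is a minimal usage sketch for one of the new entries, assuming `transformers` is installed, a CUDA GPU is available, and the script runs from `tools/perf/`; perf_run.py wires these models up through its own flow.

```python
# Minimal usage sketch for the new factories (assumes transformers is installed,
# a CUDA GPU is available, and this runs from tools/perf/).
import torch
import custom_models as cm

# Caller is now responsible for eval mode and device placement.
model = cm.GoogleViTForImageClassification().eval().cuda()

# ViT-base/16 expects 224x224 RGB input; match the factory's fp16 weights.
pixel_values = torch.randn(1, 3, 224, 224, dtype=torch.float16, device="cuda")

with torch.no_grad():
    logits = model(pixel_values).logits

print(logits.shape)  # torch.Size([1, 1000]) for the ImageNet-1k checkpoint
```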
