Performance problem for model with multiple heads #8192
gorodnitskiy asked this question in Other Q&A (Unanswered)
Hi, guys!
I found a weird case with ONNX Runtime performance:
I trained a resnext50_32x4d with 3 heads (head = conv2d(..., groups=64)-relu-linear) via pytorch-lightning. The model code is below. Then I exported the model to ONNX via torch.onnx.export with opset_version=13 (using the code from the example) and ran it with onnxruntime on CPU. I got approximately 225 ms per image in single-thread mode.
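A minimal sketch of the setup being described (the backbone slicing, head widths, class counts, and file name here are assumptions, not the actual training code):

```python
import torch
import torch.nn as nn
import torchvision


class MultiHeadResNeXt(nn.Module):
    """resnext50_32x4d backbone with 3 heads: conv2d(groups=64) -> relu -> linear."""

    def __init__(self, num_classes=(10, 10, 10)):
        super().__init__()
        backbone = torchvision.models.resnext50_32x4d(pretrained=True)
        # Keep the convolutional trunk, drop avgpool/fc: output is (B, 2048, 7, 7).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(2048, 512, kernel_size=1, groups=64),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),  # pooling added here so the linear layer gets a 2D input
                nn.Flatten(),
                nn.Linear(512, n),
            )
            for n in num_classes
        )

    def forward(self, x):
        feats = self.features(x)
        return tuple(head(feats) for head in self.heads)


model = MultiHeadResNeXt().eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX with opset 13, as in the PyTorch export example.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=13,
    input_names=["input"],
    output_names=["head0", "head1", "head2"],
)
```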
But if I manually round off the model weights to 16 digits after the decimal point (using a load_model(…, n_digits=16) func) before exporting to ONNX, I get 75 ms per image - a 3x speed-up. I checked the output tensors up to 16-digit precision and they matched completely. I also checked that the rounding did not change the model weights.
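A minimal sketch of the rounding idea (this is an illustration, not the actual load_model(…, n_digits=16) helper; checkpoint loading is omitted and the floating-point guard is an assumption):

```python
import torch


def round_model_weights(model, n_digits=16):
    # Round every floating-point tensor in the state dict to n_digits
    # decimal places, then load it back. Integer buffers (e.g.
    # num_batches_tracked) are left untouched.
    factor = 10 ** n_digits
    state_dict = model.state_dict()
    for key, value in state_dict.items():
        if value.is_floating_point():
            state_dict[key] = torch.round(value * factor) / factor
    model.load_state_dict(state_dict)
    return model


model = round_model_weights(model, n_digits=16)  # then export exactly as before
```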
Could someone explain it? Any suggestions?
I suppose this case is related to PyTorch ONNX exporting, but I'll leave it here in case it is useful to someone.
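If it is export-related, one quick check is to diff the initializers of the two exported graphs; a sketch, with placeholder file names:

```python
import numpy as np
import onnx
from onnx import numpy_helper

# Load both exports and compare the stored weights tensor by tensor.
slow = onnx.load("model_slow.onnx")      # export without rounding
fast = onnx.load("model_rounded.onnx")   # export after rounding

slow_weights = {t.name: numpy_helper.to_array(t) for t in slow.graph.initializer}
fast_weights = {t.name: numpy_helper.to_array(t) for t in fast.graph.initializer}

for name, a in slow_weights.items():
    b = fast_weights.get(name)
    if b is None or a.shape != b.shape or a.dtype != b.dtype:
        print("structure differs:", name)
    elif not np.array_equal(a, b):
        diff = np.max(np.abs(a.astype(np.float64) - b.astype(np.float64)))
        print(f"values differ: {name} (max abs diff {diff:.3e})")
```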
Versions:
Hardware: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Single-thread mode CPU:
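A minimal sketch of what single-thread CPU inference with onnxruntime typically looks like (the session options, input name, and timing loop here are illustrative, not the exact benchmark code):

```python
import time

import numpy as np
import onnxruntime as ort

# Pin onnxruntime to a single thread on CPU.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 1
opts.inter_op_num_threads = 1

session = ort.InferenceSession("model.onnx", opts, providers=["CPUExecutionProvider"])
x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Warm up, then report the average latency per image.
for _ in range(5):
    session.run(None, {"input": x})

n_runs = 50
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, {"input": x})
print("ms per image:", (time.perf_counter() - start) / n_runs * 1000)
```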
Replies: 1 comment
- It might be useful to read this, as convolutions attempt to use threads heavily.