Replies: 1 comment
Closing this discussion and reposting as an issue.
Dear community,
I am using DirectML for inference of UNet models trained with PyTorch. The UNets consist mostly of Conv3D + BatchNormalization + ReLU operations; no transformers are used.
The inference results are great, and I am now looking into optimizing the models for faster inference.
I hoped that converting the model weights to float16 would make inference about twice as fast; however, it took just as long as float32, and sometimes 5-10% longer.
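The float16 conversion was done roughly along these lines (a minimal sketch using onnxconverter_common; the file names and the keep_io_types flag are illustrative assumptions, not necessarily my exact setup):

```python
import onnx
from onnxconverter_common import float16

# Load the exported float32 model and convert its weights and ops to float16.
# keep_io_types=True leaves the graph inputs/outputs as float32, so the calling
# code does not have to change its numpy dtypes. (Illustrative choice.)
model_fp32 = onnx.load("unet.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "unet_fp16.onnx")
```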
With the same models and the CUDA execution provider, I get half the inference time, as expected. However, I like the portability of DirectML.
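The comparison was done with plain onnxruntime sessions, roughly like this (input shape, run count, and file name are placeholders; the DML provider comes from the onnxruntime-directml package and the CUDA provider from onnxruntime-gpu, measured in separate environments):

```python
import time
import numpy as np
import onnxruntime as ort

def time_provider(model_path, providers, n_runs=20):
    """Create a session on the given execution provider and time repeated runs."""
    sess = ort.InferenceSession(model_path, providers=providers)
    # Placeholder input; the real models take a 5D float32 volume (N, C, D, H, W).
    x = np.random.rand(1, 1, 32, 64, 64).astype(np.float32)
    input_name = sess.get_inputs()[0].name
    sess.run(None, {input_name: x})  # warm-up run
    start = time.perf_counter()
    for _ in range(n_runs):
        sess.run(None, {input_name: x})
    return (time.perf_counter() - start) / n_runs

print("DML :", time_provider("unet.onnx", ["DmlExecutionProvider"]))
print("CUDA:", time_provider("unet.onnx", ["CUDAExecutionProvider"]))
```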
I export the models as follows:
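Roughly like the sketch below; the stand-in module, input shape, and opset version are placeholders for illustration, not the exact network or values:

```python
import torch
import torch.nn as nn

# Stand-in for the trained 3D UNet: the real model is built from
# Conv3D + BatchNorm + ReLU blocks.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm3d(16),
    nn.ReLU(),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),
).eval()

dummy_input = torch.randn(1, 1, 32, 64, 64)  # N, C, D, H, W -- placeholder shape

torch.onnx.export(
    model,
    dummy_input,
    "unet.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```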
I tried the following:
Expected behaviour
I would expect roughly half the inference time, since on the same platform and GPU I can achieve that with the CUDA provider.
Are there any other options that I could try?
Platform: Windows 11
Python: 3.11.9
onnx: 1.16
onnxruntime: 1.17 / 1.20
GPU: NVIDIA RTX 2080, 8 GB VRAM