[Performance] Allocating additional GPU memory in onnxruntime for the custom CUDA operator deform_conv2d #2394
1193700079 asked this question in Q&A
Describe the issue
The deform_conv function comes from mmdeploy, from the path
csrc/mmdeploy/backend_ops/tensorrt/deform_conv/trt_deform_conv_kernel.cu
The code is as follows:
trt_deform_conv_kernel.cu
It requires GPU memory to be allocated for a workspace. I am not very clear about how the workspace should be sized and managed, so I simply allocate a buffer equal to the byte size of one output tensor.
This does not feel efficient, because TensorRT has dedicated APIs for requesting workspace memory. I would like to know whether there is a better way to do this in onnxruntime, and I hope we can discuss it together. Please advise!
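To make the comparison concrete: in a TensorRT plugin the workspace is requested declaratively. The plugin reports its requirement from getWorkspaceSize() and receives an already-allocated pointer in enqueue(), so it never has to call cudaMalloc per inference. The sketch below only shows how such a requirement is typically computed for deformable convolution (the im2col "columns" buffer); the parameter names are my own placeholders and are not taken from the mmdeploy sources.

```cpp
// Hedged sketch: a typical workspace requirement for a deform_conv kernel.
// The dominant scratch buffer is the im2col "columns" tensor for one
// im2col_step batch; the exact formula in mmdeploy's plugin may include
// additional buffers, so treat this as an illustration, not the authoritative size.
#include <cstddef>
#include <cstdint>

inline size_t DeformConvColumnsBytes(int64_t channels, int64_t kernel_h, int64_t kernel_w,
                                     int64_t out_h, int64_t out_w, int64_t im2col_step,
                                     size_t element_size /* e.g. sizeof(float) */) {
  const int64_t elems = channels * kernel_h * kernel_w * im2col_step * out_h * out_w;
  return static_cast<size_t>(elems) * element_size;
}
```

Sizing the workspace this way ties it to what the kernels actually need, rather than to the byte size of the output tensor.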
To reproduce
This code lives inside the custom op's Compute function. A simplified sketch of the relevant part is shown below; the complete implementation follows the trt_deform_conv_kernel.cu file referenced above.
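A minimal sketch of the pattern, with placeholder shapes and attribute values (this illustrates the approach described above, not the actual code): it takes the op's CUDA compute stream from the kernel context, allocates a temporary workspace with cudaMalloc, runs the kernels, and frees the buffer. I am assuming the Ort::KernelContext wrapper from onnxruntime_cxx_api.h (1.15) for GetGPUComputeStream().

```cpp
// Illustrative sketch only -- placeholder shapes/attributes, not the original code.
// Assumes the Ort::KernelContext C++ wrapper (onnxruntime_cxx_api.h, ORT 1.15)
// and the CUDA runtime API.
#include <cuda_runtime.h>
#include <onnxruntime_cxx_api.h>

struct DeformConvKernelSketch {
  // In the real op these come from OrtKernelInfo attributes and the input shapes.
  int64_t channels = 64, kernel_h = 3, kernel_w = 3;
  int64_t out_h = 56, out_w = 56, im2col_step = 32;

  void Compute(OrtKernelContext* context) {
    Ort::KernelContext ctx(context);

    // Run on the same stream onnxruntime uses for this inference.
    cudaStream_t stream = reinterpret_cast<cudaStream_t>(ctx.GetGPUComputeStream());

    // Workspace sized for the im2col "columns" buffer (see the helper above),
    // instead of simply reusing the byte size of the output tensor.
    const size_t workspace_bytes =
        static_cast<size_t>(channels * kernel_h * kernel_w * im2col_step * out_h * out_w) *
        sizeof(float);

    void* workspace = nullptr;
    cudaMalloc(&workspace, workspace_bytes);  // currently: one device allocation per call

    // ... read inputs via ctx.GetInput(i), create the output via ctx.GetOutput(...),
    //     and launch the deform_conv kernels on `stream`, using `workspace` as scratch ...

    cudaStreamSynchronize(stream);  // make sure the kernels finished before freeing
    cudaFree(workspace);
  }
};
```

What I would like to avoid is the cudaMalloc/cudaFree pair on every call. If the C API's KernelContext_GetAllocator (which, as far as I can tell, was added around ORT 1.15) can hand out scratch memory from the CUDA execution provider's arena, would that be the recommended replacement for the raw cudaMalloc here?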
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8
Model File
No response
Is this a quantized model?
No