What is the optimal way to load a model multiple times? #9669
Unanswered
prateek9623
asked this question in
Other Q&A
Replies: 1 comment 1 reply
-
It's unclear to me which metric you want to optimize for.
-
I have a use case where I want to load the model 8 times (4 GPUs, 2 streams each). What would be the optimal way to load the model? Until now I have been using a custom cuDNN-based framework, where I can create the graphs and load the weights into CPU memory up front, then copy them to the GPU when required. Is there anything similar in ORT? This is for inference only, with input data of shape roughly [20000, 512, 512, 3].
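For reference, the 4 GPUs x 2 streams layout could be sketched roughly as below, as one `InferenceSession` per replica, each pinned to a GPU through the CUDA execution provider's `device_id` option. This is only an illustration of the setup in the question, not a recommended or confirmed answer: `build_replica_plan`, `create_sessions`, and the `model_path` argument are hypothetical names, the stream index is just a label here, and the sketch does not reproduce the "keep weights on the CPU, move them to the GPU on demand" behaviour of the custom framework.

```python
from itertools import product

NUM_GPUS = 4          # from the question: 4 GPUs
STREAMS_PER_GPU = 2   # from the question: 2 streams per GPU


def build_replica_plan(num_gpus=NUM_GPUS, streams_per_gpu=STREAMS_PER_GPU):
    """Return one (device_id, stream_index, providers) entry per model replica.

    Hypothetical helper: it only describes which GPU each replica targets.
    """
    plan = []
    for device_id, stream_index in product(range(num_gpus), range(streams_per_gpu)):
        providers = [
            # Pin this replica to a specific GPU.
            ("CUDAExecutionProvider", {"device_id": device_id}),
            # Fallback for operators the CUDA provider does not handle.
            "CPUExecutionProvider",
        ]
        plan.append((device_id, stream_index, providers))
    return plan


def create_sessions(model_path, plan):
    """Create one InferenceSession per planned replica (assumes onnxruntime-gpu)."""
    import onnxruntime as ort  # imported lazily so the plan can be built without it

    # Read the model file from disk once and reuse the bytes for every session.
    with open(model_path, "rb") as f:
        model_bytes = f.read()

    return [
        ort.InferenceSession(model_bytes, providers=providers)
        for _device_id, _stream_index, providers in plan
    ]


if __name__ == "__main__":
    for device_id, stream_index, _providers in build_replica_plan():
        print(f"replica on GPU {device_id}, stream slot {stream_index}")
```

In this sketch the model file is read once and the same bytes are handed to every session, but each session still initializes its own copy of the weights, so two replicas on one GPU hold the weights twice in that GPU's memory.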