Replies: 1 comment
-
Same question: multi-session inference is still as slow as a single session, and it remains unsolved.
0 replies
-
Hi, I'm using the C++ onnxruntime-gpu API for model inference. I defined a class (e.g. class Model) that wraps the image input, the ONNX inference call, and the output results, and the ONNX session is created inside each Model instance. Now I want to create multiple handles, like model1 = Model(), model2 = Model(), ..., on the same GPU, so that I can use all of the GPU's memory and compute cores and run inference on several images at the same time to increase throughput. The problem is that this ends up as slow as using one handle and running the images one by one. Are there any suggestions for making full use of a single GPU's resources? Thanks.