Replies: 2 comments
-
Not currently. Thread pools are created when the inference session is created, not on a per-Run basis. If you need multiple sessions to control the number of threads, you can mitigate the memory-usage cost by using shared initializers (one copy of the weights for all sessions) and a shared allocator (avoiding a separate memory arena in each session). See the relevant sections for each in https://onnxruntime.ai/docs/get-started/with-c.html
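For reference, a minimal sketch of both mitigations with the C API. The model path `"model.onnx"`, the initializer name `"W"`, and its data/shape are placeholders that would have to match your actual model; error handling is reduced to a macro, and the model path literal assumes a non-Windows build (ORTCHAR_T):

```c
#include <stdio.h>
#include <onnxruntime_c_api.h>

const OrtApi* g_ort;

/* Abort-on-error helper to keep the sketch short. */
#define ORT_CHECK(expr) do { OrtStatus* st = (expr); if (st) {   \
    fprintf(stderr, "%s\n", g_ort->GetErrorMessage(st));         \
    g_ort->ReleaseStatus(st); return 1; } } while (0)

int main(void) {
  g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtEnv* env;
  ORT_CHECK(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "shared", &env));

  /* Shared allocator: register one CPU arena on the env; every session
     that opts in reuses it instead of creating its own arena. */
  OrtMemoryInfo* mem_info;
  ORT_CHECK(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &mem_info));
  ORT_CHECK(g_ort->CreateAndRegisterAllocator(env, mem_info, NULL /* default arena cfg */));

  OrtSessionOptions* so;
  ORT_CHECK(g_ort->CreateSessionOptions(&so));
  /* Opt sessions created from these options into env-registered allocators. */
  ORT_CHECK(g_ort->AddSessionConfigEntry(so, "session.use_env_allocators", "1"));

  /* Shared initializer: wrap user-owned weight memory in one OrtValue and
     hand the same value to every session via AddInitializer, so the weights
     are not duplicated per session. "W" and its data are hypothetical. */
  static float w_data[2] = {1.0f, 2.0f};
  int64_t w_shape[1] = {2};
  OrtValue* w;
  ORT_CHECK(g_ort->CreateTensorWithDataAsOrtValue(
      mem_info, w_data, sizeof(w_data), w_shape, 1,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &w));
  ORT_CHECK(g_ort->AddInitializer(so, "W", w));

  OrtSession *s1, *s2;
  ORT_CHECK(g_ort->CreateSession(env, "model.onnx", so, &s1));
  ORT_CHECK(g_ort->CreateSession(env, "model.onnx", so, &s2));
  /* ... call Run() on s1/s2 from different threads, then release objects ... */
  return 0;
}
```

Note that the weight buffer and the OrtValue passed to AddInitializer stay user-owned and must outlive every session that uses them.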
-
OK, thank you. However, shared initializers are not easy to use so far. Would it be enough to create the model with external data, i.e. call AddExternalInitializersFromFilesInMemory(), then call CreateSessionWithPrepackedWeightsContainer() and pass the same prepacked-weights container pointer to all sessions? Update: I tried AddExternalInitializersFromFilesInMemory, but no luck.
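For anyone reading along, a bare sketch of the flow described above with the C API (model path is a placeholder, error handling omitted). One caveat, as an assumption worth hedging: per the docs, the prepacked-weights container only de-duplicates the pre-packed copies of weights that kernels create at session-initialization time, so whether this flow alone avoids loading the base weights N times is exactly the open question here:

```c
/* Sketch of the flow from the comment above; error handling omitted. */
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtEnv* env;
api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "app", &env);

/* One container shared by all sessions built from the same model. */
OrtPrepackedWeightsContainer* prepacked;
api->CreatePrepackedWeightsContainer(&prepacked);

OrtSessionOptions* so;
api->CreateSessionOptions(&so);
/* If the model references external weight files supplied from memory:
   api->AddExternalInitializersFromFilesInMemory(so, file_names,
       file_buffers, file_lengths, num_files); */

OrtSession *s1, *s2;
api->CreateSessionWithPrepackedWeightsContainer(env, "model.onnx", so, prepacked, &s1);
api->CreateSessionWithPrepackedWeightsContainer(env, "model.onnx", so, prepacked, &s2);

/* The container must outlive every session that uses it; release it last
   with api->ReleasePrepackedWeightsContainer(prepacked). */
```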
-
Hello
I have one session shared between multiple threads, each thread calling Run().
When creating the session, we can specify the number of intra-op threads, or leave it at 0 to use the number of CPU cores. This number is global to ALL inferences.
Is there a way to limit the number of threads used per Run()?
(i.e. I want each of my Run() calls to use a fraction of the intra-op thread count.)
The reason is that I see some performance degradation when performing N concurrent Run() calls on one session, compared to the same N concurrent calls with a per-Run() thread limit. I achieved the latter by duplicating the session N times and setting intra-op threads to cores/N ... which is a waste of memory, as the model gets loaded N times (a sketch of this workaround follows the numbers below).
Thanks & bravo to the team!
Update with numbers:
If I make 4 concurrent Run() calls on one shared session with intra-op = cores (= 24), I get an RTF of 0.055.
If I make 4 concurrent Run() calls on 4 sessions with intra-op = 6 each, I get an RTF of 0.045.
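For concreteness, the per-session workaround measured above looks roughly like this with the C API (model path is a placeholder, error handling omitted):

```c
/* N sessions, each limited to cores/N intra-op threads
   (N = 4 and 24 cores, i.e. 6 threads per session, as in the numbers above). */
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtEnv* env;
api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "app", &env);

enum { N = 4, CORES = 24 };
OrtSession* sessions[N];
for (int i = 0; i < N; ++i) {
  OrtSessionOptions* so;
  api->CreateSessionOptions(&so);
  api->SetIntraOpNumThreads(so, CORES / N);                 /* 6 threads each */
  api->CreateSession(env, "model.onnx", so, &sessions[i]);  /* model loaded N times */
  api->ReleaseSessionOptions(so);
}
/* Worker thread i then calls Run() on sessions[i]. */
```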