Replies: 2 comments
-
Not currently. Thread pools are created when the inference session is created, not on a per-Run basis. If you need multiple sessions to control the number of threads, you can mitigate the memory-usage cost by using shared initializers (one copy of the weights for all sessions) and a shared allocator (avoiding a separate memory arena in each session). See the relevant sections for each in https://onnxruntime.ai/docs/get-started/with-c.html
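For reference, a minimal sketch of both mitigations with the C API. The model path `"model.onnx"`, the initializer name `"W"`, and its data/shape are placeholders that would have to match your actual model; error handling is reduced to a macro, and the model path literal assumes a non-Windows build (ORTCHAR_T):

```c
#include <stdio.h>
#include <onnxruntime_c_api.h>

const OrtApi* g_ort;

/* Abort-on-error helper to keep the sketch short. */
#define ORT_CHECK(expr) do { OrtStatus* st = (expr); if (st) {   \
    fprintf(stderr, "%s\n", g_ort->GetErrorMessage(st));         \
    g_ort->ReleaseStatus(st); return 1; } } while (0)

int main(void) {
  g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtEnv* env;
  ORT_CHECK(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "shared", &env));

  /* Shared allocator: register one CPU arena on the env; every session
     that opts in reuses it instead of creating its own arena. */
  OrtMemoryInfo* mem_info;
  ORT_CHECK(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &mem_info));
  ORT_CHECK(g_ort->CreateAndRegisterAllocator(env, mem_info, NULL /* default arena cfg */));

  OrtSessionOptions* so;
  ORT_CHECK(g_ort->CreateSessionOptions(&so));
  /* Opt sessions created from these options into env-registered allocators. */
  ORT_CHECK(g_ort->AddSessionConfigEntry(so, "session.use_env_allocators", "1"));

  /* Shared initializer: wrap user-owned weight memory in one OrtValue and
     hand the same value to every session via AddInitializer, so the weights
     are not duplicated per session. "W" and its data are hypothetical. */
  static float w_data[2] = {1.0f, 2.0f};
  int64_t w_shape[1] = {2};
  OrtValue* w;
  ORT_CHECK(g_ort->CreateTensorWithDataAsOrtValue(
      mem_info, w_data, sizeof(w_data), w_shape, 1,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &w));
  ORT_CHECK(g_ort->AddInitializer(so, "W", w));

  OrtSession *s1, *s2;
  ORT_CHECK(g_ort->CreateSession(env, "model.onnx", so, &s1));
  ORT_CHECK(g_ort->CreateSession(env, "model.onnx", so, &s2));
  /* ... call Run() on s1/s2 from different threads, then release objects ... */
  return 0;
}
```

Note that the weight buffer and the OrtValue passed to AddInitializer stay user-owned and must outlive every session that uses them.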
-
OK, thank you. However, shared initializers are not easy to use so far. Would it be enough to create the model with external data, i.e. call AddExternalInitializersFromFilesInMemory(), then call CreateSessionWithPrepackedWeightsContainer() and pass the same prepacked-weights container pointer to all sessions? Update: I tried AddExternalInitializersFromFilesInMemory, but no luck.
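For anyone reading along, a bare sketch of the flow described above with the C API (model path is a placeholder, error handling omitted). One caveat, as an assumption worth hedging: per the docs, the prepacked-weights container only de-duplicates the pre-packed copies of weights that kernels create at session-initialization time, so whether this flow alone avoids loading the base weights N times is exactly the open question here:

```c
/* Sketch of the flow from the comment above; error handling omitted. */
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtEnv* env;
api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "app", &env);

/* One container shared by all sessions built from the same model. */
OrtPrepackedWeightsContainer* prepacked;
api->CreatePrepackedWeightsContainer(&prepacked);

OrtSessionOptions* so;
api->CreateSessionOptions(&so);
/* If the model references external weight files supplied from memory:
   api->AddExternalInitializersFromFilesInMemory(so, file_names,
       file_buffers, file_lengths, num_files); */

OrtSession *s1, *s2;
api->CreateSessionWithPrepackedWeightsContainer(env, "model.onnx", so, prepacked, &s1);
api->CreateSessionWithPrepackedWeightsContainer(env, "model.onnx", so, prepacked, &s2);

/* The container must outlive every session that uses it; release it last
   with api->ReleasePrepackedWeightsContainer(prepacked). */
```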
-
Hello
I have one session shared between multiple threads, each thread calling Run().
When creating the session, we can specify the number of intra-op threads, or leave it at 0 to use the number of CPU cores. This number is global to ALL inferences.
Is there a way to limit the number of threads used per Run()?
(i.e. I want each of my Run() calls to use a fraction of the intra-op thread count.)
The reason is that I see some performance degradation when performing N concurrent Run() calls on one session, compared to the same N concurrent calls with a per-Run() thread limit. I achieved the latter by duplicating the session N times and setting intra-op threads to cores/N ... which is a waste of memory, as the model gets loaded N times (a sketch of this workaround follows the numbers below).
Thanks & bravo to the team!
Update with numbers:
If I make 4 concurrent Run() calls on one shared session with intra-op = cores (= 24), I get an RTF of 0.055.
If I make 4 concurrent Run() calls on 4 sessions with intra-op = 6 each, I get an RTF of 0.045.
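For concreteness, the per-session workaround measured above looks roughly like this with the C API (model path is a placeholder, error handling omitted):

```c
/* N sessions, each limited to cores/N intra-op threads
   (N = 4 and 24 cores, i.e. 6 threads per session, as in the numbers above). */
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtEnv* env;
api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "app", &env);

enum { N = 4, CORES = 24 };
OrtSession* sessions[N];
for (int i = 0; i < N; ++i) {
  OrtSessionOptions* so;
  api->CreateSessionOptions(&so);
  api->SetIntraOpNumThreads(so, CORES / N);                 /* 6 threads each */
  api->CreateSession(env, "model.onnx", so, &sessions[i]);  /* model loaded N times */
  api->ReleaseSessionOptions(so);
}
/* Worker thread i then calls Run() on sessions[i]. */
```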