-
We don't track memory usage at the session level for inferencing builds. Is this for Triton? I'll add this as a feature request.
-
Our application serves multiple models. When they cannot all fit in memory at once, we want to load as many as possible and then unload/load models on demand. For this, it would be very helpful to collect the GPU memory usage of each ORT session we create, so we can make a better decision about which sessions to destroy. Does ORT keep track of memory usage at the session level, and if so, which APIs expose it?
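Since (per the reply above) ORT does not report per-session memory, one workaround is to estimate each model's GPU footprint externally (for example, by sampling device free memory via NVML before and after session creation) and then manage sessions with a budget-based LRU cache. The sketch below is purely illustrative: `SessionCache`, `load_fn`, and `size_fn` are hypothetical names, not ORT APIs; `load_fn` would wrap something like `onnxruntime.InferenceSession(path, providers=["CUDAExecutionProvider"])`, and dropping the last reference to a session is what lets ORT release its GPU memory.

```python
from collections import OrderedDict

class SessionCache:
    """LRU cache of model sessions under a GPU-memory budget (bytes).

    Hypothetical sketch: `load_fn(name)` creates the session (e.g. an
    onnxruntime.InferenceSession) and `size_fn(name)` returns the model's
    estimated GPU footprint, measured externally (e.g. an NVML
    free-memory delta recorded around a previous load of the model).
    """

    def __init__(self, budget_bytes, load_fn, size_fn):
        self.budget = budget_bytes
        self.load_fn = load_fn
        self.size_fn = size_fn
        self.used = 0
        self.sessions = OrderedDict()  # name -> (session, size_bytes)

    def get(self, name):
        if name in self.sessions:
            self.sessions.move_to_end(name)  # mark as most recently used
            return self.sessions[name][0]
        size = self.size_fn(name)
        # Evict least-recently-used sessions until the new model fits.
        while self.sessions and self.used + size > self.budget:
            _, (old_session, old_size) = self.sessions.popitem(last=False)
            del old_session  # drop the reference so its memory can be freed
            self.used -= old_size
        session = self.load_fn(name)
        self.sessions[name] = (session, size)
        self.used += size
        return session
```

The footprint estimates only need to be roughly right for eviction ordering to work; a safety margin on the budget guards against fragmentation and estimation error.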