How to use self-hosted models by API in a way that lets me swap models? #8431
Unanswered
jerkstorecaller
asked this question in
Q&A
Replies: 0 comments
Could use some advice here.
Here's my setup:
What can I do to achieve this:
a. If Model2 is already loaded on the GGUF Server, it returns a response quickly.
b. If a different model is loaded, the GGUF Server unloads it and loads Model2 before answering LibreChat (since only one model fits in RAM). The next request to this model then goes through step a, so it's fast.
So I need a tool other than LibreChat that can hot-swap models based on which model LibreChat selects.
Is this doable with existing tools?
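To make the a/b behaviour concrete, here is a minimal Python sketch of the swap logic such a tool would need. It is only an illustration under assumptions: `ModelSwapper`, `ensure_loaded`, and the `load`/`unload` callbacks are hypothetical names, not part of LibreChat or any GGUF server, and the callbacks stand in for whatever actually starts and stops the server process.

```python
import threading
from typing import Callable, Optional


class ModelSwapper:
    """Hypothetical helper: keeps at most one model loaded, swaps on demand."""

    def __init__(self, load: Callable[[str], None], unload: Callable[[str], None]):
        # load/unload are placeholders for the real server start/stop logic.
        self._load = load
        self._unload = unload
        self._current: Optional[str] = None
        # Serialize swaps so two requests cannot load models at the same time.
        self._lock = threading.Lock()

    def ensure_loaded(self, model: str) -> bool:
        """Return False if the model was already loaded (case a),
        True if a swap was needed (case b)."""
        with self._lock:
            if self._current == model:
                return False  # case a: fast path, nothing to do
            if self._current is not None:
                self._unload(self._current)
                self._current = None  # stay consistent if the load below fails
            self._load(model)
            self._current = model
            return True
```

A proxy sitting between LibreChat and the GGUF Server would call `ensure_loaded()` with the model name from each incoming request, then forward the request once it returns.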