Unload model from GPU #362
Open
Description
I'm trying to use different models with LMQL, but it seems that each new model is loaded onto the GPU. Is it possible to unload a model before loading a new one? I've searched through the code but haven't been able to figure out how to do it.
Here is the code I use to load a model:
```python
self._llm = lmql.model(
    f"local:llama.cpp:{model.get_model_absolute_path()}",
    tokenizer=model.tokenizer,
    n_gpu_layers=-1,
    n_ctx=4096,
)
```
I found issue #228, but it refers to loading a model via the CLI (`lmql serve-model`).
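As far as I can tell, LMQL does not expose an explicit unload call (this is my assumption, not a documented fact). A common workaround in Python is to drop every reference to the old model object and force a garbage-collection pass so the backend can free its GPU memory when the object is destroyed. A minimal sketch of that pattern, with a hypothetical `ModelHolder` wrapper (the `factory` callable and class name are illustrative, not LMQL API):

```python
import gc


class ModelHolder:
    """Hypothetical wrapper that releases the old model before loading a new one."""

    def __init__(self):
        self._llm = None

    def load(self, factory):
        # Release the previous model first so its GPU memory can be reclaimed
        # before the new model is allocated.
        self.unload()
        self._llm = factory()

    def unload(self):
        if self._llm is not None:
            self._llm = None  # drop the last reference to the model object
            gc.collect()      # force collection so the backend frees memory promptly
```

Usage would then look like `holder.load(lambda: lmql.model("local:llama.cpp:...", n_gpu_layers=-1, n_ctx=4096))`. Whether VRAM is actually released depends on the backend's destructor behavior, so this is only a sketch under that assumption.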