Unload model from GPU #362
Open
Description
I'm trying to use different models with LMQL, but it seems that each new model is loaded onto the GPU. Is it possible to unload a model before loading a new one? I've searched through the code but haven't been able to figure out how to do it.
Here is the code I use to load a model:
```python
self._llm = lmql.model(
    f"local:llama.cpp:{model.get_model_absolute_path()}",
    tokenizer=model.tokenizer,
    n_gpu_layers=-1,
    n_ctx=4096,
)
```
I found issue #228, but it refers to loading a model via the CLI (`lmql serve-model`).
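As far as I can tell, LMQL does not expose an explicit unload call (this is my assumption, not a documented fact). A common workaround in Python is to drop every reference to the old model object and force a garbage-collection pass so the backend can free its GPU memory when the object is destroyed. A minimal sketch of that pattern, with a hypothetical `ModelHolder` wrapper (the `factory` callable and class name are illustrative, not LMQL API):

```python
import gc


class ModelHolder:
    """Hypothetical wrapper that releases the old model before loading a new one."""

    def __init__(self):
        self._llm = None

    def load(self, factory):
        # Release the previous model first so its GPU memory can be reclaimed
        # before the new model is allocated.
        self.unload()
        self._llm = factory()

    def unload(self):
        if self._llm is not None:
            self._llm = None  # drop the last reference to the model object
            gc.collect()      # force collection so the backend frees memory promptly
```

Usage would then look like `holder.load(lambda: lmql.model("local:llama.cpp:...", n_gpu_layers=-1, n_ctx=4096))`. Whether VRAM is actually released depends on the backend's destructor behavior, so this is only a sketch under that assumption.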