Description
When loading a new instance of a model with `lms`, the `gpu` parameter is still an issue:
```python
model = client.llm.load_new_instance(model_name, config={
    "temperature": temperature,
    "contextLength": min(tmp_max_context_length, max_tokens),
    "gpu": {"ratio": model_gpuol},
})
```
Model info:

```python
LlmInstanceInfo.from_dict({
    "architecture": "qwen3",
    "contextLength": 32768,
    "displayName": "Qwen3 8B",
    "format": "gguf",
    "identifier": "qwen3-8b",
    "instanceReference": "OdO6BqpmD+eu5898DBjJrOnA",
    "maxContextLength": 32768,
    "modelKey": "qwen3-8b",
    "paramsString": "8B",
    "path": "lmstudio-community/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf",
    "sizeBytes": 8709518624,
    "trainedForToolUse": True,
    "type": "llm",
    "vision": False
})
```
Log output:

```
Found max context length: 32768
gpuoffload calculated: 1.0
ERROR:main:Error loading model 'qwen3-8b': 'mainGpu'
```
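For context on how this kind of message arises: an error whose text is just `'mainGpu'` looks like the `repr` of a Python `KeyError`, which is typically raised when code indexes a dict with a missing key instead of using `.get()`. The sketch below is an assumption about the failure mechanism, not the SDK's actual code; `gpu_config` and the default of `0` are hypothetical names used only for illustration.

```python
# Assumed failure mechanism: the gpu config dict has "ratio" but no
# "mainGpu" key, and some code path indexes it directly.
gpu_config = {"ratio": 1.0}  # no "mainGpu" key, as in the report

try:
    main_gpu = gpu_config["mainGpu"]  # direct indexing raises KeyError
except KeyError as exc:
    # str(KeyError) is the quoted key, matching the log line above
    print(f"Error loading model 'qwen3-8b': {exc}")

# Defensive access avoids the exception (default value is hypothetical):
main_gpu = gpu_config.get("mainGpu", 0)
```

If this reading is right, supplying a `"mainGpu"` entry alongside `"ratio"` might work around the error, but the proper fix would be for the loader to tolerate its absence.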