If I set a config dict like:
```python
config = {
    "contextLength": 131072,
    "offloadKVCacheToGpu": True,
    "llamaKCacheQuantizationType": "q8_0",
    "llamaVCacheQuantizationType": "q8_0",
    "flashAttention": True,
    "gpu": {
        "disabledGpus": [],
        "ratio": 1.0
    },
    "gpuStrictVramCap": True,
    "tryMmap": True
}
```
and pass the config like:
```python
import lmstudio as lms

with lms.Client() as client:
    client.llm.model("model_key", config=config)
```
LM Studio does not apply the KV cache quantization types. It does respect other settings in the same dict, such as flashAttention and contextLength, so I don't think I'm building the config dict incorrectly.
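One way to narrow this down is to diff the config you requested against what the server reports as actually applied after load. The diff helper below is plain Python; the commented-out usage assumes the SDK exposes the effective load config via a `get_load_config()`-style accessor (name assumed, may differ by SDK version):

```python
# Sketch: compare a requested load config against the applied one,
# to see exactly which keys the server ignored or rewrote.

def diff_config(requested: dict, applied: dict) -> dict:
    """Return {key: (requested_value, applied_value)} for every requested
    key that is missing from, or different in, the applied config."""
    return {
        key: (value, applied.get(key))
        for key, value in requested.items()
        if applied.get(key) != value
    }

# Hypothetical usage against a live LM Studio server
# (get_load_config() is an assumption, check your SDK version):
#
# import lmstudio as lms
# with lms.Client() as client:
#     model = client.llm.model("model_key", config=config)
#     print(diff_config(config, dict(model.get_load_config())))
```

If the diff shows only the two `llama*CacheQuantizationType` keys, that points at those specific fields being dropped server-side rather than a malformed dict.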