Cannot set KV cache quantization type when loading a model #107

@12bitmisfit

Description

If I set a config dict like:

config = {
    "contextLength": 131072,
    "offloadKVCacheToGpu": True,
    "llamaKCacheQuantizationType": "q8_0",  # not applied (the bug)
    "llamaVCacheQuantizationType": "q8_0",  # not applied (the bug)
    "flashAttention": True,
    "gpu": {
        "disabledGpus": [],
        "ratio": 1.0
    },
    "gpuStrictVramCap": True,
    "tryMmap": True
}

and pass the config like:

import lmstudio as lms

with lms.Client() as client:
    client.llm.model("model_key", config=config)

LM Studio does not apply the KV cache quantization types. It does honor other settings from the same dict, such as flashAttention and contextLength, so I don't think the config dict itself is malformed.
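
To confirm which settings actually take effect, the applied load configuration can be read back from the model handle. This is a minimal sketch assuming the handle returned by client.llm.model() exposes get_load_config() (present in recent versions of the Python SDK); "model_key" is a placeholder:

import lmstudio as lms

config = {
    "contextLength": 131072,
    "flashAttention": True,
    "llamaKCacheQuantizationType": "q8_0",
    "llamaVCacheQuantizationType": "q8_0",
}

with lms.Client() as client:
    model = client.llm.model("model_key", config=config)
    # Read back the configuration the server actually applied.
    applied = model.get_load_config()
    print(applied)

With the behavior described above, contextLength and flashAttention come back as requested while the two KV cache quantization fields do not. If your SDK version lacks get_load_config(), the applied settings are also visible in the LM Studio server logs.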
