I want to enable flash attention and KV cache quantization for one of my models. Where is the YAML format for model configuration described?

Replies: 1 comment
Note: Accepted values for flash_attention and cache_type_* are the same as the llama.cpp CLI flags; check the llama.cpp documentation for the exact strings supported by your build.
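For illustration, here is a minimal sketch of how those options might appear in a model YAML. Only flash_attention and cache_type_k / cache_type_v come from the note above; the model name, file path, surrounding fields, and the q8_0 value are assumptions, and the exact layout and accepted strings depend on your build, as noted.

```yaml
# Hypothetical model configuration sketch (layout assumed; only the
# flash_attention and cache_type_* keys are taken from the note above).
name: my-model                    # illustrative model name
parameters:
  model: my-model-Q4_K_M.gguf     # illustrative GGUF file in the models directory
context_size: 8192

# Enable flash attention and quantize the KV cache.
# Accepted values mirror the llama.cpp CLI flags
# (--flash-attn, --cache-type-k, --cache-type-v); q8_0 is one example.
flash_attention: true
cache_type_k: q8_0
cache_type_v: q8_0
```

Verify the accepted cache type strings (for example f16, q8_0, q4_0) against the llama.cpp documentation for your build before relying on them.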