I want to enable flash attention and KV cache quantization for one of my models. Where is the YAML format for model configuration described?

Replies: 1 comment
Note: Accepted values for flash_attention and cache_type_* are the same as the llama.cpp CLI flags; check the llama.cpp documentation for the exact strings supported by your build.
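For illustration, here is a minimal sketch of how those options might appear in a model YAML. Only flash_attention and cache_type_k / cache_type_v come from the note above; the model name, file path, surrounding fields, and the q8_0 value are assumptions, and the exact layout and accepted strings depend on your build, as noted.

```yaml
# Hypothetical model configuration sketch (layout assumed; only the
# flash_attention and cache_type_* keys are taken from the note above).
name: my-model                    # illustrative model name
parameters:
  model: my-model-Q4_K_M.gguf     # illustrative GGUF file in the models directory
context_size: 8192

# Enable flash attention and quantize the KV cache.
# Accepted values mirror the llama.cpp CLI flags
# (--flash-attn, --cache-type-k, --cache-type-v); q8_0 is one example.
flash_attention: true
cache_type_k: q8_0
cache_type_v: q8_0
```

Verify the accepted cache type strings (for example f16, q8_0, q4_0) against the llama.cpp documentation for your build before relying on them.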