flash_attn
1 parent 2117122 commit 2138561
llama_cpp/server/model.py
```diff
@@ -242,6 +242,7 @@ def load_llama_from_model_settings(settings: ModelSettings) -> llama_cpp.Llama:
         logits_all=settings.logits_all,
         embedding=settings.embedding,
         offload_kqv=settings.offload_kqv,
+        flash_attn=settings.flash_attn,
         # Sampling Params
         last_n_tokens_size=settings.last_n_tokens_size,
         # LoRA Params
```
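The change above forwards a `flash_attn` server setting straight through to the `llama_cpp.Llama` constructor, alongside the other model parameters. A minimal sketch of that pass-through pattern, using a simplified stand-in `ModelSettings` dataclass and a hypothetical `build_llama_kwargs` helper (the real server code constructs `llama_cpp.Llama` directly):

```python
from dataclasses import dataclass


@dataclass
class ModelSettings:
    """Simplified stand-in for the server's ModelSettings."""
    model: str
    offload_kqv: bool = True
    flash_attn: bool = False  # field this commit starts forwarding


def build_llama_kwargs(settings: ModelSettings) -> dict:
    # Mirrors the pattern in load_llama_from_model_settings: each
    # setting becomes a keyword argument for llama_cpp.Llama(...).
    return {
        "model_path": settings.model,
        "offload_kqv": settings.offload_kqv,
        "flash_attn": settings.flash_attn,
    }


kwargs = build_llama_kwargs(ModelSettings(model="model.gguf", flash_attn=True))
print(kwargs["flash_attn"])  # → True
```

Because the setting is plumbed through as a plain keyword argument, enabling flash attention from the server is just a matter of setting `flash_attn` in the model settings, with no other code paths involved.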