Releases: gabriellarson/llama.cpp
b6475
model : add grok-2 support (#15539)
* add grok-2 support
* type fix
* type fix
* type fix
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2)
* add yarn metadata, move defaults to hparams
b6090
context : fix index overflow on huge outputs (#15080)
* context : fix overflow when re-ordering huge outputs
* context : fix logits size overflow for huge batches
b6082
vulkan: fix build when using glslang that does not support coopmat2 (…
b6077
model : add text-only support for Kimi-VL (and find special tokens in…
b6075
opencl: fix adreno compiler detection logic (#15029)
b6001
vulkan: skip empty set_rows to avoid invalid API usage (#14860)
b5998
Docs: add instructions for adding backends (#14889)
b5994
ggml-cpu : disable GGML_NNPA by default due to instability (#14880)
* docs: update s390x document for sentencepiece
  Signed-off-by: Aaron Teo <[email protected]>
  (cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)
* docs: update huggingface links + reword
  Signed-off-by: Aaron Teo <[email protected]>
  (cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)
* ggml-cpu: disable ggml-nnpa compile flag by default
  fixes #14877
  Signed-off-by: Aaron Teo <[email protected]>
  (cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)
* docs: update s390x build docs to reflect nnpa disable
  Signed-off-by: Aaron Teo <[email protected]>
  (cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)
---------
Signed-off-by: Aaron Teo <[email protected]>
b5902
model : add Kimi-K2 support (#14654)
* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* except Exception
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <[email protected]>
b5884
tests : cover lfm2 cases in test_ssm_conv (#14651)