Releases: EAddario/llama.cpp
b6519
chat : fix build on arm64 (#16101)
b6475
model : add grok-2 support (#15539)
* add grok-2 support
* type fix
* type fix
* type fix
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2)
* add yarn metadata, move defaults to hparams
b6445
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…
b6399
server : implement prompt processing progress report in stream mode (…
b6323
vulkan: Skip syncing for prealloc_y when it is reused (#15544)
b6294
tests : fix test-opt with GGML_BACKEND_DL (#15599)
b6275
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …
b6264
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices…
b6239
llama : remove deprecated llama_kv_self API (#15472) ggml-ci
b6209
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…