Releases · mofosyne/llama.cpp
b2998
train : change default FA argument (#7528)
b2989
docker.yml: disable light-intel and server-intel test (#7515)
* docker.yml: disable light-intel test
* docker.yml: disable server-intel test
b2988
Add support for ArcticForCausalLM (#7020)
* common : increase max number of experts to 128
* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn
* gguf-py : add architecture-specific block mappings that override selected general block mappings
* convert-hf : add model conversion support for ArcticForCausalLM
* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)
* llama : add inference support for LLM_ARCH_ARCTIC

Co-authored-by: Stanisław Szymczyk <[email protected]>
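The gguf-py and convert-hf items above describe two conversion-side mechanisms: architecture-specific block mappings that take precedence over the general tensor-name mappings, and token redefinition driven by added_tokens_decoder in tokenizer_config.json. Below is a minimal Python sketch of both ideas under stated assumptions; the mapping tables, HF-side tensor names, and helper functions are illustrative, not the actual gguf-py API.

```python
import json
from pathlib import Path

# General HF -> GGUF block mappings shared by most architectures
# (hypothetical table; the real one lives in gguf-py).
GENERAL_BLOCK_MAPPINGS = {
    "model.layers.{bid}.input_layernorm": "blk.{bid}.attn_norm",
    "model.layers.{bid}.post_attention_layernorm": "blk.{bid}.ffn_norm",
}

# Architecture-specific entries override selected general ones,
# e.g. routing a norm to the new ffn_norm_exps tensor for Arctic.
ARCTIC_BLOCK_MAPPINGS = {
    "model.layers.{bid}.post_attention_layernorm": "blk.{bid}.ffn_norm_exps",
}

def block_mappings(arch: str) -> dict[str, str]:
    """Merge general mappings with per-architecture overrides."""
    merged = dict(GENERAL_BLOCK_MAPPINGS)
    if arch == "ArcticForCausalLM":
        merged.update(ARCTIC_BLOCK_MAPPINGS)  # overrides win on key clash
    return merged

def added_tokens(model_dir: Path) -> dict[int, str]:
    """Read added_tokens_decoder from tokenizer_config.json so those
    tokens can be redefined over the base SentencePiece vocabulary."""
    cfg = json.loads((model_dir / "tokenizer_config.json").read_text())
    return {int(i): tok["content"]
            for i, tok in cfg.get("added_tokens_decoder", {}).items()}
```

A dict merge keeps the override rule trivially auditable: any key present in the architecture table shadows the general entry, and everything else falls through unchanged.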
b2987
add build shared lib in win release package (#7438)
b2979
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-…
b2963
CUDA: remove incorrect precision check (#7454)
b2941
Add provisions for windows support for BF16 code including CMake prov…
b2930
cmake : update android comments (#7341)
b2918
ggml : fix quants nans when all the group weights are very close to z…
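This entry points at a classic quantization failure mode: when every weight in a group is nearly zero, the scale derived from the group's absolute maximum collapses toward zero and the subsequent division yields inf/NaN. The following numpy sketch shows the shape of such a guard; it is an assumption-laden illustration, not the actual ggml C code, and the threshold value is arbitrary.

```python
import numpy as np

def quantize_group(weights: np.ndarray, qmax: int = 127):
    """Quantize one group of weights to signed 8-bit integers.

    Guards against a (near-)zero group maximum, which would otherwise
    make the division below produce inf/NaN (hypothetical sketch of
    the failure mode this release fixes).
    """
    amax = float(np.max(np.abs(weights)))
    if amax < 1e-30:  # all weights ~0: emit zeros with a zero scale
        return np.zeros(weights.shape, dtype=np.int8), 0.0
    scale = amax / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale
```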
b2903
Revert "server bench: fix bench not waiting for model load (#7284)" (…