Releases: mofosyne/llama.cpp

b2998

25 May 13:11
9588f19

train : change default FA argument (#7528)

b2989

24 May 14:16
27891f6

docker.yml: disable light-intel and server-intel test (#7515)

* docker.yml: disable light-intel test

* docker.yml: disable server-intel test

b2988

24 May 13:44
fbca2f2

Add support for ArcticForCausalLM (#7020)

* common : increase max number of experts to 128

* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn

* gguf-py : add architecture-specific block mappings that override selected general block mappings

* convert-hf : add model conversion support for ArcticForCausalLM

* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)

* llama : add inference support for LLM_ARCH_ARCTIC

---------

Co-authored-by: Stanisław Szymczyk

b2987

24 May 10:30
0df0aa8

add build shared lib in win release package (#7438)

b2979

23 May 11:54
9b82476

Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-…

b2963

22 May 09:10
95fb0ae

CUDA: remove incorrect precision check (#7454)

b2941

20 May 04:18
33c8d50

Add provisions for windows support for BF16 code including CMake prov…

b2930

19 May 10:22
854d365

cmake : update android comments (#7341)

b2918

18 May 05:50
0583484

ggml : fix quants nans when all the group weights are very close to z…

b2903

17 May 00:24
24ecb58

Revert "server bench: fix bench not waiting for model load (#7284)" (…