Releases · ReinForce-II/llama.cpp
b2972
b2961
llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf
* add phi3 128k support in cuda
* address build warnings on llama.cpp
* adjust index value in cuda long rope freq factors
* add long rope support in ggml cpu backend
* make freq factors only depend on ctx size
* remove unused rope scaling type 'su' from gguf converter
* fix lint warnings on convert-hf-to-gguf.py
* set to the short freq factor when context size is smaller than trained context size
* add one line of comments
* metal : support rope freq_factors
* ggml : update ggml_rope_ext API to support freq. factors
* backends : add dev messages to support rope freq. factors
* minor : style
* tests : update to use new rope API
* backends : fix pragma semicolons
* minor : cleanup
* llama : move rope factors from KV header to tensors
* llama : remove tmp assert
* cuda : fix compile warning
* convert : read/write n_head_kv
* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>
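The long-rope bullets above reduce to one selection rule: the GGUF ships two sets of RoPE frequency factors as tensors (rather than in the KV header), and the loader picks between them based only on the requested context size versus the model's trained context size. Below is a minimal C++ sketch of that rule; the struct and function names are hypothetical illustrations, not the actual llama.cpp API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for the two factor sets a phi3-128k GGUF carries.
// Per the changelog, these live in dedicated tensors, not the KV header.
struct rope_freq_factors {
    std::vector<float> short_factors; // tuned for the trained context window
    std::vector<float> long_factors;  // tuned for the extended (128k) window
};

// Selection rule from the changelog: the choice depends only on the context
// size requested at load time, compared against the trained context size.
const std::vector<float> & select_rope_factors(
        const rope_freq_factors & f,
        uint32_t n_ctx,          // context size the user asked for
        uint32_t n_ctx_trained)  // context size the model was trained with
{
    // Use the short factors whenever the requested context fits within the
    // trained window; only the extended window needs the long factors.
    return n_ctx <= n_ctx_trained ? f.short_factors : f.long_factors;
}
```

The selected factors are then fed into the rope op as an extra tensor argument, which is why `ggml_rope_ext` gained freq-factor support across the CPU, CUDA, and Metal backends in this release.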
b2953
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)

* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <[email protected]>

* server : fix test regexes
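The added-tokens fix is a three-file fallback chain over the tokenizer metadata a model directory may provide. Here is a hedged C++ sketch of that chain using nlohmann::json (which llama.cpp vendors); the function name and the exact JSON keys (`added_tokens_decoder`, `added_tokens`, `content`) are assumptions based on the usual Hugging Face file layouts, and the real fix lives in the Python converter.

```cpp
#include <fstream>
#include <map>
#include <string>

#include <nlohmann/json.hpp>
using json = nlohmann::json;

// Hypothetical sketch: collect added-token ids from whichever metadata file
// the model directory provides, in the order the changelog lists them.
static std::map<std::string, int> read_added_tokens(const std::string & dir) {
    std::map<std::string, int> added;

    // 1. added_tokens.json maps token text directly to its id.
    if (std::ifstream f{dir + "/added_tokens.json"}) {
        const json j = json::parse(f);
        for (const auto & [tok, id] : j.items()) {
            added[tok] = id.get<int>();
        }
        return added;
    }

    // 2. tokenizer_config.json keys the same data by id instead.
    if (std::ifstream f{dir + "/tokenizer_config.json"}) {
        const json j = json::parse(f);
        if (j.contains("added_tokens_decoder")) {
            for (const auto & [id, tok] : j["added_tokens_decoder"].items()) {
                added[tok["content"].get<std::string>()] = std::stoi(id);
            }
            return added;
        }
    }

    // 3. tokenizer.json carries an "added_tokens" array as a last resort.
    if (std::ifstream f{dir + "/tokenizer.json"}) {
        const json j = json::parse(f);
        for (const auto & t : j.value("added_tokens", json::array())) {
            added[t["content"].get<std::string>()] = t["id"].get<int>();
        }
    }
    return added;
}
```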
b2950
rpc : track allocated buffers (#7411)

* rpc : track allocated buffers

  ref: #7407

* rpc : pack rpc_tensor tightly
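The second bullet concerns the wire format: `rpc_tensor` is sent between RPC client and server as raw bytes, so any compiler-inserted padding would let the two sides disagree on the layout. A sketch of what tight packing looks like, with an assumed, simplified field list rather than the exact rpc_tensor definition:

```cpp
#include <cstdint>

// Force 1-byte alignment so the struct has no compiler-inserted padding and
// can be serialized byte-for-byte. Field list is illustrative only.
#pragma pack(push, 1)
struct rpc_tensor {
    uint64_t id;       // remote handle of the tensor
    uint32_t type;     // ggml type enum
    uint64_t ne[4];    // number of elements per dimension
    uint64_t nb[4];    // stride in bytes per dimension
    uint64_t data;     // offset into the remote buffer
    char     name[64]; // tensor name, NUL-terminated
};
#pragma pack(pop)

static_assert(sizeof(rpc_tensor) == 8 + 4 + 32 + 32 + 8 + 64,
              "rpc_tensor must stay padding-free for the RPC wire format");
```

The `static_assert` is the cheap guard: if a field change or a different compiler reintroduces padding, the build fails instead of the protocol silently breaking.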
b2941
Add provisions for Windows support for BF16 code including CMake prov…
b2876
llama : disable pipeline parallelism with nkvo (#7265)
b2837
llava : fix moondream support (#7163)

* Revert "Revert "llava : add support for moondream vision language model (#6899)""

  This reverts commit 9da243b36ac0b9d609adfaaa4c8f1cc8c592f737.

* Fix num_positions and embeddings initialization