Releases · mingfeima/llama.cpp
b3542
b3441
llama : fix codeshell support (#8599)

* llama : fix codeshell support
* llama : move codeshell after smollm to respect the enum order
b3401
convert_hf : faster lazy safetensors (#8482)

* convert_hf : faster lazy safetensors

  This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

  The '_lazy' queue was sometimes self-referential, which caused reference cycles of objects old enough to avoid garbage collection until potential memory exhaustion.
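The cycle leak described above is generic CPython behavior, worth a quick sketch: an object that (indirectly) references itself is never freed by reference counting alone and has to wait for the cyclic garbage collector, which visits long-lived objects only rarely. The `Job` class and queue below are illustrative stand-ins, not the converter's actual `_lazy` machinery:

```python
import gc

class Job:
    """Holds a large payload plus a queue of follow-up jobs."""
    def __init__(self, payload: bytes):
        self.payload = payload
        self.queue = []

def make_cycle() -> None:
    job = Job(b"x" * (64 << 20))  # ~64 MiB payload
    job.queue.append(job)         # the queue refers back to its owner: a cycle
    # 'job' goes out of scope here, but its refcount never drops to zero,
    # so the payload survives until the cyclic collector happens to run.

gc.disable()                      # pretend the cyclic collector hasn't run yet
make_cycle()
print(gc.collect())               # collecting cycles is what finally frees it
```

Breaking the self-reference, as the commit message describes, lets plain reference counting reclaim the objects promptly instead of leaving them for the cycle collector.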
b2972
CUDA: fix FA out-of-bounds reads (#7479)
b2688
convert : fix autoawq gemma (#6704)

* fix autoawq quantized gemma model convert error

  Using autoawq to quantize a gemma model puts an lm_head.weight tensor in model-00001-of-00002.safetensors, which convert-hf-to-gguf.py cannot map. Skipping this tensor prevents the error.

* change code to full string match and print necessary message

  Use a full string match and print a short message to inform users that lm_head.weight has been skipped.

---------
Co-authored-by: Zheng.Deng <[email protected]>
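A minimal sketch of the skip logic described above, assuming an exact-name check (a full string match avoids accidentally dropping tensors whose names merely contain the substring). The `should_skip` helper and the `tensors` dict are hypothetical, not convert-hf-to-gguf.py's real structure:

```python
def should_skip(name: str) -> bool:
    # Full string match, not a substring test, so only the one
    # unmappable tensor is dropped.
    if name == "lm_head.weight":
        print(f"skipping tensor {name!r}: no GGUF mapping for this model")
        return True
    return False

# Hypothetical conversion loop over tensors loaded from a safetensors shard.
tensors = {"model.embed_tokens.weight": ..., "lm_head.weight": ...}
for name, data in tensors.items():
    if should_skip(name):
        continue
    # ... map the name and write the tensor to the GGUF file as usual
```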
b2586
[SYCL] Disable iqx on windows as WA (#6435)

* disable iqx on windows as a workaround
* use an array instead of global_memory
b2542
[SYCL] fix missing file in Windows release (#6314)