Releases · mofosyne/llama.cpp
b4622
server : (webui) Fix Shift+Enter handling (#11609)
- Fix Shift+Enter handling: `exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway
- build index.html.gz
Co-authored-by: Xuan Son Nguyen <[email redacted]>
b3563
Add support for encoder-only T5 models (#8900)
- gguf-py : add T5ENCODER model architecture (see the sketch after this entry)
- common : call llama_decode() during warmup only if the model has a decoder
- convert-hf : add T5EncoderModel
- llama : add llama_model_has_decoder() API function
- llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
- llama : add support for LLM_ARCH_T5ENCODER
- llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
- llama-embedding : add support for encoder-only models
Co-authored-by: Stanisław Szymczyk <[email redacted]>
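The new architecture is recorded in the GGUF metadata, so downstream tooling can spot encoder-only models before loading them. Below is a minimal sketch, assuming the gguf-py `GGUFReader` API and an illustrative file path; at runtime, the new `llama_model_has_decoder()` answers the same question on the C API side:

```python
# Sketch: check whether a GGUF file uses the encoder-only T5 architecture.
# Assumes the gguf-py GGUFReader API; "t5-encoder.gguf" is an illustrative path.
from gguf import GGUFReader

reader = GGUFReader("t5-encoder.gguf")
field = reader.get_field("general.architecture")

# String KV values are stored as raw bytes in the field's parts.
arch = bytes(field.parts[field.data[-1]]).decode("utf-8")

if arch == "t5encoder":
    # Encoder-only model: run llama_encode() only;
    # llama_model_has_decoder() reports false for these.
    print("encoder-only T5 model")
else:
    print(f"architecture: {arch}")
```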
b3524
cann: fix buffer_num error and slow runtime speed (#8865)
b3521
py: Add more authorship metadata from model card (#8810)
b3499
cann: support q8_0 for Ascend backend (#8805)
b3491
flake.lock: Update (#8729)
b3445
Vulkan IQ4_NL Support (#8613)
- Fix Vulkan matmul tests compile errors
- Add Vulkan IQ4_NL support
- Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
b3431
examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)
Changes:
- Move each example into its own function. This makes the code much
easier to read and understand.
- Make it easy to run only one test by commenting out function calls
  in main().
- Make the output easy to parse by indenting the output for each example.
- Add shebang and +x bit to make it clear it's an executable.
- Make the host configurable via --host with a default 127.0.0.1:8080.
- Make the code look up the registered tool in the tools list instead of
  hardcoding the returned values. This makes the code more copy-pastable
  (see the sketch after this entry).
- Add error checking, so that the program exits with 1 if the LLM didn't
  return the expected values. This is very useful for checking correctness.
Testing:
- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M and
Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
- I did not observe a failure even once in Mistral-7B-Instruct-v0.3.
- Llama-3 failed about a third of the time in example_concurrent: it
  only returned one call instead of three, even for F16.
Potential follow-ups:
- Do not fix the prompt encoding yet; surprisingly, it mostly works even
  when the prompt encoding is not model-optimized.
- Add chained answer and response.
Test-only change.
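The rewrite's two main mechanics, the `--host` flag and registry-based tool dispatch, can be illustrated with a short sketch. The registry and helper names below are hypothetical, not the identifiers used in the actual script:

```python
#!/usr/bin/env python3
# Hypothetical sketch of the --host flag and registry-based tool dispatch;
# get_current_weather and TOOLS are illustrative names, not from the script.
import argparse


def get_current_weather(location: str) -> str:
    return f"sunny in {location}"  # stand-in tool implementation


# Look the tool up in a registry instead of hardcoding returned values,
# so the dispatch code stays copy-pastable.
TOOLS = {"get_current_weather": get_current_weather}


def call_tool(name: str, arguments: dict) -> str:
    if name not in TOOLS:
        raise SystemExit(1)  # exit 1 when the LLM returned an unexpected call
    return TOOLS[name](**arguments)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="127.0.0.1:8080",
                        help="llama-server address as host:port")
    args = parser.parse_args()
    print(f"targeting http://{args.host}")
    print(call_tool("get_current_weather", {"location": "Paris"}))


if __name__ == "__main__":
    main()
```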
b3424
gguf_dump.py: fix markdown kv array print (#8588)
- Update gguf-py/scripts/gguf_dump.py
- gguf_dump.py: refactor kv array string handling
- gguf_dump.py: escape backticks inside of strings
- gguf_dump.py: add inline code markdown escape handler (see the sketch below)
  >>> escape_markdown_inline_code("hello world")
  '`hello world`'
  >>> escape_markdown_inline_code("hello ` world")
  '``hello ` world``'
- gguf_dump.py: handle the edge case of backticks at the start or end of a string
Co-authored-by: compilade <[email redacted]>
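The doctests above pin down the escape handler's behavior. Here is a minimal reimplementation consistent with them, offered as a sketch; the shipped function lives in gguf-py/scripts/gguf_dump.py and may differ in detail:

```python
def escape_markdown_inline_code(value: str) -> str:
    # Use a double-backtick delimiter when the value itself contains a
    # backtick (a fuller version would widen the delimiter past the
    # longest backtick run in the value).
    delimiter = "``" if "`" in value else "`"
    # Edge case: a leading or trailing backtick would merge with the
    # delimiter, so pad the content with spaces.
    if value.startswith("`") or value.endswith("`"):
        value = f" {value} "
    return f"{delimiter}{value}{delimiter}"


# Matches the doctests above:
assert escape_markdown_inline_code("hello world") == "`hello world`"
assert escape_markdown_inline_code("hello ` world") == "``hello ` world``"
```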
b3418
fix: typo in chatglm4 chat template (#8586) Signed-off-by: thxCode <[email redacted]>