Releases: mofosyne/llama.cpp

b4622

03 Feb 10:30
d92cb67

server : (webui) Fix Shift+Enter handling (#11609)

* Fix Shift+Enter handling

Adding `exact` to the Enter handler ensures the message is not sent when Shift+Enter is pressed.

* build index.html.gz

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b3563

10 Aug 11:54
7c3f55c

Add support for encoder-only T5 models (#8900)

* gguf-py : add T5ENCODER model architecture

* common : call llama_decode() during warmup only if the model has decoder

* convert-hf : add T5EncoderModel

* llama : add llama_model_has_decoder() API function

* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()

* llama : add support for LLM_ARCH_T5ENCODER

* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE

* llama-embedding : add support for encoder-only models

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
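
The headline API addition here is llama_model_has_decoder(), which lets the warmup path skip llama_decode() for encoder-only checkpoints. The architecture is also recorded in GGUF metadata, so encoder-only T5 files can be identified before loading. A minimal sketch using gguf-py, assuming the architecture string is `t5encoder`; the `is_t5_encoder_only` helper and the string-decoding details are illustrative, not code from this PR:

```python
# Sketch: detect an encoder-only T5 checkpoint from its GGUF metadata.
# Assumes gguf-py's GGUFReader and the "t5encoder" architecture string.
from gguf import GGUFReader

def is_t5_encoder_only(gguf_path: str) -> bool:
    reader = GGUFReader(gguf_path)
    field = reader.fields["general.architecture"]
    # For string fields, the text bytes live in the part indexed by data[0].
    arch = bytes(field.parts[field.data[0]]).decode("utf-8")
    return arch == "t5encoder"

if __name__ == "__main__":
    import sys
    print(is_t5_encoder_only(sys.argv[1]))
```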

b3524

05 Aug 15:26
bc0f887

cann: fix buffer_num error and slow runtime speed (#8865)

b3521

05 Aug 12:16
1ef14b3

py: Add more authorship metadata from model card (#8810)

* py: add more authorship metadata from model card

* fixup! py: add more authorship metadata from model card
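
For context, the authorship fields this pulls in typically live in the YAML front matter of a Hugging Face model card. A rough sketch of the extraction idea, assuming a README.md-style card; the `parse_model_card` helper and the chosen field names are illustrative, not the exact code from #8810:

```python
# Sketch: read authorship metadata from a model card's YAML front matter.
# Requires PyYAML; helper name and field list are illustrative.
import yaml

def parse_model_card(card_text: str) -> dict:
    if not card_text.startswith("---"):
        return {}
    # Front matter sits between the first two "---" delimiters.
    _, front_matter, _ = card_text.split("---", 2)
    return yaml.safe_load(front_matter) or {}

with open("README.md", encoding="utf-8") as f:
    meta = parse_model_card(f.read())
for key in ("license", "language", "datasets", "base_model"):
    if key in meta:
        print(f"{key}: {meta[key]}")
```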

b3499

01 Aug 12:52
c8a0090

cann: support q8_0 for Ascend backend (#8805)

b3491

30 Jul 15:00
140074b

flake.lock: Update (#8729)

b3445

23 Jul 10:37
751fcfc

Vulkan IQ4_NL Support (#8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
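
For context on what IQ4_NL support entails: IQ4_NL packs each weight as a 4-bit index into a fixed 16-entry non-linear codebook, with one scale per 32-value block. A rough Python sketch of the dequantization scheme; the codebook values come from ggml, but the block handling here is simplified for illustration:

```python
import numpy as np

# Fixed non-linear 4-bit codebook used by IQ4_NL (values from ggml).
KVALUES_IQ4NL = np.array(
    [-127, -104, -83, -65, -49, -35, -22, -10,
     1, 13, 25, 38, 53, 69, 89, 113], dtype=np.int8)

def dequantize_iq4_nl_block(d: float, qs: np.ndarray) -> np.ndarray:
    """Dequantize one 32-value block: d is the block scale, qs holds
    16 bytes of packed 4-bit codebook indices (low nibbles first)."""
    lo = KVALUES_IQ4NL[qs & 0x0F]  # values 0..15 of the block
    hi = KVALUES_IQ4NL[qs >> 4]    # values 16..31 of the block
    return d * np.concatenate([lo, hi]).astype(np.float32)
```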

b3431

21 Jul 03:33
22f281a

examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)

Changes:

- Move each example into its own function. This makes the code much
  easier to read and understand.
- Make it easy to run only one test by commenting out function calls in
  main().
- Make the output easy to parse by indenting the output for each example.
- Add shebang and +x bit to make it clear it's an executable.
- Make the host configurable via --host with a default 127.0.0.1:8080.
- Make the code look up the registered tool in the tools list instead of
  hardcoding the returned values. This makes the code more copy-pastable
  (see the sketch after this list).
- Add error checking, so that the program exits with status 1 if the LLM
  didn't return the expected values. This is very useful for checking
  correctness.
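
A minimal sketch of those last two points, assuming the server's OpenAI-style chat endpoint; the `TOOLS` table, `call_tool`, and the hardcoded tool call stand in for the real request/response handling:

```python
#!/usr/bin/env python3
# Sketch: dispatch through a registered-tools list and exit non-zero on
# unexpected results. Names and response shape are illustrative.
import argparse
import sys

def get_current_weather(city: str) -> str:
    return f"sunny in {city}"

TOOLS = {"get_current_weather": get_current_weather}

def call_tool(name: str, arguments: dict) -> str:
    if name not in TOOLS:
        print(f"error: LLM requested unknown tool {name!r}", file=sys.stderr)
        sys.exit(1)
    return TOOLS[name](**arguments)

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="127.0.0.1:8080")
    args = parser.parse_args()
    # ... send the grammar-constrained request to http://{args.host} ...
    # and suppose the model answered with this tool call:
    result = call_tool("get_current_weather", {"city": "Paris"})
    if "sunny" not in result:
        return 1  # exit 1 when the LLM didn't return the expected values
    return 0

if __name__ == "__main__":
    sys.exit(main())
```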

Testing:

- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M and
  Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
  - I did not observe a failure even once in Mistral-7B-Instruct-v0.3.
  - Llama-3 failed about a third of the time in example_concurrent: it
    returned only one call instead of 3, even in F16.

Potential follow ups:

- Fix the prompt encoding (deliberately left as-is here; surprisingly, it
  mostly works even when the encoding is not optimized for the model).
- Add chained answer and response.

Test-only change.

b3424

20 Jul 08:16
c3776ca

gguf_dump.py: fix markdown kv array print (#8588)

* gguf_dump.py: fix markdown kv array print

* Update gguf-py/scripts/gguf_dump.py

Co-authored-by: compilade <[email protected]>

* gguf_dump.py: refactor kv array string handling

* gguf_dump.py: escape backticks inside of strings

* gguf_dump.py: inline code markdown escape handler added

>>> escape_markdown_inline_code("hello world")
'`hello world`'
>>> escape_markdown_inline_code("hello ` world")
'``hello ` world``'

* gguf_dump.py: handle edge case about backticks on start or end of a string

---------

Co-authored-by: compilade <[email protected]>
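
A reconstruction of the escape helper that is consistent with the doctest above (the exact implementation in gguf_dump.py may differ): wrap the string in one more backtick than its longest internal backtick run, and pad with spaces when the content starts or ends with a backtick:

```python
import re

def escape_markdown_inline_code(value_string: str) -> str:
    """Escape a string as Markdown inline code.

    >>> escape_markdown_inline_code("hello world")
    '`hello world`'
    >>> escape_markdown_inline_code("hello ` world")
    '``hello ` world``'
    """
    # The fence must be longer than any backtick run inside the string.
    longest_run = max((len(m) for m in re.findall(r"`+", value_string)), default=0)
    fence = "`" * (longest_run + 1)
    # Pad with spaces if the content starts or ends with a backtick,
    # so the inner backtick is not absorbed into the fence.
    if value_string.startswith("`") or value_string.endswith("`"):
        return f"{fence} {value_string} {fence}"
    return f"{fence}{value_string}{fence}"
```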

b3418

19 Jul 10:46
f299aa9

fix: typo in chatglm4 chat tmpl (#8586)

Signed-off-by: thxCode <[email protected]>