Releases: ngxson/llama.cpp

b6322

30 Aug 02:38
ef47691
CANN: Fix compiler warnings (#15661)

Signed-off-by: noemotiovon <[email protected]>

b6318

29 Aug 13:52
8101786
CUDA: fix bug in rms_norm fusion (#15660)

* CUDA: fix bug in rms_norm fusion

* Fix bug for OP_REPEAT

* Fix index for add
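
For context, rms_norm normalizes a vector by its root mean square before applying a per-channel weight; the fusion being fixed merges this op with the elementwise multiply/add that commonly follow it in transformer blocks. A minimal NumPy reference of the unfused computation (names and the eps value are illustrative, not llama.cpp internals):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize by the root mean square over the last dimension,
    # then scale by the learned per-channel weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```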

b6317

29 Aug 13:15
60e5eee
chat : Seed OSS thinking + tool call support (#15552)

* Reasoning and tool-calling support for Seed OSS

* Fix grammar and partial parsing

* Whitespace

* New chat template

* Update common/chat.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update common/chat.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Remove unused 'purge_healing_marker' helper

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6315

29 Aug 01:07
e8d99dd
NVIDIA Nemotron Nano v2 (nemotronh) (#15507)

* feat: Add NEMOTRONH to python arch enum

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add NEMOTRONH to c++ arch enum

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add NEMOTRONH to llama-arch layer map

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: First pass at conversion for nemotronh

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add a verbose log for each tensor loaded

This is really helpful for diagnosing mismatches between the expected and
received tensors

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: First (broken) pass at nemotronh model architecture

It generates tokens, just not valid ones!

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Explicitly enable add_bos_token during conversion

The `tokenizer.json`/`tokenizer_config.json` in the model are a bit
contradictory: in the config, add_bos_token is set to False, but the
tokenizer model itself has a post_processor that adds the BOS token via
type: TemplateProcessing.
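
One way to surface the contradiction is to compare the config flag against what the tokenizer actually emits. A hedged sketch using Hugging Face `transformers` (the local path is hypothetical, and the key assumes a typical tokenizer_config.json layout):

```python
import json
from transformers import AutoTokenizer

model_dir = "path/to/nemotron-h"  # hypothetical local model directory

# The config claims BOS is not added...
with open(f"{model_dir}/tokenizer_config.json") as f:
    print(json.load(f).get("add_bos_token"))  # expected: False

# ...but the TemplateProcessing post_processor in tokenizer.json
# prepends BOS anyway, so the first encoded id is the BOS token.
tok = AutoTokenizer.from_pretrained(model_dir)
ids = tok.encode("hello")  # add_special_tokens=True by default
print(ids[0] == tok.bos_token_id)  # expected: True
```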

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use relu2 (LLM_FFN_RELU_SQR) for activation in FFN layers

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>
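
relu2 (LLM_FFN_RELU_SQR) is the squared ReLU, relu(x)^2; a short NumPy reference:

```python
import numpy as np

def relu_sqr(x: np.ndarray) -> np.ndarray:
    # relu2 / squared ReLU: clamp negatives to zero, then square.
    r = np.maximum(0.0, x)
    return r * r
```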

* fix: Only allocate attention cache for attention layers (rather than all non-recurrent layers)

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Move residual add to after every block

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use the correct norm tensor for the MLP blocks

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

* Nemotron-H: MLP gate cleanup (pass NULL for unused gate)

This model does not use a gate in MLP blocks; pass NULLs for gate tensors to make intent clear and avoid unused-pointer noise.
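
To illustrate what "no gate" means here (plain NumPy, not llama.cpp's graph-building API): a gated FFN multiplies an activated gate projection into the up projection, while an ungated MLP like Nemotron-H's runs only up -> activation -> down, so the gate argument is simply NULL.

```python
import numpy as np

def ffn(x, w_up, w_down, w_gate=None):
    # Squared-ReLU activation, matching the relu2 choice above.
    act = lambda h: np.maximum(0.0, h) ** 2
    up = x @ w_up
    if w_gate is not None:
        # Gated variant: act(gate) elementwise-scales the up projection.
        return (act(x @ w_gate) * up) @ w_down
    # Ungated MLP (Nemotron-H): activation applies to the up projection.
    return act(up) @ w_down
```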

* SSM: respect ssm_dt_rank for dt_dim when provided

Use GGUF-provided time_step_rank (ssm_dt_rank) to set dt_dim when > 0; fall back to max(64, n_embd/16).
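
The selection rule, as described, in a short Python sketch (the function name is hypothetical; `ssm_dt_rank` stands in for the GGUF time_step_rank metadata, with 0 meaning "not provided"):

```python
def pick_dt_dim(n_embd: int, ssm_dt_rank: int = 0) -> int:
    # Honor an explicit time_step_rank from the GGUF metadata when set,
    # otherwise fall back to the heuristic max(64, n_embd / 16).
    return ssm_dt_rank if ssm_dt_rank > 0 else max(64, n_embd // 16)

print(pick_dt_dim(4096))       # 256  (fallback path)
print(pick_dt_dim(4096, 128))  # 128  (explicit rank wins)
```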

* fix: plamo2 - revert dt_dim to default (remove ssm_dt_rank usage)

* Rename nemotronh to nemotron_h for consistency

- Update architecture name from NEMOTRONH to NEMOTRON_H in constants.py
- Change architecture string from 'nemotronh' to 'nemotron_h' in all files
- Update enum LLM_ARCH_NEMOTRONH to LLM_ARCH_NEMOTRON_H
- Update class name llm_build_nemotronh to llm_build_nemotron_h
- Consistent naming with underscore convention (nemotron_h vs nemotronh)

* feat: Support conversion for older NemotronH models

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Maicon Domingues <[email protected]>
Co-authored-by: weatherman <[email protected]>

b6314

28 Aug 20:54
a8bca68
fix: Compute the full sum in llama-eval-callback, not just the sum of…

b6313

28 Aug 18:56
c97dc09
CUDA: add conv2d (#15635)

* CUDA: add conv2d

* CUDA: conv2d - corrected formatting and added const

b6312

28 Aug 15:15
6c442f4
ggml-cpu: fix invalid hsum build in debug s390x (#15634)

Signed-off-by: Aaron Teo <[email protected]>

b6311

28 Aug 14:53
7380414
ggml : fix SSM_SCAN for n_groups > 1 (#15625)

b6310

28 Aug 14:25
c8d0d14
kv-cache : fix find_slot to not search for continuous slot (#15638)

ggml-ci

b6307

28 Aug 09:49
8a4280c
kv-cache : remove LLAMA_SET_ROWS checks (#15505)

ggml-ci