Conversation

@jan-service-account

Updates dev branch with latest release (b6445) from ggml-org/llama.cpp

danbev and others added 13 commits September 10, 2025 05:23
ci : cache ROCm installation in windows-latest-cmake-hip (ggml-org#15887)

This commit adds caching of the ROCm installation for the windows-latest-cmake-hip job.

The motivation for this is that the installation can sometimes hang and/or not complete properly, leaving an invalid installation which later fails the build. By caching the installation, we can hopefully keep a known-good installation available in the cache and avoid the installation step.

Refs: ggml-org#15365
… (ggml-org#15893)

This commit adds checks for two function pointers returned from
ggml_backend_reg_get_proc_address.

The motivation for this is that the function pointers could be nullptr if
the get-proc-address function changes in the future. This also keeps these
call sites consistent with all the other calls to
ggml_backend_reg_get_proc_address in the code base.
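
For illustration, the guarded pattern looks roughly like this (a minimal sketch, not the commit's diff; `ggml_backend_set_n_threads` is just one example of a proc address that ggml exposes, not necessarily one of the two pointers the commit checks):

```cpp
#include "ggml-backend.h"

// Sketch: never call through a proc address without checking it first.
static void set_n_threads_if_supported(ggml_backend_reg_t reg, ggml_backend_t backend, int n_threads) {
    auto fn = (ggml_backend_set_n_threads_t)
        ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");
    if (fn != nullptr) { // may be nullptr if the proc-address table changes
        fn(backend, n_threads);
    }
}
```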
* CANN: implement LRU cache for ACL graphs in CANN backend

- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in the CANN backend (see the sketch below).
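
A minimal, self-contained sketch of the idea (illustrative only, not the actual CANN code; the env-variable name and default capacity are assumptions):

```cpp
#include <cstdlib>
#include <list>

struct ggml_cann_graph { /* captured ACL graph state (opaque here) */ };

// LRU cache sketch: front = most recently used, back = eviction candidate.
struct graph_lru_cache {
    size_t capacity;
    std::list<ggml_cann_graph *> graphs;

    graph_lru_cache() {
        // capacity configurable via an env variable (name is illustrative)
        const char * env = std::getenv("GGML_CANN_GRAPH_CACHE_CAPACITY");
        capacity = env ? (size_t) std::atoi(env) : 12;
    }

    // insert a new graph as most recently used, evicting the LRU one if full
    void push(ggml_cann_graph * graph) {
        if (graphs.size() >= capacity) {
            delete graphs.back();
            graphs.pop_back();
        }
        graphs.push_front(graph);
    }

    // mark an existing graph as most recently used (O(1), no copies)
    void move_to_front(std::list<ggml_cann_graph *>::iterator it) {
        graphs.splice(graphs.begin(), graphs, it);
    }

    void clear() {
        for (auto * g : graphs) delete g;
        graphs.clear();
    }

    ~graph_lru_cache() { clear(); }
};
```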

* fix typo

* The LRU cache capacity can be configured via an env variable

Signed-off-by: noemotiovon <[email protected]>

* refactor acl graph

* refactor && fix review comments

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
* CANN: Add ROPE sin/cos cache for reuse

Introduce sin/cos caching mechanism in ROPE to avoid redundant
computation across layers. The cache is built on the first layer
per device and reused by subsequent layers if parameters match.

- Added sin_cache / cos_cache pointers and position_length tracking
- Introduced cache validity flags and properties:
  (ext_factor, theta_scale, freq_scale, attn_factor, is_neox)
- Accelerates ROPE by eliminating repeated sin/cos generation

This change reduces overhead in multi-layer scenarios while
preserving correctness by verifying parameter consistency.
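
A compact sketch of the reuse check (the field names mirror the list above, but the exact structure and the equality-only comparison are assumptions):

```cpp
#include <cstdint>

// Sketch: cached sin/cos tables are reused only when every parameter that
// shaped them matches the current layer's ROPE call.
struct rope_sin_cos_cache {
    void *  sin_cache = nullptr; // device buffer with precomputed sin values
    void *  cos_cache = nullptr; // device buffer with precomputed cos values
    int64_t position_length = 0;

    bool  valid       = false;   // cache validity flag
    float ext_factor  = 0.0f;
    float theta_scale = 0.0f;
    float freq_scale  = 0.0f;
    float attn_factor = 0.0f;
    bool  is_neox     = false;

    bool reusable(float ext, float theta, float freq, float attn,
                  bool neox, int64_t pos_len) const {
        return valid
            && ext_factor == ext && theta_scale == theta
            && freq_scale == freq && attn_factor == attn
            && is_neox == neox && position_length == pos_len;
    }
};
```

The first layer on each device builds the tables and records these fields; subsequent layers call reusable(...) and skip the sin/cos generation when it returns true.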

Co-authored-by: hipudding <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
* tests : filter out no-ops from coverage report

This commit is a follow-up to ggml-org#15745, addressing the feedback on
how no-op operations should be filtered out from the coverage report.

Regarding the feedback that the UNARY and GLU sub-operations are not
handled, I am not exactly sure what should be done: they are already
included in the coverage. For example, ABS, ELU, EXP, GELU, GEGLU,
GEGLU_ERF, etc. are in the list of covered operations:
```console
$ ./build/bin/test-backend-ops --show-coverage
Operations covered by tests (89):
  ✓ ABS
  ✓ ACC
  ✓ ADD
  ✓ ADD1
  ✓ ADD_ID
  ✓ ARANGE
  ✓ ARGMAX
  ✓ ARGSORT
  ✓ CLAMP
  ✓ CONCAT
  ✓ CONV_2D
  ✓ CONV_2D_DW
  ✓ CONV_3D
  ✓ CONV_TRANSPOSE_1D
  ✓ CONV_TRANSPOSE_2D
  ✓ COS
  ✓ COUNT_EQUAL
  ✓ CPY
  ✓ CROSS_ENTROPY_LOSS
  ✓ CROSS_ENTROPY_LOSS_BACK
  ✓ DIAG_MASK_INF
  ✓ DIV
  ✓ DUP
  ✓ ELU
  ✓ EXP
  ✓ FLASH_ATTN_EXT
  ✓ GATED_LINEAR_ATTN
  ✓ GEGLU
  ✓ GEGLU_ERF
  ✓ GEGLU_QUICK
  ✓ GELU
  ✓ GELU_ERF
  ✓ GELU_QUICK
  ✓ GET_ROWS
  ✓ GET_ROWS_BACK
  ✓ GROUP_NORM
  ✓ HARDSIGMOID
  ✓ HARDSWISH
  ✓ IM2COL
  ✓ IM2COL_3D
  ✓ L2_NORM
  ✓ LEAKY_RELU
  ✓ LOG
  ✓ MEAN
  ✓ MUL
  ✓ MUL_MAT
  ✓ MUL_MAT_ID
  ✓ NEG
  ✓ NORM
  ✓ OPT_STEP_ADAMW
  ✓ OPT_STEP_SGD
  ✓ OUT_PROD
  ✓ PAD
  ✓ PAD_REFLECT_1D
  ✓ POOL_2D
  ✓ REGLU
  ✓ RELU
  ✓ REPEAT
  ✓ REPEAT_BACK
  ✓ RMS_NORM
  ✓ RMS_NORM_BACK
  ✓ ROLL
  ✓ ROPE
  ✓ ROPE_BACK
  ✓ RWKV_WKV6
  ✓ RWKV_WKV7
  ✓ SCALE
  ✓ SET
  ✓ SET_ROWS
  ✓ SGN
  ✓ SIGMOID
  ✓ SILU
  ✓ SILU_BACK
  ✓ SIN
  ✓ SOFT_MAX
  ✓ SOFT_MAX_BACK
  ✓ SQR
  ✓ SQRT
  ✓ SSM_CONV
  ✓ SSM_SCAN
  ✓ STEP
  ✓ SUB
  ✓ SUM
  ✓ SUM_ROWS
  ✓ SWIGLU
  ✓ SWIGLU_OAI
  ✓ TANH
  ✓ TIMESTEP_EMBEDDING
  ✓ UPSCALE

Operations without tests (14):
  ✗ ADD_REL_POS
  ✗ CUSTOM
  ✗ DIAG
  ✗ DIAG_MASK_ZERO
  ✗ FLASH_ATTN_BACK
  ✗ GET_REL_POS
  ✗ IM2COL_BACK
  ✗ MAP_CUSTOM1
  ✗ MAP_CUSTOM2
  ✗ MAP_CUSTOM3
  ✗ POOL_1D
  ✗ POOL_2D_BACK
  ✗ WIN_PART
  ✗ WIN_UNPART

Coverage Summary:
  Total operations: 103
  Tested operations: 89
  Untested operations: 14
  Coverage: 86.4%
```

Refs: ggml-org#15745

* use of ggml_op enum values instead of strcmp
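
For illustration, the enum-based filter could look like the sketch below (which ops count as no-ops in test-backend-ops is an assumption here; the point is comparing ggml_op values rather than names):

```cpp
#include "ggml.h"

// Sketch: classify layout-only operations by enum value instead of strcmp.
static bool is_noop(enum ggml_op op) {
    switch (op) {
        case GGML_OP_NONE:
        case GGML_OP_VIEW:
        case GGML_OP_RESHAPE:
        case GGML_OP_PERMUTE:
        case GGML_OP_TRANSPOSE:
            return true; // nothing to verify numerically, skip in coverage
        default:
            return false;
    }
}
```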
… (ggml-org#15924)

This commit applies to the release workflow the same ROCm installation
caching that already exists for the main CI workflow, introduced in commit
ff02caf ("ci : cache ROCm installation in windows-latest-cmake-hip
(ggml-org#15887)").
* metal : make the backend async

ggml-ci

* cont : add comments, extend op offload, clean up

ggml-ci

* metal : fix batch size for MUL_MAT_ID

* metal : remove deprecated ggml_backend_metal_buffer_from_ptr

* metal : create only metal buffers, no wrapping of host memory

ggml-ci

* metal : restore .alloc_buffer for buffer_from_ptr_type

ggml-ci

* metal : remove broken implementation of GGML_OP_SET

ggml-ci

* metal : clean-up loose ends, ready for tests

ggml-ci

* metal : support both private and shared buffers

ggml-ci

* metal : enable private buffers + add global device queue

* metal : disable host buffer to prevent races

ggml-ci

* metal : avoid extra copy during set_tensor

ggml-ci

* metal : use separate buffer types for shared and private Metal buffers

ggml-ci

* metal : simplify synchronization logic

ggml-ci

* metal : fix build

ggml-ci

* metal : do not implement cpy_tensor

ggml-ci

* metal : separate implementations for shared and private buffers

ggml-ci
This commit fixes the zero padding for odd dimensions in
ggml_compute_forward_timestep_embedding_f32.
The motivation for this is that currently if an odd dimension is used,
the padding check incorrectly uses the dimension value for indexing.
For example, with dim=15:

- Elements 0-6 are set to cosine values
- Elements 7-13 are set to sine values
- Element 14 is left uninitialized (contains garbage)
- Element 15 is correctly set to zero

This fix changes embed_data[dim] to embed_data[2 * half] so that
element 14 (the first unused element) is properly set to zero, as well
as the last element.

Resolves: ggml-org/ggml#1324
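
A simplified sketch of the change (only the padding branch is shown; the real kernel's loop and thread partitioning are omitted):

```cpp
// With dim = 15: half = 7, cos fills [0, 7), sin fills [7, 14), so index
// 2 * half == 14 is the first element that needs zero padding.
static void pad_timestep_embedding(float * embed_data, int dim) {
    const int half = dim / 2;
    if (dim % 2 != 0) {
        embed_data[2 * half] = 0.f; // before the fix: embed_data[dim] = 0.f;
    }
}
```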
* support non-contiguous Q in build_attn_mha

* Update src/llama-graph.cpp

ggml-ci

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
…with different encoder-decoder layers (ggml-org#15909)

* Extend the support of T5 models with different encoder-decoder layers

Signed-off-by: Jie Fu <[email protected]>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-arch.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-arch.h

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-hparams.h

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Rename n_dec_layer --> dec_n_layer

Signed-off-by: Jie Fu <[email protected]>

* Adapt to cases when dec_n_layer > n_layer

Signed-off-by: Jie Fu <[email protected]>
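
The core of the hparams handling, sketched (the key and loader call mirror llama.cpp conventions, but the exact names in the merged code are assumptions):

```cpp
// inside the T5 hparams loading (sketch): read a decoder-specific block
// count from the GGUF metadata, falling back to the shared n_layer for
// symmetric encoder-decoder models.
uint32_t dec_n_layer = hparams.n_layer;
ml.get_key(LLM_KV_DECODER_BLOCK_COUNT, dec_n_layer, /*required =*/ false);

// when dec_n_layer > n_layer, the decoder tensors for the extra layers
// must be created and loaded separately instead of reusing n_layer
```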

---------

Signed-off-by: Jie Fu <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
… (ggml-org#15872)

* Add fastdiv and fastmodulo to k_bin_bcast kernel

* Address review comments

* `prod_` instead of `prod` suffix

* Add test case for `k_bin_bcast_unravel` in CUDA backend
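
The underlying trick, sketched host-side in plain C++ (a sketch of the classic round-up fast-division scheme, not necessarily the exact constants or packing used in the commit; the CUDA kernel would use __umulhi for the high 32 bits):

```cpp
#include <cstdint>

// Precompute a magic multiplier mp and shift L for a fixed divisor d, so
// that n / d and n % d become multiply + shift for 32-bit operands, with
// the divisor known on the host before kernel launch.
struct fastdiv_vals {
    uint32_t mp; // magic multiplier
    uint32_t L;  // shift amount, L = ceil(log2(d))
    uint32_t d;  // divisor, kept for fastmodulo
};

static fastdiv_vals init_fastdiv_values(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint32_t(1) << L) < d) {
        L++;
    }
    const uint32_t mp = (uint32_t) (((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1);
    return { mp, L, d };
}

static uint32_t fastdiv(uint32_t n, const fastdiv_vals & v) {
    const uint32_t hi = (uint32_t) (((uint64_t) n * v.mp) >> 32); // __umulhi on device
    return (uint32_t) (((uint64_t) hi + n) >> v.L); // 64-bit add avoids overflow
}

static uint32_t fastmodulo(uint32_t n, const fastdiv_vals & v) {
    return n - fastdiv(n, v) * v.d;
}
```

In a broadcast kernel the divisors (tensor dimensions) are fixed for the whole launch, so init_fastdiv_values runs once on the host and every / and % on indices inside the kernel is replaced by the two helpers.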
@jan-service-account jan-service-account merged commit aaae685 into dev Sep 11, 2025
3 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-09-11-00-33 branch September 11, 2025 00:34