[pull] master from ggml-org:master #1156

Merged
pull[bot] merged 2 commits into syther-labs:master from ggml-org:master
Feb 21, 2026

pull[bot] commented Feb 21, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


crsawyer and others added 2 commits February 21, 2026 09:28
* Improve CUDA graph capture

Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), CUDA graphs are permanently disabled. This is suboptimal because:

- The first call always incurs CUDA graph capture overhead even if the graph is unstable
- Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode)
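The previous policy can be sketched as a small state machine. This is a minimal, self-contained C++ sketch, not the code in ggml-cuda.cu: `GraphProps`, `EagerGraphPolicy`, and `on_compute` are hypothetical names, the property comparison is reduced to comparing a vector of integers, and counting only property changes (not the first capture) toward the limit of 4 is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the properties compared between calls
// (e.g. node ops and shapes). Not a real ggml type.
using GraphProps = std::vector<long long>;

// Sketch of the PREVIOUS policy: CUDA graphs are used from the first call,
// and a run of consecutive property changes disables them permanently.
struct EagerGraphPolicy {
    int max_consecutive_updates;  // 4 in the description above
    GraphProps last_props;
    bool has_props = false;
    int consecutive_updates = 0;
    bool disabled = false;

    explicit EagerGraphPolicy(int max_updates = 4)
        : max_consecutive_updates(max_updates) {
        assert(max_updates >= 1);
    }

    // Returns true if this call would run through a CUDA graph
    // (re-captured first when the properties changed).
    bool on_compute(const GraphProps& props) {
        if (disabled) {
            return false;  // no way back, even once the graph is stable again
        }
        const bool changed = has_props && props != last_props;
        consecutive_updates = changed ? consecutive_updates + 1 : 0;
        last_props = props;
        has_props = true;
        if (consecutive_updates >= max_consecutive_updates) {
            disabled = true;  // permanent: nothing ever resets this flag
            return false;
        }
        return true;  // includes the very first call, which pays the capture cost
    }
};
```

Once `disabled` is set, a stable decode phase that follows a volatile prompt-processing phase still never uses CUDA graphs, which is the second drawback listed above.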

The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize.
This also fixes issues such as #19708.
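The warmup rule can be sketched as follows. This is a minimal C++ sketch under stated assumptions, not the actual ggml-cuda.cu implementation: `WarmupGraphPolicy`, `GraphProps`, and `on_compute` are hypothetical names, a cgraph is identified by its address, its properties are reduced to a vector of integers, and capture is assumed to start on the second consecutive call with identical properties (`min_matching_calls = 2`).

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

using GraphProps = std::vector<long long>;  // hypothetical property fingerprint

// Sketch of the warmup policy: a given cgraph becomes eligible for CUDA graph
// capture only after min_matching_calls consecutive calls with identical
// properties. A property change restarts the warmup instead of disabling
// CUDA graphs forever.
class WarmupGraphPolicy {
public:
    explicit WarmupGraphPolicy(int min_matching_calls = 2)
        : min_matching_calls_(min_matching_calls) {
        assert(min_matching_calls >= 1);
    }

    // graph_id identifies "the same cgraph" (assumed: its address).
    // Returns true if this call should use (and, if needed, capture) a CUDA graph.
    bool on_compute(const void* graph_id, const GraphProps& props) {
        State& s = states_[graph_id];
        if (s.matching_calls > 0 && props == s.last_props) {
            if (s.matching_calls < min_matching_calls_) {
                ++s.matching_calls;  // still warming up; capped to avoid overflow
            }
        } else {
            s.last_props = props;  // first sighting or changed properties
            s.matching_calls = 1;  // restart warmup; no permanent disable
        }
        return s.matching_calls >= min_matching_calls_;
    }

private:
    struct State {
        GraphProps last_props;
        int matching_calls = 0;
    };
    int min_matching_calls_;
    std::unordered_map<const void*, State> states_;
};
```

Because a property change only restarts the warmup, a graph whose shapes vary during prompt processing becomes eligible again once decode settles into identical calls, and a volatile graph never pays the capture cost on its first call.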

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Remove EM dashes

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>

pull[bot] locked and limited conversation to collaborators Feb 21, 2026
pull[bot] added the ⤵️ pull label Feb 21, 2026
pull[bot] merged commit a0c91e8 into syther-labs:master Feb 21, 2026
54 of 76 checks passed