forked from ggml-org/llama.cpp
Sync master with upstream release b5558 #111
Merged: jan-service-account merged 10 commits into dev from update-dev-from-master-2025-06-01-00-11 on Jun 1, 2025.
Merged commits:
…PU in cuda (ggml-org#13856) (ggml-org#13895)
* Add "integrated" to ggml_cuda_device_info to distinguish integrated GPUs from discrete GPUs (see the sketch below)
* Adjust ggml_backend_cuda_device_supports_buft for this new feature
* Update ggml/src/ggml-cuda/ggml-cuda.cu: adjust code indentation (Co-authored-by: Johannes Gäßler <[email protected]>)
* Update ggml/src/ggml-cuda/ggml-cuda.cu: fix incorrect setting of variable types (Co-authored-by: Johannes Gäßler <[email protected]>)
* Update ggml/src/ggml-cuda/ggml-cuda.cu: adjust the judgment logic (Co-authored-by: Johannes Gäßler <[email protected]>)
* Add a host_buft assert for integrated CUDA devices in evaluate_and_capture_cuda_graph()
* Update ggml/src/ggml-cuda/ggml-cuda.cu: add a defensive assert (Co-authored-by: Johannes Gäßler <[email protected]>)
* Update ggml/src/ggml-cuda/ggml-cuda.cu: adjust the support judgment logic (Co-authored-by: Johannes Gäßler <[email protected]>)
* Revert the suggested change, as it is not applicable on Jetson devices
* Update ggml/src/ggml-cuda/ggml-cuda.cu: add parentheses to enforce operator precedence (Co-authored-by: Diego Devesa <[email protected]>)
* Update ggml/src/ggml-cuda/ggml-cuda.cu: fix CI by adding a space (Co-authored-by: Johannes Gäßler <[email protected]>)

Co-authored-by: yangxiao <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: yangxiao <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
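To make the integrated-GPU distinction concrete, here is a minimal, hypothetical sketch of how a CUDA device can be classified as integrated and how that can feed a host-buffer support check. The struct and function names below are illustrative assumptions, not the actual ggml-cuda code:

```cpp
// Hypothetical, simplified sketch of the integrated-GPU flag described above.
// The real ggml_cuda_device_info and ggml_backend_cuda_device_supports_buft
// in ggml/src/ggml-cuda/ggml-cuda.cu are more involved.
#include <cuda_runtime.h>
#include <cstdio>

struct device_info_sketch {   // assumed name, for illustration only
    int  cc;                  // compute capability, e.g. 87 for Jetson Orin
    bool integrated;          // true for integrated GPUs, false for discrete
};

static device_info_sketch query_device(int device) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    device_info_sketch info;
    info.cc         = 10 * prop.major + prop.minor;
    info.integrated = prop.integrated != 0;  // CUDA exposes this property
    return info;
}

// On an integrated GPU the device shares physical memory with the host, so
// the backend can also claim support for host (CPU) buffer types.
static bool supports_host_buffer(const device_info_sketch & info) {
    return info.integrated;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        const device_info_sketch info = query_device(i);
        std::printf("device %d: cc=%d integrated=%d host-buft=%d\n",
                    i, info.cc, info.integrated, supports_host_buffer(info));
    }
    return 0;
}
```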
* kv-cache : simplify the "struct llama_kv_cache" interface
* kv-cache : revert the (n_swa + n_ubatch) change (for the next PR)
* kv-cache : some comments
* context : fix graph reserve for multiple sequences
* kv-cache : fix typo [no ci]
* kv-cache : fix find_slot() logic for free slots
* llama : add TODO for deprecating the defrag API in the future
* kv-cache : improve find_slot() using min/max seq pos info (see the sketch after this list)
* llama : handle aborts and compute errors
* memory : extract state into llama_memory_state
* kv-cache : add comments
* server : update batching logic to reset n_batch on successful decode
* server : upon full re-processing, remove the sequence from the cache
* kv-cache : add TODO for doing split_equal when split_simple fails
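As a rough illustration of the find_slot() idea referenced above, the sketch below scans a simplified cell array for a contiguous run of free cells. The real llama_kv_cache additionally tracks per-sequence min/max position info to prune the search; all names here are assumptions for illustration:

```cpp
#include <cstdint>
#include <vector>

// Simplified KV cell: pos < 0 marks a free cell. The real cache also stores
// which sequences use each cell and their min/max positions.
struct kv_cell_sketch {
    int32_t pos = -1;
};

// Return the index of the first contiguous run of n_tokens free cells,
// or -1 if the batch does not fit anywhere in the cache.
static int32_t find_slot_sketch(const std::vector<kv_cell_sketch> & cells, uint32_t n_tokens) {
    uint32_t run = 0;
    for (uint32_t i = 0; i < cells.size(); ++i) {
        run = cells[i].pos < 0 ? run + 1 : 0;
        if (run == n_tokens) {
            return int32_t(i - n_tokens + 1);
        }
    }
    return -1;  // caller can abort or fall back (e.g. defrag, smaller batch)
}
```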
…⚠️ breaking change) (ggml-org#13917)
* mtmd : fix missing public header
* no object
* apply suggestion from Georgi
* rm mtmd-helper, merge it into mtmd
* missing vendor include dir
* llama : auto-batch
* context : simplify if branching
* Replace alert and confirm with custom modals; this is needed because the webview in VS Code does not permit alert and confirm for security reasons
* Use a Modal Provider to simplify the use of confirm and alert modals
* Increase the z-index of the modal dialogs
* Update index.html.gz
* Also add showPrompt
* Rebuild

Co-authored-by: igardev <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
* llama : use n_swa + n_ubatch cells for the SWA cache (see the sketch below)
* llama : add warning about multi-sequence SWA contexts
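The sizing rule behind the first item can be stated in a couple of lines. This is a sketch of the arithmetic only, with hypothetical numbers; the exact expression in llama.cpp may differ:

```cpp
#include <cstdint>
#include <cstdio>

// Sketch: a sliding-window-attention (SWA) cache only has to retain the last
// n_swa positions, plus the n_ubatch tokens appended in the current decode
// step, so n_swa + n_ubatch cells suffice.
static uint32_t swa_cache_cells(uint32_t n_swa, uint32_t n_ubatch) {
    return n_swa + n_ubatch;
}

int main() {
    // hypothetical values: a 4096-token window with a micro-batch of 512
    std::printf("SWA cache cells: %u\n", swa_cache_cells(4096, 512));
    return 0;
}
```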
…build. (ggml-org#13945)
Signed-off-by: Jiri Podivin <[email protected]>
…dows to avoid throttling (ggml-org#12995)
* threading : support GGML_SCHED_PRIO_LOW and update thread info on Windows to avoid throttling. We discussed adding a LOW priority for GGML threads in the original threadpool PR; it can be useful in some cases to avoid contention. Recent Windows ARM64 releases started parking (offlining) CPU cores more aggressively, which results in suboptimal performance with n_threads > 4. To deal with that, we now disable Power Throttling for our threads at NORMAL and higher priorities (see the sketch after this list). (Co-authored-by: Diego Devesa <[email protected]>)
* threading : disable SetThreadInfo() calls on older Windows versions
* Update tools/llama-bench/llama-bench.cpp (Co-authored-by: Diego Devesa <[email protected]>)

Co-authored-by: Diego Devesa <[email protected]>
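For reference, opting a thread out of power throttling (EcoQoS) on Windows goes through SetThreadInformation with a THREAD_POWER_THROTTLING_STATE. The sketch below shows only the core call; the actual ggml code additionally gates this on the Windows version and the requested scheduling priority, and it requires a Windows 10+ SDK:

```cpp
#ifdef _WIN32
#include <windows.h>

// Opt the current thread out of power throttling. Setting
// ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED with StateMask = 0
// tells the scheduler not to throttle this thread's execution speed.
static bool disable_power_throttling_current_thread(void) {
    THREAD_POWER_THROTTLING_STATE state = {};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = 0;
    return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                &state, sizeof(state)) != 0;
}
#endif // _WIN32
```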
Updates the dev branch with the latest upstream release (b5558) from ggml-org/llama.cpp.