Sync master with upstream release b5149 #60

jan-service-account · 2025-04-18T00:08:29Z

Updates dev branch with latest release (b5149) from ggml-org/llama.cpp

Granite's FIM tokens are very similar to Qwen's; the only difference is that they use underscores instead of dashes, so <fim_middle>, for example, instead of <fim-middle>. Opening tokenizer_config.json in ibm-granite/granite-3.3-8b-base shows:

```
"<fim_prefix>",
"<fim_middle>",
"<fim_suffix>",
"<fim_pad>",
...
"<reponame>",
```
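As a rough illustration of the naming difference described above, the following Python sketch builds a fill-in-the-middle prompt with either separator. The helper names `fim_tokens` and `build_fim_prompt` are hypothetical, and the prefix, suffix, middle ordering is an assumption based on the commonly used FIM layout; this commit does not confirm it.

```python
def fim_tokens(sep="_"):
    """Hypothetical helper: return the three FIM marker tokens for a separator.

    sep="_" gives the underscore style shown in tokenizer_config.json,
    sep="-" gives the dash style mentioned in the commit message.
    """
    return {name: f"<fim{sep}{name}>" for name in ("prefix", "suffix", "middle")}


def build_fim_prompt(prefix, suffix, sep="_"):
    # Assumed layout: prefix text, then suffix text, then the middle marker,
    # after which the model is expected to generate the missing span.
    t = fim_tokens(sep)
    return f"{t['prefix']}{prefix}{t['suffix']}{suffix}{t['middle']}"


if __name__ == "__main__":
    print(build_fim_prompt("def add(a, b):\n    return ", "\n"))
```

Only the separator changes between the two styles, which is why a single extra token spelling is enough to cover Granite.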

Submit operators using asynchronous threads to improve performance. Use the environment variable GGML_CANN_ASYNC_MODE to control whether asynchronous submission is enabled. It is disabled by default. Testing shows a 10%–20% performance improvement in scenarios with small parameter sizes, especially in quantized models.
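The idea of gating asynchronous submission behind an environment variable can be sketched in Python as follows. This is not the CANN backend's code: `OpSubmitter` and `async_mode_enabled` are hypothetical names, and the set of values treated as "enabled" is an assumption, since the commit message only states that the mode is off by default.

```python
import os
import queue
import threading


def async_mode_enabled(env=os.environ):
    # Assumption: truthy strings such as "1" enable the mode. The values the
    # real backend accepts are not stated in the commit message.
    value = env.get("GGML_CANN_ASYNC_MODE", "")
    return value.strip().lower() in {"1", "on", "yes", "true"}


class OpSubmitter:
    """Runs operators inline, or hands them to a single worker thread."""

    def __init__(self, asynchronous):
        self.asynchronous = asynchronous
        self.results = []
        if asynchronous:
            self._tasks = queue.Queue()
            self._worker = threading.Thread(target=self._drain, daemon=True)
            self._worker.start()

    def _drain(self):
        # One worker keeps operators in submission order.
        while True:
            op = self._tasks.get()
            self.results.append(op())
            self._tasks.task_done()

    def submit(self, op):
        if self.asynchronous:
            self._tasks.put(op)  # returns immediately; the worker runs op later
        else:
            self.results.append(op())

    def synchronize(self):
        # Block until every queued operator has finished.
        if self.asynchronous:
            self._tasks.join()


if __name__ == "__main__":
    submitter = OpSubmitter(async_mode_enabled())
    for i in range(3):
        submitter.submit(lambda i=i: i * i)
    submitter.synchronize()
    print(submitter.results)
```

In this sketch the caller only pays the cost of enqueueing, which is the kind of overlap that would plausibly help most when each operator is cheap, as with small models.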


…-org#12953)

* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models ggml-ci
* llama : minor naming updates ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes ggml-ci

Noeda and others added 4 commits April 17, 2025 11:37

ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (ggml-org#12970)

207c22e

jan-service-account merged commit e199a87 into dev Apr 18, 2025
9 checks passed

jan-service-account deleted the update-dev-from-master-2025-04-18-00-08 branch April 18, 2025 00:17
