Skip to content

Conversation

am17an
Copy link
Collaborator

@am17an am17an commented Oct 14, 2025

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning OpenCL Issues specific to the OpenCL backend labels Oct 14, 2025
@slaren slaren merged commit 120bf70 into ggml-org:master Oct 14, 2025
67 of 70 checks passed
@am17an am17an deleted the cuda_fix_rms_norm_fusion branch October 14, 2025 15:08
ddh0 added a commit to ddh0/llama.cpp that referenced this pull request Oct 14, 2025
* cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)

* remove legacy copy-op pointer indirection code

* further removal of copy-op indirection code

* renamed check_node_graph_compatibility_and_refresh_copy_ops function

* CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)

* CUDA: kernel for larger batch sizes for MoE

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* fixup

* tests

* Move mmq_ids_helper to mmid

* cleanup

* Remove redundant checks

* CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code

* CUDA: enable FA for FP32 KV cache (ggml-org#16546)

* vulkan: Improve build time for MSVC (ggml-org#16545)

Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel.

Enable /MP so source files are compiled in parallel.

* vulkan: Support FA with K/V in F32 (ggml-org#16543)

* CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)

* vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>

* metal : avoid using Metal's gpuAddress property (ggml-org#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

---------

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Anav Prasad <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Jeff Bolz <[email protected]>
Co-authored-by: SavicStefan <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs OpenCL Issues specific to the OpenCL backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

crash in ggml-cuda with rms_norm and mul fusion

2 participants