docker: parameterize CUDA version in build matrix, remove cuda-new.Dockerfile#20920
CISC merged 1 commit into ggml-org:master from …
Conversation
Hi @M1DNYT3, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Just use the …
CUDA 13 drops support for Pascal and Maxwell. While both are quite old, 12.9.1 is the last 12.x release, which enables Blackwell without dropping anything. There is no need to split the stack into two images either; mixed fleets with old and new hardware stay on a single image until they have fully transitioned to 13 and 12.x is dropped.
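The single-image approach falls out naturally once the CUDA version is a build argument. A minimal sketch (the `CUDA_VERSION` arg name and the `ubuntu24.04` base tag are illustrative, not taken from the repo's `.devops` files):

```dockerfile
# Illustrative sketch: one Dockerfile, CUDA version injected by the CI matrix
ARG CUDA_VERSION=12.9.1
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu24.04 AS build
# ... build steps stay identical for every matrix entry ...
```

The CI matrix then selects the toolkit per entry, e.g. `docker build --build-arg CUDA_VERSION=13.1.1 .`, instead of maintaining a separate Dockerfile per CUDA major version.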
Noticed a merge conflict popped up, and I see #21207 was just merged, bumping the CUDA floor to 12.8 - it seems the concern about forward compatibility is acknowledged after all. I would appreciate a fresh look at my proposal given that context, especially since the review bar appears more flexible than I was led to believe.
I am considering bumping both 12.x and 13.x to latest, and removing …
@CISC |
It did a whole lot more, specifically updating the base image and tools so that arm64 builds could be done; the bump was a prerequisite for that, not the other way around.
Makes sense on the ARM64 context. Given that 12.8 landed as a prerequisite for that work, bumping to 12.9.1 seems like an even smaller ask now that the baseline has already moved - it's the latest stable 12.x, picking up all fixes since 12.8, while avoiding 13.x, which still has open issues like #21255. If 12.9.1 becomes a prerequisite for something else tomorrow, the PR is already there. I have no objection to #21207 or its purpose - my question is simply why #20920 shouldn't be merged on its own merits.
@CISC Following up on the above - is my PR worth a second look on its own merits? |
With some modifications, sure. :) First of all, leave …
…ckerfile Co-authored-by: CISC <CISC@users.noreply.github.com>
@CISC Done, and I marked you as a co-author since you suggested the design. Could you please review the changes? cuda13 builds for both arches now point at 13.1.1, as it was in …
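For reference, the parameterized matrix in `docker.yml` could look roughly like this (field names are illustrative; only the 12.9.1 and 13.1.1 versions come from this thread):

```yaml
# Hedged sketch of a GitHub Actions build matrix, not the actual docker.yml
strategy:
  matrix:
    include:
      - tag: cuda12
        cuda_version: "12.9.1"
      - tag: cuda13
        cuda_version: "13.1.1"
# each entry would be passed through as:
#   --build-arg CUDA_VERSION=${{ matrix.cuda_version }}
```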
CISC left a comment
Perfect, thank you, can you run it on your fork to verify it builds (disable s390x to be able to complete)?
Running now; I'll post another update once completed. I made a separate branch specifically for the build test, with the s390x line deleted from the yml, so it won't impact the PR. https://github.com/M1DNYT3/llama.cpp/actions/runs/23900519302
@CISC The builds are done, all green, 12/12. You can check them at the URL I provided above.
Notable upstream changes:
- ggml-org#21038: rotate activations for better KV cache quantization
- ggml-org#21074: generic NVFP4 MMQ kernel
- ggml-org#21271: fix FA kernel selection logic
- ggml-org#21238: fix mmvq mmid kernel selection
- ggml-org#20998: FA support for head dim 512
- ggml-org#20920: docker cuda12 bump to 12.9.1

Conflicts resolved:
- fattn.cu: took upstream (adds head_dim 512 exclusion)
- mmq.cu/mmq.cuh: kept both TQ3 types + upstream additions
- llama-graph.cpp: kept both turbo WHT + upstream v_rot
- docker.yml: took upstream
- tests/CMakeLists.txt: took upstream
Overview
`ggml/src/ggml-cuda/CMakeLists.txt` gates Blackwell cubin compilation on `CUDAToolkit_VERSION >= 12.8`. The official image ships with `CUDA_VERSION=12.4.0`, so the gate never triggers, and RTX 50xx and RTX PRO 6000 fall back to PTX JIT.
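The gate in question has roughly this shape (a sketch, not the verbatim CMakeLists.txt; `120` stands for the Blackwell sm_120 architecture):

```cmake
# Sketch: Blackwell cubin targets are only added when the toolkit can emit them
if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8")
    list(APPEND CMAKE_CUDA_ARCHITECTURES 120)
endif()
# With CUDA 12.4.0 in the base image this branch is never taken,
# so sm_120 GPUs JIT-compile from PTX at load time instead.
```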
Validated across 4 architectures (100-record batch, 19 parallel slots per card): https://github.com/M1DNYT3/self-hosted-llm-inference/blob/main/iterations/10-blackwell-root-cause/summary.md
12.9.1 is the latest 12.x release. Pascal (sm_60/62) and Maxwell (sm_52) are not deprecated until 13.0, so no supported GPU is dropped.
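When scripting image builds against this gate, the same `>= 12.8` comparison can be done in shell. A generic version compare using GNU `sort -V` (a scripting convenience, not code from this PR):

```shell
# Returns success if version $1 >= version $2 (GNU sort -V orders versions).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

version_ge "12.9.1" "12.8" && echo "gate triggers"    # 12.9.1 satisfies >= 12.8
version_ge "12.4.0" "12.8" || echo "PTX JIT fallback" # 12.4.0 does not
```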
Test image: `docker pull m1dnyt3/llama-cpp:server-cuda-12.9`
Full logs + dmon traces: https://github.com/M1DNYT3/self-hosted-llm-inference/tree/main/iterations/10-blackwell-root-cause/logs
Addresses #17822, #18865.
Additional information
If you need additional data, I can run benchmarks on demand for any specific GPU. The harness is built around Vast.ai, so any card available on the platform can be tested - inference cost per benchmark run is negligible.
Requirements