
Conversation

@rudiservo
Contributor

Issue #12500: CUDA Docker images crash on old GPUs (and possibly some more recent ones) because token_embd.weight is processed by the CPU backend; since BMI2 support was added, this crashes the program on hardware that lacks the instruction set.

It was recommended to add all CPU backend variants to all GPU images, since GPU images would also benefit from broader CPU compatibility.
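For context, llama.cpp exposes this through CMake options; a rough sketch of what a GPU image build enabling runtime CPU dispatch might look like (the flag names `GGML_BACKEND_DL`, `GGML_CPU_ALL_VARIANTS`, and `GGML_CUDA` come from llama.cpp's CMake options, but the exact invocation here is an illustrative assumption, not the repo's Dockerfile):

```shell
# Sketch: build a CUDA image with all runtime-dispatched CPU variants.
# GGML_BACKEND_DL loads backends as shared libraries at runtime;
# GGML_CPU_ALL_VARIANTS builds one CPU backend per feature level
# (e.g. with and without BMI2), so the loader can pick one the host supports.
cmake -B build \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CPU_ALL_VARIANTS=ON \
    -DGGML_CUDA=ON
cmake --build build --config Release
```

With this, a host without BMI2 simply loads a CPU variant compiled without BMI2 instructions instead of crashing with an illegal instruction.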

@rudiservo rudiservo requested a review from ngxson as a code owner April 4, 2025 13:34
@github-actions github-actions bot added the devops improvements to build systems and github actions label Apr 4, 2025
@ngxson ngxson requested a review from slaren April 4, 2025 13:53
Member

@slaren slaren left a comment


Looks good, but if any of these images are intended to be used on Arm (maybe Vulkan?), it would need the same logic as the CPU image to disable GGML_BACKEND_DL on Arm builds.
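For reference, the CPU image gates this on the build architecture; a hedged sketch of what the same logic could look like in a Dockerfile (`TARGETARCH` is Docker BuildKit's standard automatic build arg, but the conditional below is an assumption for illustration, not the repo's actual Dockerfile):

```dockerfile
# Sketch only: disable GGML_BACKEND_DL on Arm, as the CPU image does,
# and fall back to a single natively-tuned CPU backend there.
ARG TARGETARCH
RUN if [ "$TARGETARCH" = "arm64" ]; then \
        cmake -B build -DGGML_BACKEND_DL=OFF -DGGML_NATIVE=ON; \
    else \
        cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
    fi \
    && cmake --build build --config Release
```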

@rudiservo
Contributor Author

As far as I can tell, only the CPU images are built for ARM; all GPU images are built for AMD64.
If you want GPU builds for ARM, I can add them in a later PR, maybe? I don't have an ARM machine with a GPU to test on, so I'd need someone else to test it.

@slaren slaren merged commit b0091ec into ggml-org:master Apr 9, 2025
2 checks passed
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 21, 2025
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025
@aubinkure

I was running into the same segfault when running llama-quantize in the Docker container as in these issues:
#11683
#12564
#11196

I traced it back to this PR. Any thoughts on how to fix it?

Separately, it seems like there's no CI on the Docker images? A few short tests would be quite helpful, I think.
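As a starting point, a smoke test could run right after each image build in CI; a minimal sketch as a GitHub Actions step (the job layout, image tag, and invocation are illustrative assumptions, not the project's actual workflow):

```yaml
# Sketch: smoke-test a freshly built image on the runner's own CPU.
# Simply starting the binary would have surfaced the BMI2
# illegal-instruction crash before the image was published.
- name: Smoke-test Docker image
  run: |
    docker run --rm local/llama.cpp:server-cuda --version
```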

@rudiservo
Contributor Author

@aubinkure how can you trace it back to this PR if this PR is more recent than the issues you referred to?
You traced it forward, m8; either this solves it or it doesn't. The only thing this PR does is add CPU support to all GPU images.
If anything it's an issue with BMI2, so not this issue or PR.

@aubinkure

You're right, it seems this PR isn't related to the issues I mentioned. That said, I double-checked, and this PR is definitely breaking quantization for me inside a Docker container. I'll open a new issue with steps to reproduce.

