[newb] Llama.cpp crashes with gtp-oss 20B model on Tesla V100, but works fine with Llama 3.1 8B #15119

antonkratz · 2025-08-06T12:29:54Z

antonkratz
Aug 6, 2025

I am a total beginner with llama.cpp. I have an account on a HPC cluster running Ubuntu, and a node with a V100 (Tesla V100-SXM2-16GB, CUDA Version: 12.3). I do not have a root/superuser account on this cluster, just a normal user account.

I am struggling to install gpt-oss:20B with llama.cpp, utilizing my V100.

Using ChatGPT and Google I managed to compile llama.cpp from scratch for my environment. I did:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
mkdir build
cd build

cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES="70" \
  ..

cmake --build . --config Release -- -j$(nproc)

This results in a binary. I got Llama for testing purposes (wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-IQ2_M.gguf) and then start it with:

~/work/llama.cpp/build/bin/llama-cli -m ~/work/llama.cpp/models/Meta-Llama-3.1-8B-Instruct-IQ2_M.gguf

This works perfectly well.

However, when I try to use it with gpt-oss:20B which I got from here (wget https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-mxfp4.gguf), then weird things happen:

When I start the model (~/work/llama.cpp/build/bin/llama-cli -m ~/work/llama.cpp/models/gpt-oss-20b-mxfp4.gguf) and ask a question ("Which model are you?") it crashes. I attached the log.

Weirdly, when I start the model and simply press Enter, without any input at the prompt, it starts generating text about random topics! So I assume the code must at least do... something.

I spend a lot of time on this, but even with help from using ChatGPT, Google... I feel totally stuck. Can I even run gpt-oss:20B on a V100? I am not root so I cannot do upgrades to CUDA or anything that requires root privileges. I tried to provide as much meaningful info as I could. Could someone point me on the right path? Thanks.

log.txt

Answered by slaren

Aug 6, 2025

Looks like some issue allocating virtual memory. It may work if you build with -DGGML_CUDA_NO_VMM=ON.

View full answer

slaren · 2025-08-06T12:35:21Z

slaren
Aug 6, 2025
Maintainer

Looks like some issue allocating virtual memory. It may work if you build with -DGGML_CUDA_NO_VMM=ON.

1 reply

antonkratz Aug 6, 2025
Author

Thank you so much @slaren it worked!!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[newb] Llama.cpp crashes with gtp-oss 20B model on Tesla V100, but works fine with Llama 3.1 8B #15119

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

[newb] Llama.cpp crashes with gtp-oss 20B model on Tesla V100, but works fine with Llama 3.1 8B #15119

Uh oh!

Uh oh!

antonkratz Aug 6, 2025

Replies: 1 comment · 1 reply

Uh oh!

slaren Aug 6, 2025 Maintainer

Uh oh!

antonkratz Aug 6, 2025 Author

antonkratz
Aug 6, 2025

Replies: 1 comment 1 reply

slaren
Aug 6, 2025
Maintainer

antonkratz Aug 6, 2025
Author