[newb] Llama.cpp crashes with gpt-oss 20B model on Tesla V100, but works fine with Llama 3.1 8B #15119
-
I am a total beginner with llama.cpp. I have an account on an HPC cluster running Ubuntu, with a node that has a V100 (Tesla V100-SXM2-16GB, CUDA Version: 12.3). I do not have a root/superuser account on this cluster, just a normal user account. I am struggling to run gpt-oss 20B with llama.cpp on my V100. Using ChatGPT and Google I managed to compile llama.cpp from source for my environment. I did roughly the following:
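(This is the standard CUDA CMake build; I am reconstructing the exact commands from memory, so treat them as approximate.)

```bash
# Clone and build llama.cpp with the CUDA backend enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```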
This results in working binaries under build/bin. For testing purposes I downloaded a Llama 3.1 8B GGUF model.
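I start it more or less like this (the model filename here is just a placeholder for wherever I saved the file):

```bash
# Run the model interactively, offloading all layers to the V100.
./build/bin/llama-cli -m models/Llama-3.1-8B-Instruct-Q4_K_M.gguf -ngl 99
```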
This works perfectly well. However, when I try the same thing with a gpt-oss 20B GGUF that I downloaded, llama.cpp crashes when I start the model, with what looks like a memory allocation error. Weirdly, when I start the model and simply press Enter at the prompt, without any input, it starts generating text about random topics! So I assume the code must at least be doing... something.

I have spent a lot of time on this, but even with help from ChatGPT and Google I feel totally stuck. Can I even run gpt-oss 20B on a V100? I am not root, so I cannot upgrade CUDA or do anything else that requires root privileges. I have tried to provide as much meaningful info as I could. Could someone point me in the right direction? Thanks.
-
Looks like some issue allocating virtual memory. It may work if you build with `-DGGML_CUDA_NO_VMM=ON`.
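Something like this, assuming you used the standard CMake CUDA build (the flag disables ggml's CUDA virtual-memory pool, falling back to plain cudaMalloc allocations):

```bash
# Reconfigure with the CUDA VMM pool disabled, then rebuild.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_NO_VMM=ON
cmake --build build --config Release -j
```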