Name and Version
$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 6121 (e54d41b)
built with cc (GCC) 14.3.1 20250523 (Red Hat 14.3.1-1) for x86_64-redhat-linux
Sending the simple prompt "What is bugonia?" to the 20b model on gpt-oss.com gives a perfect response.
With llama-cli the model reasons toward an answer but never comes close to the correct answer from gpt-oss.com.
Neither of these invocations gives an acceptable answer (a non-interactive variant is sketched after the commands):
$ ./llama.cpp/llama-cli -hf unsloth/gpt-oss-20b-GGUF:F16 --jinja -ngl 99 --threads -1 --ctx-size 16384 --temp 1.0 --top-p 1.0 --top-k 0
$ ./llama.cpp/llama-cli -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none
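For a deterministic one-shot reproduction, the prompt can be passed on the command line instead of typed interactively. This is a sketch, assuming a llama-cli build that supports --single-turn (on builds without it, -p may be treated as a system prompt when conversation mode is active):
$ ./llama.cpp/llama-cli -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja \
      --reasoning-format none --seed 0 --single-turn -p "What is bugonia?"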
Operating systems
Linux
GGML backends
CUDA
Hardware
Intel(R) Core(TM) i7-5820K CPU
RTX 3090
Models
gpt-oss-20b (unsloth/gpt-oss-20b-GGUF:F16 and ggml-org/gpt-oss-20b-GGUF)
Problem description & steps to reproduce
Run either of the commands above and send the prompt "What is bugonia?"; the response never approaches the quality of the gpt-oss.com answer. The server-based check below can help isolate whether llama-cli's chat handling is involved.
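To rule out llama-cli's interactive front end, the same model can be queried through llama-server's OpenAI-compatible endpoint. A sketch, assuming the default port 8080:
$ ./llama.cpp/llama-server -hf ggml-org/gpt-oss-20b-GGUF -ngl 99 --jinja
$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "What is bugonia?"}]}'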
First Bad Commit
New with this model
Relevant log output