
Eval bug: SIGSEGV GPT-OSS:20B #15283

@mhitza

Description


Name and Version

$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX 4000 SFF Ada Generation, compute capability 8.9, VMM: yes
version: 6145 (04e1626)
built with gcc-13 (SUSE Linux) 13.3.1 20250313 [revision 4ef1d8c84faeebffeb0cc01ee22e891b41e5c4e0] for x86_64-suse-linux

Operating systems

Linux

GGML backends

CUDA

Hardware

13th Gen Intel(R) Core(TM) i5-13500 & NVIDIA RTX 4000 SFF Ada Generation

Models

gpt-20B.mxfp4 ( https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-mxfp4.gguf )

Problem description & steps to reproduce

The server is started with the following arguments:

$ llama-server -m gpt-20B.mxfp4.gguf \
    --no-webui --numa numactl --threads 32 --ctx-size 131072 \
    --n-gpu-layers 37 --no-op-offload \
    --temp 0 \
    --flash-attn \
    --jinja --chat-template-kwargs '{"reasoning_effort": "high"}' --reasoning-format none  \
    --repeat-penalty 1.2

Pass in the following prompt:

curl -s -N -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{ \"messages\": [{\"role\": \"user\", \"content\": \"# apache forward a request to two backend in order an cgi program then a php fcgi call\"}], \"stream\":true }"
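The escaped one-line JSON above is easy to mistype; as a sketch, the same payload can be kept as readable JSON and validated before sending (the curl call is commented out here since it needs a running server on :8080):

```shell
# Same request payload as above, kept readable (sketch).
payload='{
  "messages": [
    {"role": "user",
     "content": "# apache forward a request to two backend in order an cgi program then a php fcgi call"}
  ],
  "stream": true
}'

# Validate the payload before sending it.
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"

# Requires llama-server listening on :8080, hence commented out:
# curl -s -N -X POST "http://localhost:8080/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```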

Then, after about two minutes on this system (generating at around 70 tokens/second), it crashes. Memory stays steady during that time at around 14 GB (of 20 GB available), watched in nvidia-smi.

First Bad Commit

8b2aac0 is the first bad commit

(The commit SHA reported above under "Name and Version" is the version I was running when I started drafting this report.)
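For reference, the bisection that identifies a first bad commit like this can be driven automatically with `git bisect run`; a minimal sketch on a throwaway toy repository (the `test` predicate stands in for "does the crash reproduce", and all names are illustrative):

```shell
# Toy repository: 8 commits, where the "bug" appears at commit 5.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > n.txt
  git add n.txt
  git commit -qm "commit $i"
done

# bad = HEAD (crashes), good = HEAD~7 (known working).
git bisect start HEAD HEAD~7 > /dev/null

# The command's exit status stands in for reproducing the crash:
# 0 = good, non-zero = bad. Here the stand-in check is "n.txt < 5".
git bisect run sh -c 'test "$(cat n.txt)" -lt 5' > /dev/null

# refs/bisect/bad now points at the first bad commit.
git show -s --format=%s refs/bisect/bad   # -> commit 5
```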

Relevant log output

systemd-coredump[22807]: Process 22624 (llama-server) of user 1000 dumped core.
                                                                  
Stack trace of thread 22624:
#0  0x00000000005bd7ec n/a (/home/developer/llama.cpp/source/build/bin/llama-server + 0x1bd7ec)
ELF object binary architecture: AMD x86-64


### in gdb

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000005bd7ec in std::__detail::_Executor<std::reverse_iterator<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::__cxx11::sub_match<std::reverse_iterator<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::__cxx11::regex_traits<char>, true>::_M_dfs(std::__detail::_Executor<std::reverse_iterator<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::__cxx11::sub_match<std::reverse_iterator<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode, long) ()
[Current thread is 1 (Thread 0x7f395d63d000 (LWP 22624))]

Missing separate debuginfos, use: zypper install glibc-debuginfo-2.38-150600.14.32.1.x86_64 krb5-debuginfo-1.20.1-150600.11.11.2.x86_64 libcom_err2-debuginfo-1.47.0-150600.4.6.2.x86_64 libcurl4-debuginfo-8.6.0-150600.4.21.1.x86_64 libgcc_s1-debuginfo-14.2.0+git10526-150000.1.6.1.x86_64 libgomp1-debuginfo-14.2.0+git10526-150000.1.6.1.x86_64 libopenssl3-debuginfo-3.1.4-150600.5.33.1.x86_64 libsasl2-3-debuginfo-2.1.28-150600.7.6.2.x86_64 libssh4-debuginfo-0.9.8-150600.11.3.1.x86_64 libstdc++6-debuginfo-14.2.0+git10526-150000.1.6.1.x86_64 nvidia-compute-G06-debuginfo-580.65.06-1.x86_64
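The crash site, `_Executor::_M_dfs` in libstdc++'s `std::regex`, is a recursive backtracking matcher, so a SIGSEGV there while processing long generated text would be consistent with thread-stack exhaustion rather than a bad pointer (an interpretation, not a confirmed diagnosis). A quick diagnostic sketch for checking and varying the stack limit the server runs under (the rerun lines are hypothetical and commented out):

```shell
# The soft stack limit the process's main stack gets, in kB
# (worker threads may use a separately configured size).
ulimit -s

# Hypothetical experiment: rerun the server with a larger stack and see
# whether the crash moves or disappears (a diagnostic, not a fix).
# ulimit -s unlimited
# ./build/bin/llama-server -m gpt-20B.mxfp4.gguf ...
```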
