Open
Labels: bug (Something isn't working)
Description
Note: This issue was copied from ggml-org#14413
Original Author: @DanielMazurkiewicz
Original Issue Number: ggml-org#14413
Created: 2025-06-27T07:35:28Z
Name and Version
$ ~/soft/llama.cpp/build_hip_gfx1100/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
Device 1: AMD Radeon Graphics, gfx1103 (0x1103), VMM: no, Wave Size: 32
version: 1 (8846aac)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
AMD Radeon RX 7900 XTX
Models
gemma-3n-E4B-it-Q8_0.gguf
Problem description & steps to reproduce
This is how I use the model:
llama-server --model gemma-3n-E4B-it-Q8_0.gguf --n-gpu-layers 99 --main-gpu 0 --split-mode none
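The backtrace below ends in llama_grammar_accept_str called from common_sampler_accept, so grammar-constrained sampling was apparently active when the crash happened. The original request is not included in the report; as an illustration only, a request of roughly this shape against llama-server exercises the same grammar sampling path (the prompt and grammar here are made up; the /completion endpoint and the "grammar" field are the documented ones):

# hypothetical grammar-constrained request, assuming the default port 8080
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Answer with yes or no: is the sky blue?",
        "n_predict": 16,
        "grammar": "root ::= \"yes\" | \"no\""
      }'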
First Bad Commit
No response
Relevant log output
slot launch_slot_: id 0 | task 1574 | processing task
slot update_slots: id 0 | task 1574 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 929
slot update_slots: id 0 | task 1574 | n_past = 335, cache_tokens.size() = 1074, seq_id = 0, pos_min = 50, n_swa = 512
slot update_slots: id 0 | task 1574 | forcing full prompt re-processing due to lack of cache data (likely due to SWA, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 1574 | kv cache rm [0, end)
slot update_slots: id 0 | task 1574 | prompt processing progress, n_past = 929, n_tokens = 929, progress = 1.000000
slot update_slots: id 0 | task 1574 | prompt done, n_past = 929, n_tokens = 929
[New LWP 24394]
[New LWP 24391]
[New LWP 24390]
[New LWP 24389]
[New LWP 24388]
[New LWP 24387]
[New LWP 24386]
[New LWP 24385]
[New LWP 24384]
[New LWP 24383]
[New LWP 24381]
[New LWP 24380]
[New LWP 24379]
[New LWP 24378]
[New LWP 24377]
[New LWP 24376]
[New LWP 24375]
[New LWP 24374]
[New LWP 24365]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
Function(s) ^std::(move|forward|as_const|(__)?addressof) will be skipped when stepping.
Function(s) ^std::(shared|unique)_ptr<.*>::(get|operator) will be skipped when stepping.
Function(s) ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty) will be skipped when stepping.
Function(s) ^std::(basic_string|vector|array|deque|span)<.*>::operator.] will be skipped when stepping.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007fa2be9b5e22 in ?? () from /usr/lib/libc.so.6
#0 0x00007fa2be9b5e22 in ?? () from /usr/lib/libc.so.6
#1 0x00007fa2be9a9fda in ?? () from /usr/lib/libc.so.6
#2 0x00007fa2be9aa024 in ?? () from /usr/lib/libc.so.6
#3 0x00007fa2bea1a92f in wait4 () from /usr/lib/libc.so.6
#4 0x00007fa2bef7b42b in ggml_print_backtrace () from /home/daniel/soft/llama.cpp/build_hip_gfx1100/bin/libggml-base.so
#5 0x00007fa2bef8a3b9 in ggml_uncaught_exception() () from /home/daniel/soft/llama.cpp/build_hip_gfx1100/bin/libggml-base.so
#6 0x00007fa2becb1c1a in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
warning: 48 /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc: No such file or directory
#7 0x00007fa2bec975db in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
58 in /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc
#8 0x00007fa2becb1ed6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x5654ed0792d0 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x7fa2becc99b0 <std::runtime_error::~runtime_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
warning: 98 /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory
#9 0x00007fa2c3af1e0f in llama_grammar_accept_str(llama_grammar&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] () from /home/daniel/soft/llama.cpp/build_hip_gfx1100/bin/libllama.so
#10 0x00005654ed0022f4 in common_sampler_accept(common_sampler*, int, bool) ()
#11 0x00005654eced0108 in server_context::update_slots() ()
#12 0x00005654ece9c24e in server_queue::start_loop() ()
#13 0x00005654ece5ceb3 in main ()
[Inferior 1 (process 24361) detached]
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: <unused32>
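Based on the error text, the exception is raised inside llama_grammar_accept_str when the grammar stacks end up empty after accepting the sampled piece (here the piece is the token <unused32>). The grammar path can also be exercised without the server; a hypothetical sanity check with llama-cli (the grammar and prompt are made up, the flags are the standard ones, and this is not the reporter's original workload) would look like:

# hypothetical repro attempt with grammar sampling enabled
~/soft/llama.cpp/build_hip_gfx1100/bin/llama-cli \
  --model gemma-3n-E4B-it-Q8_0.gguf \
  --n-gpu-layers 99 --main-gpu 0 --split-mode none \
  --grammar 'root ::= "yes" | "no"' \
  -p "Answer with yes or no: is the sky blue?"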