Bug: Qwen3.5-35B-A3B-UD-Q6_K_XL - Unexpected empty grammar stack #1420
Description
What happened?
I keep getting the following crash with Qwen3.5-35B-A3B-UD-Q6_K_XL. This is the command I use:
GGML_CUDA_GRAPH_OPT=1 USE_MLOCK=true /mnt2/srcds/ai/ik_llama.cpp/build/bin/llama-server --port 8009 -m /mnt2/srcds/ai/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf --ctx-size 262144 --threads-batch 11 --threads-draft 8 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence_penalty 0.0 --repeat-penalty 1.0 --jinja --no-mmap -fa on -khad -rtr -gr -ger -ngl 333 -b 1024 -ub 1024 -ot .ffn_.*_exps.=CPU -amb 256 --ctx-checkpoints 500 -mqkv -cram -1 --cache-type-k q8_0
Disabling context checkpoints avoids the crash, but of course the full prompt is then reprocessed on every request, which makes the server unusable.
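For reference, this is the workaround launch I mean: the same command with checkpoints turned off (I'm assuming `--ctx-checkpoints 0` is what disables them; the only change from the command above is that flag's value):

```shell
# Identical to the crashing invocation, except context checkpoints are
# disabled (assumed: a value of 0 turns the feature off). This avoids the
# "Unexpected empty grammar stack" abort, at the cost of reprocessing the
# prompt from scratch every time.
GGML_CUDA_GRAPH_OPT=1 USE_MLOCK=true \
/mnt2/srcds/ai/ik_llama.cpp/build/bin/llama-server \
  --port 8009 \
  -m /mnt2/srcds/ai/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf \
  --ctx-size 262144 --threads-batch 11 --threads-draft 8 \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 \
  --presence_penalty 0.0 --repeat-penalty 1.0 \
  --jinja --no-mmap -fa on -khad -rtr -gr -ger \
  -ngl 333 -b 1024 -ub 1024 -ot .ffn_.*_exps.=CPU -amb 256 \
  --ctx-checkpoints 0 -mqkv -cram -1 --cache-type-k q8_0
```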
Here is the log:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080 Laptop GPU, compute capability 8.6, VMM: yes, VRAM: 7840 MiB
INFO [ main] build info | tid="140609950765056" timestamp=1773409944 build=4283 commit="714329f4"
INFO [ main] system info | tid="140609950765056" timestamp=1773409944 n_threads=8 n_threads_batch=11 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | "
CUDA0: using device CUDA0 - 7610 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from /mnt2/srcds/ai/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "image-text-to-text"]
llama_model_loader: - kv 17: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 18: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 20: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 21: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 22: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 23: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 24: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 25: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 26: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 27: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 28: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 29: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 30: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 31: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 32: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 33: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 34: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 35: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 36: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 37: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 38: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 39: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 40: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 41: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 42: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 43: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 44: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 45: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 46: general.quantization_version u32 = 2
llama_model_loader: - kv 47: general.file_type u32 = 18
llama_model_loader: - kv 48: quantize.imatrix.file str = Qwen3.5-35B-A3B-GGUF/imatrix_unsloth....
llama_model_loader: - kv 49: quantize.imatrix.dataset str = unsloth_calibration_Qwen3.5-35B-A3B.txt
llama_model_loader: - kv 50: quantize.imatrix.entries_count u32 = 510
llama_model_loader: - kv 51: quantize.imatrix.chunks_count u32 = 76
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type f16: 90 tensors
llama_model_loader: - type q8_0: 264 tensors
llama_model_loader: - type q6_K: 78 tensors
load: printing all EOG tokens:
load: - 248044 ('<|endoftext|>')
load: - 248046 ('<|im_end|>')
load: - 248063 ('<|fim_pad|>')
load: - 248064 ('<|repo_name|>')
load: - 248065 ('<|file_sep|>')
load: special tokens cache size = 33
load: token to piece cache size = 1.7581 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen35moe
llm_load_print_meta: n_ctx_train = 262144
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 2
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 256
llm_load_print_meta: n_embd_head_v = 256
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 0
llm_load_print_meta: n_expert = 256
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 40
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 262144
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: mrope sections = [11, 11, 10, 0]
llm_load_print_meta: ssm_d_conv = 4
llm_load_print_meta: ssm_d_inner = 4096
llm_load_print_meta: ssm_d_state = 128
llm_load_print_meta: ssm_dt_rank = 32
llm_load_print_meta: ssm_n_group = 16
llm_load_print_meta: model type = 35B.A3B
llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 34.661 B
llm_load_print_meta: model size = 29.859 GiB (7.400 BPW)
llm_load_print_meta: repeating layers = 28.853 GiB (7.367 BPW, 33.643 B parameters)
llm_load_print_meta: general.name = Qwen3.5-35B-A3B
print_info: vocab type = BPE
print_info: n_vocab = 248320
print_info: n_merges = 247587
print_info: BOS token = 11 ','
print_info: EOS token = 248046 '<|im_end|>'
print_info: EOT token = 248046 '<|im_end|>'
print_info: PAD token = 248055 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 248060 '<|fim_prefix|>'
print_info: FIM SUF token = 248062 '<|fim_suffix|>'
print_info: FIM MID token = 248061 '<|fim_middle|>'
print_info: FIM PAD token = 248063 '<|fim_pad|>'
print_info: FIM REP token = 248064 '<|repo_name|>'
print_info: FIM SEP token = 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
llm_load_tensors: ggml ctx size = 0.63 MiB
Tensor blk.0.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.0.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.0.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.1.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.1.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.1.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.2.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.2.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.2.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.3.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.3.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.3.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.4.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.4.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.4.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.5.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.5.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.5.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.6.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.6.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.6.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.7.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.7.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.7.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.8.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.8.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.8.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.9.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.9.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.9.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.10.ffn_up_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.10.ffn_gate_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.10.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.11.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.11.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.11.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.12.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.12.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.12.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.13.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.13.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.13.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.14.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.14.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.14.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.15.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.15.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.15.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.16.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.16.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.16.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.17.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.17.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.17.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.18.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.18.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.18.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.19.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.19.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.19.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.20.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.20.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.20.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.21.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.21.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.21.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.22.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.22.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.22.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.23.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.23.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.23.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.24.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.24.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.24.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.25.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.25.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.25.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.26.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.26.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.26.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.27.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.27.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.27.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.28.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.28.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.28.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.29.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.29.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.29.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.30.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.30.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.30.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.31.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.31.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.31.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.32.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.32.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.32.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.33.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.33.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.33.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.34.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.34.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.34.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.35.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.35.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.35.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.36.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.36.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.36.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.37.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.37.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.37.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.38.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.38.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.38.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
Tensor blk.39.ffn_up_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.39.ffn_gate_exps.weight (size = 210.00 MiB) buffer type overriden to CPU
Tensor blk.39.ffn_down_exps.weight (size = 272.00 MiB) buffer type overriden to CPU
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: CPU buffer size = 27804.00 MiB
llm_load_tensors: CUDA_Host buffer size = 515.31 MiB
llm_load_tensors: CUDA0 buffer size = 2256.30 MiB
.................................................................................................~ggml_backend_cuda_context: have 0 graphs
.
============ Repacked 120 tensors
llama_init_from_model: n_ctx = 262144
llama_init_from_model: n_batch = 1024
llama_init_from_model: n_ubatch = 1024
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 256
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 1
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 1
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 10000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 3982.82 MiB
llama_init_from_model: KV self size = 3920.00 MiB, K (q8_0): 1360.00 MiB, V (f16): 2560.00 MiB
llama_init_from_model: CUDA_Host output buffer size = 0.95 MiB
llama_init_from_model: CUDA0 compute buffer size = 978.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 601.03 MiB
llama_init_from_model: graph nodes = 2785
llama_init_from_model: graph splits = 82
llama_init_from_model: enabling only_active_experts scheduling
INFO [ init] initializing slots | tid="140609950765056" timestamp=1773409953 n_slots=1
srv init: Exclude reasoning tokens when selecting slot based on similarity: start: , end:
use --reasoning-tokens none to disable.
INFO [ init] new slot | tid="140609950765056" timestamp=1773409953 id_slot=0 n_ctx_slot=262144
no implementations specified for speculative decoding
slot init: id 0 | task -1 | speculative decoding context not initialized
prompt cache is enabled, size limit: no limit
use --cache-ram 0 to disable the prompt cache
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
srv init: init: chat template, thinking = 1
INFO [ main] model loaded | tid="140609950765056" timestamp=1773409953
INFO [ main] HTTP server listening | tid="140609950765056" timestamp=1773409953 n_threads_http="15" port="8009" hostname="127.0.0.1"
INFO [ slots_idle] all slots are idle | tid="140609950765056" timestamp=1773409953
======== Prompt cache: cache size: 0, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 0.00, cache_ram_similarity: 0.50
Recurrent model does not support banned strings.
INFO [ launch_slot_with_task] slot is processing task | tid="140609950765056" timestamp=1773409953 id_slot=0 id_task=0
======== Cache: cache_size = 0, n_past0 = 0, n_past1 = 0, n_past_prompt1 = 0, n_past2 = 0, n_past_prompt2 = 0
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409953 id_slot=0 id_task=0 p0=0
slot create_check: id 0 | task 0 | created context checkpoint 1 of 500 (pos_min = 1023, pos_max = 1023, size = 62.822 MiB, took 20.97 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409956 id_slot=0 id_task=0 p0=1024
slot create_check: id 0 | task 0 | created context checkpoint 2 of 500 (pos_min = 2047, pos_max = 2047, size = 62.830 MiB, took 20.71 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409960 id_slot=0 id_task=0 p0=2048
slot create_check: id 0 | task 0 | created context checkpoint 3 of 500 (pos_min = 3071, pos_max = 3071, size = 62.837 MiB, took 20.54 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409963 id_slot=0 id_task=0 p0=3072
slot create_check: id 0 | task 0 | created context checkpoint 4 of 500 (pos_min = 4095, pos_max = 4095, size = 62.845 MiB, took 18.85 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409967 id_slot=0 id_task=0 p0=4096
slot create_check: id 0 | task 0 | created context checkpoint 5 of 500 (pos_min = 5119, pos_max = 5119, size = 62.853 MiB, took 18.84 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409970 id_slot=0 id_task=0 p0=5120
slot create_check: id 0 | task 0 | created context checkpoint 6 of 500 (pos_min = 6143, pos_max = 6143, size = 62.861 MiB, took 18.99 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409973 id_slot=0 id_task=0 p0=6144
slot create_check: id 0 | task 0 | created context checkpoint 7 of 500 (pos_min = 7167, pos_max = 7167, size = 62.869 MiB, took 23.34 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409977 id_slot=0 id_task=0 p0=7168
slot create_check: id 0 | task 0 | created context checkpoint 8 of 500 (pos_min = 8191, pos_max = 8191, size = 62.877 MiB, took 20.82 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409980 id_slot=0 id_task=0 p0=8192
slot create_check: id 0 | task 0 | created context checkpoint 9 of 500 (pos_min = 9215, pos_max = 9215, size = 62.884 MiB, took 19.23 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409984 id_slot=0 id_task=0 p0=9216
slot create_check: id 0 | task 0 | created context checkpoint 10 of 500 (pos_min = 10239, pos_max = 10239, size = 62.892 MiB, took 21.16 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409987 id_slot=0 id_task=0 p0=10240
slot create_check: id 0 | task 0 | created context checkpoint 11 of 500 (pos_min = 11263, pos_max = 11263, size = 62.900 MiB, took 20.07 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409991 id_slot=0 id_task=0 p0=11264
slot create_check: id 0 | task 0 | created context checkpoint 12 of 500 (pos_min = 12287, pos_max = 12287, size = 62.908 MiB, took 18.45 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409994 id_slot=0 id_task=0 p0=12288
slot create_check: id 0 | task 0 | created context checkpoint 13 of 500 (pos_min = 13311, pos_max = 13311, size = 62.916 MiB, took 18.93 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773409998 id_slot=0 id_task=0 p0=13312
slot create_check: id 0 | task 0 | created context checkpoint 14 of 500 (pos_min = 14335, pos_max = 14335, size = 62.923 MiB, took 19.62 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410001 id_slot=0 id_task=0 p0=14336
slot create_check: id 0 | task 0 | created context checkpoint 15 of 500 (pos_min = 15359, pos_max = 15359, size = 62.931 MiB, took 19.80 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410004 id_slot=0 id_task=0 p0=15360
slot create_check: id 0 | task 0 | created context checkpoint 16 of 500 (pos_min = 16383, pos_max = 16383, size = 62.939 MiB, took 19.59 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410008 id_slot=0 id_task=0 p0=16384
slot create_check: id 0 | task 0 | created context checkpoint 17 of 500 (pos_min = 17407, pos_max = 17407, size = 62.947 MiB, took 19.07 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410011 id_slot=0 id_task=0 p0=17408
slot create_check: id 0 | task 0 | created context checkpoint 18 of 500 (pos_min = 18431, pos_max = 18431, size = 62.955 MiB, took 19.20 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410015 id_slot=0 id_task=0 p0=18432
slot create_check: id 0 | task 0 | created context checkpoint 19 of 500 (pos_min = 19455, pos_max = 19455, size = 62.962 MiB, took 19.80 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410018 id_slot=0 id_task=0 p0=19456
slot create_check: id 0 | task 0 | created context checkpoint 20 of 500 (pos_min = 20417, pos_max = 20417, size = 62.970 MiB, took 22.72 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410022 id_slot=0 id_task=0 p0=20418
slot create_check: id 0 | task 0 | created context checkpoint 21 of 500 (pos_min = 20423, pos_max = 20423, size = 62.970 MiB, took 19.37 ms)
slot print_timing: id 0 | task 0 |
prompt eval time = 69234.02 ms / 20423 tokens ( 3.39 ms per token, 294.99 tokens per second)
eval time = 6899.52 ms / 149 tokens ( 46.31 ms per token, 21.60 tokens per second)
total time = 76133.54 ms / 20572 tokens
INFO [ log_server_request] request | tid="140608396042240" timestamp=1773410029 remote_addr="127.0.0.1" remote_port=54182 status=200 method="POST" path="/v1/messages" params={"beta":"true"}
slot create_check: id 0 | task 0 | created context checkpoint 22 of 500 (pos_min = 20570, pos_max = 20570, size = 62.971 MiB, took 27.48 ms)
INFO [ release_slots] slot released | tid="140609950765056" timestamp=1773410029 id_slot=0 id_task=0 n_ctx=262144 n_past=20571 n_system_tokens=0 n_cache_tokens=20571 truncated=false
INFO [ slots_idle] all slots are idle | tid="140609950765056" timestamp=1773410029
======== Prompt cache: cache size: 20571, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 1.00, cache_ram_similarity: 0.50
Recurrent model does not support banned strings.
INFO [ launch_slot_with_task] slot is processing task | tid="140609950765056" timestamp=1773410029 id_slot=0 id_task=170
======== Cache: cache_size = 20571, n_past0 = 20422, n_past1 = 20422, n_past_prompt1 = 20422, n_past2 = 20423, n_past_prompt2 = 20423
Common part does not match fully
cache : <|im_start|>assistant
I need to explore the codebase structure first to understand how ports are currently configured and used, so I'll start by examining
prompt: <|im_start|>assistant
I'll explore the codebase to understand how RTSP and ONVIF ports are currently handled, then plan how
slot apply_checkp: id 0 | task 170 | n_past = 20422, slot.prompt.tokens.size() = 20571, seq_id = 0, pos_min = 20570
slot apply_checkp: id 0 | task 170 | restored context checkpoint took 15.80 ms (pos_min = 20417, pos_max = 20417, size = 62.970 MiB)
slot apply_checkp: id 0 | task 170 | erased invalidated context checkpoint (pos_min = 20423, pos_max = 20423, size = 62.970 MiB)
slot apply_checkp: id 0 | task 170 | erased invalidated context checkpoint (pos_min = 20570, pos_max = 20570, size = 62.971 MiB)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410029 id_slot=0 id_task=170 p0=20418
slot create_check: id 0 | task 170 | created context checkpoint 21 of 500 (pos_min = 20560, pos_max = 20560, size = 62.971 MiB, took 20.65 ms)
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="140609950765056" timestamp=1773410030 id_slot=0 id_task=170 p0=20561
slot create_check: id 0 | task 170 | created context checkpoint 22 of 500 (pos_min = 20566, pos_max = 20566, size = 62.971 MiB, took 19.90 ms)
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: =G (88838)
Aborted (core dumped)
Name and Version
version: 4283 (714329f)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response