Bug: Periodic crashes with message "Deepseek2 does not support K-shift" #686

@Lissanro

What happened?

Sometimes, especially when using Cline, the backend crashes with the message "Deepseek2 does not support K-shift".

I understand that the model does not support K-shift, but in that case I think the server could simply re-evaluate the context normally. Part of it would probably be a cache hit, and even if not, that would still be better than crashing; the message could be shown as a warning instead. These crashes break agentic workflows because the agent cannot continue automatically.

Please let me know if I missed something and whether this is a misconfiguration on my side (such as a missing command-line option). This is how I run it:
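The fallback suggested above can be illustrated with a minimal sketch. It is not the ik_llama.cpp implementation: the function name `plan_context_shift` and its parameters are hypothetical, and it assumes a shift discards half of the tokens after the first `n_keep`. With K-shift, the surviving cache entries can be repositioned and reused; without it, only the untouched prefix remains valid and the rest would have to be re-evaluated instead of aborting.

```python
from typing import List, Tuple


def plan_context_shift(
    tokens: List[int], n_keep: int, supports_k_shift: bool
) -> Tuple[List[int], int]:
    """Hypothetical helper: return (tokens kept after the shift,
    number of cached tokens that remain usable without re-evaluation)."""
    n_left = len(tokens) - n_keep   # tokens eligible for discarding
    n_discard = n_left // 2         # assumption: discard half of them
    kept = tokens[:n_keep] + tokens[n_keep + n_discard:]

    if supports_k_shift:
        # Positions of the surviving tail can be shifted in place,
        # so every kept token is still served from the cache.
        return kept, len(kept)

    # No K-shift: only the prefix before the discarded region keeps
    # valid positions; everything after it must be re-evaluated.
    return kept, n_keep


if __name__ == "__main__":
    kept, reusable = plan_context_shift(list(range(10)), n_keep=1, supports_k_shift=False)
    print(kept, reusable)
```

In this sketch the cost of the fallback is the re-evaluation of `len(kept) - n_keep` tokens, which is slow for a long context but lets an automated client keep going.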

numactl --cpunodebind=0 --interleave=all ~/pkgs/ik_llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/DeepSeek-R1-256x21B-0528-IQ4_K-163840seq/DeepSeek-R1-256x21B-0528-IQ4_XS-163840seq.gguf \
--ctx-size 102400 --n-gpu-layers 62 --tensor-split 15,25,30,30 -mla 3 -fa -ctk q8_0 -amb 1024 -fmoe -b 4096 -ub 4096 \
-ot "blk\.3\.ffn_up_exps=CUDA0, blk\.3\.ffn_gate_exps=CUDA0, blk\.3\.ffn_down_exps=CUDA0" \
-ot "blk\.4\.ffn_up_exps=CUDA1, blk\.4\.ffn_gate_exps=CUDA1, blk\.4\.ffn_down_exps=CUDA1" \
-ot "blk\.5\.ffn_up_exps=CUDA2, blk\.5\.ffn_gate_exps=CUDA2, blk\.5\.ffn_down_exps=CUDA2" \
-ot "blk\.6\.ffn_up_exps=CUDA3, blk\.6\.ffn_gate_exps=CUDA3, blk\.6\.ffn_down_exps=CUDA3" \
-ot "ffn_down_exps=CPU, ffn_up_exps=CPU, gate_exps=CPU" \
--threads 64 --host 0.0.0.0 --port 5000

Name and Version

Latest git

What operating system are you seeing the problem on?

Linux

Relevant log output

...
INFO [           print_timings]           total time =   70700.13 ms | tid="134518025084928" timestamp=1754958412 id_slot=0 id_task=8124 t_prompt_processing=48951.336 t_token_generation=21748.792 t_total=70700.128
INFO [            update_slots] slot released | tid="134518025084928" timestamp=1754958412 id_slot=0 id_task=8124 n_ctx=102400 n_past=99887 n_system_tokens=0 n_cache_tokens=99887 truncated=false
INFO [            update_slots] all slots are idle | tid="134518025084928" timestamp=1754958412
INFO [format_partial_response_oaicompat] DEBUG: Streaming finish_reason check | tid="134067588681728" timestamp=1754958412 generated_text="<think>\n<attempt_completion>\n<result>\nI've made the following improvements:\n\n1. Added Space Grotesk font to all text elements in header\n2. Ensured consistent use of Tailwind color classes\n3. Preserved all animations and hover effects\n4. Fixed font family issues\n\nThe design should now much more closely match the reference. You can run the development server to verify.\n</result>\n<command>yarn dev</command>\n</attempt_completion>" model_name="x" tool_calls_count=0
INFO [      log_server_request] request | tid="134067588681728" timestamp=1754958412 remote_addr="127.0.0.1" remote_port=49610 status=200 method="POST" path="/v1/chat/completions" params={}
INFO [            update_slots] all slots are idle | tid="134518025084928" timestamp=1754958412
INFO [   launch_slot_with_task] slot is processing task | tid="134518025084928" timestamp=1754959245 id_slot=0 id_task=8225
INFO [            update_slots] kv cache rm [p0, end) | tid="134518025084928" timestamp=1754959246 id_slot=0 id_task=8225 p0=99887
INFO [            update_slots] slot context shift | tid="134518025084928" timestamp=1754959375 id_slot=0 id_task=8225 n_keep=1 n_left=102398 n_discard=51199 n_ctx=102400 n_past=102399 n_system_tokens=0 n_cache_tokens=102399
/home/lissanro/pkgs/ik_llama.cpp/src/llama.cpp:18969: Deepseek2 does not support K-shift
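The numbers in the "slot context shift" line can be reproduced with a short calculation. This is a sketch under the assumption that the server discards half of the tokens not protected by `n_keep`; that assumption is consistent with the logged values but is not confirmed by this report. The function name `context_shift_sizes` is hypothetical.

```python
def context_shift_sizes(n_past: int, n_keep: int) -> tuple:
    """Hypothetical helper: sizes involved in a context shift,
    assuming half of the non-kept tokens are discarded."""
    n_left = n_past - n_keep   # tokens that may be discarded
    n_discard = n_left // 2    # assumed: half of them are dropped
    return n_left, n_discard


if __name__ == "__main__":
    # Values taken from the log line above: n_past=102399, n_keep=1
    n_left, n_discard = context_shift_sizes(n_past=102399, n_keep=1)
    print(n_left, n_discard)  # 102398 51199
```

The shift is triggered when `n_past` is one token short of `n_ctx=102400`, and it is at this point that the K-shift check aborts the process.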

Labels: enhancement (New feature or request)
