
Qwen3-Next-Thinking can accidentally burn out GPUs! #7340

@averageaidude

Description


Describe the bug

Critical error: Qwen3-Next-Thinking does not stop processing after the thinking phase and the answer have finished.

  • The GPUs will run indefinitely!
  • Token and GPU usage will never stop

Solution:

1.) If a Qwen3-Next model is loaded, the parameter swa-full must always be added to extra-flags by default!

2.) The Qwen3-Next templates shipped by most quantizers must be overwritten immediately when the model is loaded

from:

{%- for message in messages %}
    {%- if message['role'] == 'system' -%}
        {%- if message['content'] -%}
            {{- message['content'] + '\n\n' -}}
        {%- endif -%}
        {%- if user_bio -%}
            {{- user_bio + '\n\n' -}}
        {%- endif -%}
    {%- else -%}
        {%- if message['role'] == 'user' -%}
            {{- name1 + ': ' + message['content'] + '\n'-}}
        {%- else -%}
            {{- name2 + ': ' + message['content'] + '\n' -}}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt %}
    {{- name2 + ':' -}}
{%- endif %}

to:

{%- for message in messages %}
    {%- if message['role'] == 'system' -%}
        <|im_start|>system
        {%- if message['content'] -%}
            {{- message['content'] }}
        {%- endif -%}
        {%- if user_bio -%}
            {%- if message['content'] %}{{ '\n' }}{%- endif -%}
            {{- user_bio }}
        {%- endif -%}
        <|im_end|>
        {{- '\n' }}
    {%- else -%}
        {%- if message['role'] == 'user' -%}
            <|im_start|>user
            {{- message['content'] }}<|im_end|>
            {{- '\n' }}
        {%- else -%}
            <|im_start|>assistant
            {{- message['content'] }}<|im_end|>
            {{- '\n' }}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt %}
    <|im_start|>assistant
{%- endif %}

With llama.cpp, the first template just produces nonsense because of the missing <|im_end|>: never-ending sentences full of emojis. I confirmed this with Alibaba. That template was only meant for CPU usage with low RAM; it was never intended for GPU usage with a KV cache.
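The difference is easy to verify offline. Below is a minimal sketch (assuming the jinja2 Python package; both templates are abridged to their user/assistant branches, and the conversation plus the name1/name2 values are made up for illustration) showing that the original template never emits the <|im_end|> stop token, while the corrected ChatML-style one does:

```python
# Minimal sketch: render abridged versions of the two templates above
# with jinja2 and check for the <|im_end|> stop marker.
from jinja2 import Template

# Abridged user/assistant branches of the original (broken) template.
BROKEN = (
    "{%- for message in messages %}"
    "{%- if message['role'] == 'user' -%}"
    "{{- name1 + ': ' + message['content'] + '\\n' -}}"
    "{%- else -%}"
    "{{- name2 + ': ' + message['content'] + '\\n' -}}"
    "{%- endif -%}"
    "{%- endfor -%}"
    "{%- if add_generation_prompt %}{{- name2 + ':' -}}{%- endif %}"
)

# Abridged user/assistant branches of the corrected ChatML-style template.
FIXED = (
    "{%- for message in messages %}"
    "{%- if message['role'] == 'user' -%}"
    "<|im_start|>user\n{{ message['content'] }}<|im_end|>\n"
    "{%- else -%}"
    "<|im_start|>assistant\n{{ message['content'] }}<|im_end|>\n"
    "{%- endif -%}"
    "{%- endfor -%}"
    "{%- if add_generation_prompt %}<|im_start|>assistant{%- endif %}"
)

ctx = {
    "messages": [{"role": "user", "content": "Hello"}],
    "name1": "User",
    "name2": "Assistant",
    "add_generation_prompt": True,
}

broken_out = Template(BROKEN).render(**ctx)
fixed_out = Template(FIXED).render(**ctx)

print(repr(broken_out))  # 'User: Hello\nAssistant:' -- no stop marker at all
print("<|im_end|>" in broken_out)  # False: the model has no way to stop
print("<|im_end|>" in fixed_out)   # True: generation can terminate normally
```

Without <|im_end|> anywhere in the prompt, the model is never taught where a turn ends, so it keeps sampling tokens indefinitely.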

The solution above solves the issue completely.
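For reference, point 1.) corresponds to llama.cpp's --swa-full server flag when running llama-server directly (the model path is just an example):

```shell
# Example llama-server invocation; --swa-full keeps a full-size SWA KV
# cache instead of the truncated sliding-window cache, which should avoid
# the constant full prompt re-processing visible in the logs below.
llama-server \
  -m Qwen_Qwen3-Next-80B-A3B-Thinking-IQ4_XS.gguf \
  --swa-full
```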

What I do not understand is why even serious quantizers ship this template. I stumbled over it the first time I loaded the model, and I am sure this is not only related to Oobabooga. But since Qwen3-Next specifically targets low-end hardware, i.e. gaming PCs with poor air cooling and underpowered PSUs, we should look after inexperienced users and protect them. This can really end in burned-out Nvidia cards or dead PSUs.

Thanks a lot for reading

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Just load Qwen_Qwen3-Next-80B-A3B-Thinking-IQ4_XS.gguf with default settings

Screenshot


Logs

srv   prompt_save:  - saving prompt with length 8191, total state size = 129.463 MiB
srv          load:  - looking for better prompt, base f_keep = 0.000, sim = 0.158
srv        update:  - cache state: 1 prompts, 129.463 MiB (limits: 8192.000 MiB, 131072 tokens, 518298 est)
srv        update:    - prompt 0x5850e43322b0:    8191 tokens, checkpoints:  0,   129.463 MiB
srv  get_availabl: prompt cache update took 142.49 ms
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> temp-ext -> top-k -> top-p -> typical -> min-p -> xtc -> dist 
slot update_slots: id  3 | task 8177 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  3 | task 8177 | n_tokens = 0, memory_seq_rm [0, end)
prompt processing progress, n_tokens = 19, batch.n_tokens = 19, progress = 1.000000

System Info

Oobabooga latest (manual install), latest Nvidia drivers, Python 3.11

Taichi X399 
2 x RTX 3090
1 x RTX 3060 (12GB)
