Model runner no longer outputs reasoning tokens? #121

@f2bo

Model Runner no longer outputs the reasoning content (i.e. the <think>...</think> block) for models such as qwen3 or deepseek-r1-distill-llama. I first noticed this after one of the more recent updates; before that, the models did output this type of content.

For example, below is the output from Qwen3 and DeepSeek from Model Runner.

Qwen3

docker model run ai/qwen3
Interactive chat mode started. Type '/bye' to exit.
> How many Rs in strawberry?
The word **"strawberry"** contains **three** instances of the letter **R**.

Here's the breakdown:
**S-T-R-A-W-B-E-R-R-Y**
- **R** at position 3
- **R** at position 8
- **R** at position 9

**Answer:** 3 R's.

DeepSeek-R1-Distill-Llama

docker model run ai/deepseek-r1-distill-llama
Interactive chat mode started. Type '/bye' to exit.
> How many Rs in strawberry?
The word "strawberry" contains three Rs.

**Step-by-Step Explanation:**

1. **Write Out the Word:** S-T-R-A-W-B-E-R-R-Y.
2. **Identify Each Letter:** S, T, R, A, W, B, E, R, R, Y.
3. **Count the Rs:**
   - The third letter is R (1).
   - The eighth letter is R (2).
   - The ninth letter is R (3).
4. **Total Count:** 3 Rs.

**Answer:** There are three Rs in "strawberry."

For contrast, this is the output from Qwen3 running in Ollama.

Qwen3 (in Ollama)

ollama run qwen3
>>> How many Rs in strawberry?
Thinking...
Okay, the user is asking how many times the letter 'R' appears in the word "strawberry." Let me break this down
step by step.

First, I should write out the word clearly: S-T-R-A-W-B-E-R-R-Y. Wait, let me make sure I have the spelling right.
Strawberry is spelled S-T-R-A-W-B-E-R-R-Y. Let me count each letter one by one.

Starting with the first letter: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). Wait,
that's 10 letters. Let me check again. S-T-R-A-W-B-E-R-R-Y. Yes, that's correct. So the letters are S, T, R, A, W,
B, E, R, R, Y.

Now, I need to count how many 'R's are there. Let's go through each letter:

1. S – not R
2. T – not R
3. R – that's one
4. A – not R
5. W – not R
6. B – not R
7. E – not R
8. R – that's two
9. R – that's three
10. Y – not R

So there are three 'R's in "strawberry." Wait, but let me double-check. Sometimes people might miscount. Let me
write them out again:

Letters in strawberry: S, T, R, A, W, B, E, R, R, Y. So positions 3, 8, and 9 are R. That's three Rs.

Hmm, but I've heard before that sometimes people might miss a letter. Let me confirm once more. The word is
spelled S-T-R-A-W-B-E-R-R-Y. Yes, the third letter is R, then after the E, there are two more Rs. So three Rs in
total.

I think that's correct. So the answer should be 3.
...done thinking.

The word **strawberry** contains **3** instances of the letter **R**.

**Breakdown:**
S-T-R-A-W-B-E-R-R-Y
Positions: 3rd, 8th, and 9th letters are **R**.

**Answer:** 3 Rs.
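
To check for the reasoning block programmatically rather than by eye, a small helper along these lines could be applied to the raw model text. This is only a sketch: it assumes the reasoning is wrapped in literal <think>...</think> tags as described above (the Ollama CLI renders it as "Thinking..." / "...done thinking." instead), and `split_reasoning` is a hypothetical name, not part of Model Runner or Ollama.

```python
import re
from typing import Optional, Tuple

# Assumption: reasoning is delimited by literal <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None if no block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    reasoning, answer = split_reasoning("<think>\ncount letters\n</think>\n\n3 Rs.")
    print("reasoning present:", reasoning is not None)
    print("answer:", answer)
```

A `None` in the first position would correspond to the behavior reported here for Model Runner.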

Also, inference seems to be significantly slower after the latest updates, though this may be subjective since I did not pay attention to speed previously. Qwen3 is certainly much slower in Model Runner (by more than 100%) than in Ollama, but again, I'm not certain that the two models are exactly the same.
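
To make the speed comparison less subjective, the wall-clock time for the same prompt could be measured on both runners and expressed as a percentage. The sketch below assumes "slower by X%" means extra time relative to the faster run, so 100% corresponds to taking twice as long; `timed` and `percent_slower` are hypothetical helpers, and the callable passed to `timed` stands in for whatever actually sends the prompt.

```python
import time
from typing import Callable


def timed(fn: Callable[[], object]) -> float:
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def percent_slower(slow_seconds: float, fast_seconds: float) -> float:
    """Extra time of the slow run, as a percentage of the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return (slow_seconds - fast_seconds) / fast_seconds * 100.0


if __name__ == "__main__":
    # Placeholder workloads; replace with calls that send the same prompt
    # to each runner.
    fast = timed(lambda: time.sleep(0.01))
    slow = timed(lambda: time.sleep(0.03))
    print(f"{percent_slower(slow, fast):.0f}% slower")
```

Averaging several runs per runner would give a steadier number than a single measurement.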

https://hub.docker.com/r/ai/qwen3

docker model inspect ai/qwen3
{
    "id": "sha256:79fa56c07429f64f41950fe2f524937cf3ae9ea9bd3d7ada72170b036ea3cc85",
    "tags": [
        "ai/qwen3"
    ],
    "created": 1746022478,
    "config": {
        "format": "gguf",
        "quantization": "IQ2_XXS/Q4_K_M",
        "parameters": "8.19 B",
        "architecture": "qwen3",
        "size": "4.68 GiB"
    }
}

https://ollama.com/library/qwen3:8b

ollama show qwen3
  Model
    architecture        qwen3
    parameters          8.2B
    context length      40960
    embedding length    4096
    quantization        Q4_K_M

  Capabilities
    completion
    tools
    thinking

  Parameters
    top_k             20
    top_p             0.95
    repeat_penalty    1
    stop              "<|im_start|>"
    stop              "<|im_end|>"
    temperature       0.6

  License
    Apache License
    Version 2.0, January 2004
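
The two listings above can also be compared mechanically. The following sketch parses the `parameters` and `quantization` strings reported by `docker model inspect` and checks them against the values shown by `ollama show`. It assumes that "/" in the quantization label separates several quantization types, which is a guess about the label format, and a match here would only mean the reported metadata is compatible, not that the model files are identical. The helper names are hypothetical.

```python
import json
import re
from typing import Set


def params_in_billions(label: str) -> float:
    """Parse labels such as '8.19 B' or '8.2B' into a number of billions."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*B\s*", label)
    if match is None:
        raise ValueError(f"unrecognized parameter label: {label!r}")
    return float(match.group(1))


def quantization_types(label: str) -> Set[str]:
    """Split a label such as 'IQ2_XXS/Q4_K_M' on '/' (assumed separator)."""
    return {part.strip() for part in label.split("/") if part.strip()}


# Values copied from the `docker model inspect ai/qwen3` output above.
inspect_output = json.loads(
    '{"config": {"quantization": "IQ2_XXS/Q4_K_M", "parameters": "8.19 B"}}'
)
config = inspect_output["config"]

# Values copied from the `ollama show qwen3` output above.
same_size = abs(params_in_billions(config["parameters"])
                - params_in_billions("8.2B")) < 0.05
shares_quant = "Q4_K_M" in quantization_types(config["quantization"])

print("parameter counts agree after rounding:", same_size)
print("Q4_K_M appears in both labels:", shares_quant)
```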
