[REQUIRES UPDATE] Agent responses using Ollama (qwen3-30B-A3B) are entirely rendered in "thoughts" block #7201
Replies: 5 comments 15 replies
-
Hello, for me this is how it is shown in reasoning models.
-
Had a similar issue using qwen3 with LM Studio and was able to fix it by turning OFF “Reasoning Section Parsing” on the LM Studio side, letting LC decide when to apply those tags.
-
So the root of this issue is that Ollama (and maybe other local LLM options) streams the entire response in a single chunk for the streaming /chat/completions generation. The stream handling is not designed to parse reasoning tags out of a single chunk like that. I'm testing a fix and will merge today.
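To illustrate the failure mode the maintainer describes: when the whole response arrives as one chunk, the reasoning tags have to be split out of that single string rather than detected across incremental deltas. The sketch below is a hypothetical, minimal version of that splitting step, assuming qwen3's `<think>...</think>` tag format; it is not LibreChat's actual implementation, and the function name is made up for illustration.

```typescript
// Hypothetical sketch: split one streamed chunk that may contain a complete
// <think>...</think> block into reasoning text and visible content.
// Assumptions: qwen3-style <think> tags, and (per the discussion) the entire
// response arriving in a single chunk.
function splitReasoning(chunk: string): { reasoning: string; content: string } {
  const open = chunk.indexOf("<think>");
  const close = chunk.indexOf("</think>");
  // No complete think block found: treat the whole chunk as visible content.
  if (open === -1 || close === -1 || close < open) {
    return { reasoning: "", content: chunk };
  }
  return {
    reasoning: chunk.slice(open + "<think>".length, close).trim(),
    content: chunk.slice(close + "</think>".length).trim(),
  };
}
```

A per-delta handler that only toggles state when it sees an opening or closing tag at a chunk boundary would never fire here, which matches the reported symptom: everything (reasoning and answer) ends up in the "thoughts" block.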
-
Closed by #7275
-
What happened?
When using the Ollama provider with the qwen3_30ba3b_40k:latest model in an agent (think mode enabled), the entire model response is incorrectly rendered inside the "thoughts" block instead of only internal reasoning.
This doesn't happen when using the same model outside of agent mode in a regular chat.
Version Information
LibreChat v0.7.7
Model: qwen3_30ba3b
Provider: Ollama
Running as: Agent with think mode enabled
Steps to Reproduce
What browsers are you seeing the problem on?
Chrome
Relevant log output
No backend error logs observed; the issue appears to be related to how the frontend parses agent + think responses.
Screenshots
Code of Conduct