fix: handle thinking model responses in /api/chat endpoint#764

Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from yasinBursali:fix/chat-thinking-model-max-tokens
Apr 3, 2026
Conversation

@yasinBursali
Contributor

What

Fix /api/chat returning empty responses when a thinking model (e.g., Qwen3.5) is loaded.

Why

Thinking models emit <think>...</think> blocks before producing their actual answer. With max_tokens: 256, the model exhausts its entire token budget on the reasoning phase and never produces visible output, so the returned content is always an empty string.

How

  • Increase max_tokens from 256 to 2048 (bounded by existing 30s HTTP timeout)
  • Strip <think>...</think> tags from response text via regex (safe, no ReDoS risk — fixed delimiters with lazy quantifier)
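The stripping step described above can be sketched as follows. This is a minimal illustration, not the PR's actual code; the function and constant names are mine:

```python
import re

# Lazy quantifier with fixed literal delimiters: linear-time matching,
# no nested-repetition pattern, hence no ReDoS exposure.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks (including multi-line
    ones, via DOTALL) and trim surrounding whitespace."""
    return THINK_RE.sub("", text).strip()
```

Because the model's reasoning block routinely spans multiple lines, the `re.DOTALL` flag is what makes `.` match newlines inside the tags.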

Testing

  • python -m py_compile passes
  • Manual: send a message via setup wizard chat, verify thinking tags are stripped and response is non-empty

Platform Impact

  • macOS / Linux / Windows: No impact — pure Python string processing on API response

🤖 Generated with Claude Code

Increase max_tokens from 256 to 2048 so thinking models (Qwen3.5) have
enough tokens to complete their reasoning phase and produce actual
content. Strip <think>...</think> tags from response text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightheartdevs merged commit 15ed4a3 into Light-Heart-Labs:main on Apr 3, 2026
21 of 28 checks passed

2 participants