Fix Qwen3 content extraction breaking code formatting #1

iSevenDays · 2025-07-29T07:29:04Z

Summary

Fixes a bug in Qwen3 content extraction where aggressive whitespace cleanup was breaking proper code formatting in non-tool-call streaming output.

Problem

The qwen3::extract_content_during_parsing() function was using an aggressive regex pattern that collapsed multiple newlines:

content = std::regex_replace(content, std::regex(R"(\n\s*\n)"), "\n");

This broke:

PEP 8 compliance: the 2 empty lines between Python functions were removed entirely
Code formatting: Any intentional whitespace structure was lost
Streaming output: Users requesting properly formatted code received malformed results

Root Cause

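The regex R"(\n\s*\n)" matches a newline, any run of whitespace, and another newline, then replaces the whole match with a single newline. Because \s also matches newlines, an entire run of blank lines is one match, so every empty line is dropped. That is overly aggressive for content that should preserve formatting.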
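
To make the effect concrete, here is a minimal sketch that applies the same pattern to a two-function Python snippet. The helper name collapse_blank_lines and the demo function are hypothetical; they only wrap the call quoted above and are not code from the parser.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper around the cleanup call quoted in this description.
std::string collapse_blank_lines(const std::string & content) {
    // \s also matches '\n', so a whole run of blank lines is a single match.
    static const std::regex blank_run(R"(\n\s*\n)");
    return std::regex_replace(content, blank_run, "\n");
}

// Shows that both PEP 8 empty lines between the functions disappear.
void demo_collapse_blank_lines() {
    const std::string input  = "def a():\n    return 1\n\n\ndef b():\n    return 2\n";
    const std::string output = collapse_blank_lines(input);
    assert(output == "def a():\n    return 1\ndef b():\n    return 2\n");
}
```

Indentation survives because the match has to end on a newline, but no empty line does.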

Solution

✅ Replace aggressive cleanup with gentle trimming

Remove std::regex_replace(content, std::regex(R"(\n\s*\n)"), "\n")
Replace with string_strip(content) following original llama.cpp patterns
Only trim leading/trailing whitespace, preserve internal formatting
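
As a rough illustration of "trim only the edges", the sketch below strips leading and trailing whitespace and leaves everything in between untouched. strip_outer_whitespace is a hypothetical stand-in written for this example; the PR itself calls string_strip from common.h, whose exact implementation is not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an edge-only trim; not the code from common.h.
std::string strip_outer_whitespace(const std::string & s) {
    std::size_t start = 0;
    std::size_t end   = s.size();
    // Skip whitespace at the front.
    while (start < end && std::isspace(static_cast<unsigned char>(s[start]))) {
        ++start;
    }
    // Skip whitespace at the back.
    while (end > start && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
        --end;
    }
    return s.substr(start, end - start);
}

// Interior blank lines are kept; only the outer padding is removed.
void demo_strip_outer_whitespace() {
    const std::string input = "\n  def a():\n    return 1\n\n\ndef b():\n    return 2\n\n";
    assert(strip_outer_whitespace(input) ==
           "def a():\n    return 1\n\n\ndef b():\n    return 2");
}
```

The trade-off is that an edge-only trim also removes any indentation at the very start of the content, while interior indentation and blank lines are never touched.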

✅ Add proper include

Add #include "../../common/common.h" for string_strip access

✅ Add regression test

New test_qwen3_whitespace_preservation() validates PEP 8 compliance
Ensures 2 empty lines between functions are preserved
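
A check of this kind could look roughly like the sketch below. It is not the body of test_qwen3_whitespace_preservation(); the function name preserves_pep8_blank_lines and the extract callback are assumptions that stand in for the real content extraction entry point, so the idea can be shown without the server code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical check: `extract` stands in for the content extraction function.
bool preserves_pep8_blank_lines(
        const std::function<std::string(const std::string &)> & extract) {
    const std::string input =
        "def celsius_to_fahrenheit(celsius):\n"
        "    return celsius * 9/5 + 32\n"
        "\n"
        "\n"
        "def fahrenheit_to_celsius(fahrenheit):\n"
        "    return (fahrenheit - 32) * 5/9\n";
    const std::string output = extract(input);
    // Two empty lines appear as three consecutive newlines before the second def.
    return !output.empty()
        && output.find("def celsius_to_fahrenheit") != std::string::npos
        && output.find("\n\n\ndef fahrenheit_to_celsius") != std::string::npos;
}
```

Passing a callback that returns its input unchanged should yield true, while one that drops the blank lines or returns an empty string should yield false.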

Testing

✅ PEP 8 compliance: 2 empty lines between functions preserved
✅ Tool call parsing: All Qwen3 tests continue to pass
✅ No regressions: Existing functionality maintained
✅ Follows patterns: Matches original llama.cpp whitespace handling

Example

Before (broken):

def celsius_to_fahrenheit(celsius):
    return celsius * 9/5 + 32
def fahrenheit_to_celsius(fahrenheit):  # Missing empty lines!
    return (fahrenheit - 32) * 5/9

After (fixed):

def celsius_to_fahrenheit(celsius):
    return celsius * 9/5 + 32


def fahrenheit_to_celsius(fahrenheit):  # Proper PEP 8 spacing preserved!
    return (fahrenheit - 32) * 5/9

Files Changed

examples/server/parsers/qwen3_parser.hpp - Fix whitespace handling
tests/test-function-calls.cpp - Add regression test

Test Results

🧹 Testing Qwen3 Whitespace Preservation Fix:
✅ PASS: Qwen3: PEP 8 double empty lines preserved
✅ PASS: Qwen3: Content not empty after processing
✅ PASS: Qwen3: Function content preserved
✅ PASS: Qwen3: Second function preserved

* Implement function calling / tools for ik_llama.cpp for Kimi K2
* Implement basic tool choice
* Backport llama.cpp tool calls support
* Enhance function calls with improved chat parser and string utilities
  - Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
  - Improve function calls parsing with fallback to llama.cpp builder pattern
  - Add string utility functions (starts_with, ends_with, find_partial_stop)
  - Update README with function calls testing instructions
  - Enhance Kimi K2 parser and function calls documentation
  - Add comprehensive test suite for function calls
  - Update CMakeLists.txt and Makefile for new components
* Enhance function calling with unified streaming and parser improvements
  - Fix streaming content cleanup to prevent function syntax in output
  - Unify content extraction patterns with llama.cpp approach
  - Improve Kimi K2 parser robustness and partial content handling
  - Add comprehensive test coverage for function call scenarios
  - Optimize chat message parsing and diff computation
* Replace hardcoded values in kimi_k2_parser.hpp with named constants
  - Add compile-time constants for all token format markers
  - Add compile-time constants for XML format markers
  - Add compile-time constants for simple format patterns
  - Replace all hardcoded string literals with named constants
  - Use compile-time length calculation to avoid manual counting
  - Improve maintainability and reduce magic numbers throughout parser
* Fix duplicate common_chat_parse definition
  - Remove duplicate implementation from chat-parser.cpp
  - Keep single implementation in chat.cpp following llama.cpp patterns
  - Resolves linker error: multiple definition of common_chat_parse
* Fix JSON assertion failure in function call parsing
  - Add proper validation that 'function' field is an object before accessing nested keys
  - Handle missing 'arguments' field gracefully with default "{}"
  - Prevents crash when parsing malformed tool call JSON structures
* Add comprehensive Qwen3 XML tool calling support with unit tests
  - Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
  - Add model detection and routing for Qwen3 vs Kimi-K2 formats
  - Create 8 comprehensive unit tests covering parsing, streaming, error handling
  - Fix token format cleaning bug in kimi_k2_parser.hpp processing order
  - Remove progressive parsing code and related utilities
  - Add tool injection support for Qwen3 format in server utils
* Add DeepSeek R1 function calling support with comprehensive unit tests
  - Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
  - Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
  - Update function_calls.hpp with DeepSeek R1 integration and content extraction
  - Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
  - Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
  - Port exact implementation patterns from original llama.cpp for compatibility
  Key features:
  - Native DeepSeek R1 format: <｜tool▁calls▁begin｜>function<｜tool▁sep｜>name```json{}```<｜tool▁call▁end｜><｜tool▁calls▁end｜>
  - Reasoning content extraction from <think>...</think> tags
  - Multiple tool calls support with separate call blocks
  - Model detection for deepseek-r1, deepseek_r1 naming patterns
  - Integration with incremental parsing and streaming support
* Add partial parsing support for JSON and regex
  - json-partial.h/cpp: JSON partial parsing functionality
  - regex-partial.h/cpp: Regex partial parsing functionality
* Add format_chat integration tests for Qwen3 tool injection
  - Add test_qwen3_format_chat_integration() to validate tool injection pipeline
  - Test tool injection conditions and system message enhancement
  - Verify JSON formatting and anti-preamble instructions
  - Add comprehensive test documentation
  Tests confirm tool injection works correctly - conversational preamble issue is not in ik_llama.cpp but likely in UI configuration.
* Fix Qwen3 tool call parsing - pass model name to parser
  Server was not passing model name to parse_chat_message_incremental(), causing Qwen3 to fall back to Kimi-K2 parser and return tool calls as content instead of proper tool_calls array.
* Fix non-streaming path to use model-specific parsing
  Non-streaming responses were hardcoded to use Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of proper tool_calls array. Now uses same model detection as streaming path for consistency.

Problem:
- qwen3::extract_content_during_parsing() used aggressive regex to collapse multiple newlines
- This broke proper code formatting (e.g., PEP 8's 2 empty lines between functions)
- Affected non-tool-call streaming output where formatting is critical

Solution:
- Replace aggressive std::regex_replace(R"(\n\s*\n)", "\n") with gentle string_strip()
- Follow original llama.cpp patterns: only trim leading/trailing whitespace
- Preserve internal formatting including multiple newlines
- Add proper include for common.h to access string_strip function

Changes:
- examples/server/parsers/qwen3_parser.hpp: Replace whitespace cleanup with string_strip()
- tests/test-function-calls.cpp: Add test_qwen3_whitespace_preservation() to prevent regression

Testing:
- ✅ PEP 8 compliance: 2 empty lines between functions preserved
- ✅ Tool call parsing: All Qwen3 tests continue to pass
- ✅ No regressions: Existing functionality maintained
- ✅ Follows original llama.cpp whitespace handling patterns

ikawrakow and others added 16 commits July 22, 2025 09:01

Update README.md

b48d71f

Add .mailmap

fa94418

Revert "Update README.md"

866c70e

This reverts commit b48d71f.

Add GitHub data (ikawrakow#637)

ab7d193

Fix pauses after a comma (ikawrakow#639)

7093a35

Co-authored-by: Iwan Kawrakow <[email protected]>

Add GitHub data: filename sanitization (ikawrakow#640)

0451f10

Update AUTHORS

10929e6

Update README.md

9defceb

IQ4_KSS improvements (ikawrakow#642)

1b05210

* iq4_kss: slightly better quantization
* iq4_kss: CUDA MMQ
* iq4_kss: repack/convert to q8_k_r8 (AVX2)
* iq4_kss: repack/convert to q8_k_r8 (NEON)

---------

Co-authored-by: Iwan Kawrakow <[email protected]>

Enable LLM function calls (ikawrakow#643)

4e9c78c

bug fix no timings after tool update

82d9bc0

webui: move preset settings to top

608ff76

webui:bug fix

Fix text generation endpoint (ikawrakow#654)

d65c8ce

Merge pull request ikawrakow#648 from ikawrakow/fcp/missing_token_ps

ae0ba31

Fix missing token per second for webui after function call update
