Skip to content

Commit fa7a0f3

Browse files
authored
Fix for Deepseek r1 parsing (#676)
* Implement function calling / tools for ik_llama.cpp for Kimi K2 * Implement basic tool choice * Backport llama.cpp tool calls support * Enhance function calls with improved chat parser and string utilities - Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling - Improve function calls parsing with fallback to llama.cpp builder pattern - Add string utility functions (starts_with, ends_with, find_partial_stop) - Update README with function calls testing instructions - Enhance Kimi K2 parser and function calls documentation - Add comprehensive test suite for function calls - Update CMakeLists.txt and Makefile for new components * Enhance function calling with unified streaming and parser improvements - Fix streaming content cleanup to prevent function syntax in output - Unify content extraction patterns with llama.cpp approach - Improve Kimi K2 parser robustness and partial content handling - Add comprehensive test coverage for function call scenarios - Optimize chat message parsing and diff computation * Replace hardcoded values in kimi_k2_parser.hpp with named constants - Add compile-time constants for all token format markers - Add compile-time constants for XML format markers - Add compile-time constants for simple format patterns - Replace all hardcoded string literals with named constants - Use compile-time length calculation to avoid manual counting - Improve maintainability and reduce magic numbers throughout parser * Fix duplicate common_chat_parse definition - Remove duplicate implementation from chat-parser.cpp - Keep single implementation in chat.cpp following llama.cpp patterns - Resolves linker error: multiple definition of common_chat_parse * Fix JSON assertion failure in function call parsing - Add proper validation that 'function' field is an object before accessing nested keys - Handle missing 'arguments' field gracefully with default "{}" - Prevents crash when parsing malformed tool call JSON structures * Add comprehensive Qwen3 XML tool calling support with unit tests - Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format - Add model detection and routing for Qwen3 vs Kimi-K2 formats - Create 8 comprehensive unit tests covering parsing, streaming, error handling - Fix token format cleaning bug in kimi_k2_parser.hpp processing order - Remove progressive parsing code and related utilities - Add tool injection support for Qwen3 format in server utils * Add DeepSeek R1 function calling support with comprehensive unit tests - Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp - Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp - Update function_calls.hpp with DeepSeek R1 integration and content extraction - Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models - Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration - Port exact implementation patterns from original llama.cpp for compatibility Key features: - Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|> - Reasoning content extraction from <think>...</think> tags - Multiple tool calls support with separate call blocks - Model detection for deepseek-r1, deepseek_r1 naming patterns - Integration with incremental parsing and streaming support * Add partial parsing support for JSON and regex - json-partial.h/cpp: JSON partial parsing functionality - regex-partial.h/cpp: Regex partial parsing functionality * Add format_chat integration tests for Qwen3 tool injection - Add test_qwen3_format_chat_integration() to validate tool injection pipeline - Test tool injection conditions and system message enhancement - Verify JSON formatting and anti-preamble instructions - Add comprehensive test documentation Tests confirm tool injection works correctly - conversational preamble issue is not in ik_llama.cpp but likely in UI configuration. * Fix Qwen3 tool call parsing - pass model name to parser Server was not passing model name to parse_chat_message_incremental(), causing Qwen3 to fall back to Kimi-K2 parser and return tool calls as content instead of proper tool_calls array. * Fix non-streaming path to use model-specific parsing Non-streaming responses were hardcoded to use Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of proper tool_calls array. Now uses same model detection as streaming path for consistency. * Update Qwen3 function call handling in server and tests - Enhanced server function call detection and response formatting - Improved test coverage for Qwen3 tool call scenarios - Refined XML parsing for better tool execution support * Add DeepSeek-R1 function call parsing support Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats: - Format 1: Standard function call syntax (already supported) - Format 2: Alternative function call patterns (already supported) - Format 3: Tools array format - function\n```json\n{"tools": [...]} - Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call> Key changes: - Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern - Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns - Integrated both parsers into exception handling chain for robust fallback - Added comprehensive TDD test coverage for all formats - Anonymized all confidential information while preserving functionality Resolves tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls but server failed to parse them correctly. * Update function_calls.md documentation for DeepSeek-R1 Format 4 - Added Format 4 (XML wrapped) documentation with examples - Updated implementation notes with correct parser order (3→4→1→2) - Marked all DeepSeek-R1 formats as working (July 2025 update) - Updated test status for Format 3 and 4 as passing - Added parse_deepseek_r1_xml_wrapped() function reference - Corrected implementation file line numbers * Fix merge conflict in test-function-calls.cpp - Removed incomplete merge conflict marker from line 3027 - Ensured all tests compile and pass successfully - All DeepSeek-R1 formats (1-4) working correctly - All streaming and content cleaning tests passing * Fix DeepSeek R1 parsing issue with responses wrapped in think tags Restore missing consume_rest() call from working PR #648 implementation. When responses don't contain tool calls, remaining content after reasoning parsing must be preserved as displayable content. Fixes issue where entire responses wrapped in <think> tags resulted in empty content output. * Implement proper reasoning handling following original llama.cpp patterns - Add missing reasoning_format and reasoning_in_content fields to common_chat_syntax - Update try_parse_reasoning to match original llama.cpp logic exactly - Add TDD test case with reasoning_in_content=true for DeepSeek R1 - Following TDD: test should now pass with proper syntax configuration Based on original llama.cpp implementation patterns. * TDD SUCCESS: Fix DeepSeek R1 thinking tag termination issue ✅ Test passes with reasoning_in_content=true configuration - Content properly preserved: '<think>content</think>' displays fully - Reasoning field empty as expected - Following TDD: test-first approach validates the fix Next: Update server to automatically apply this configuration. * Complete server integration fix for DeepSeek R1 thinking tag termination - Server now automatically sets reasoning_in_content=true for DeepSeek R1 models - Fixes issue where responses wrapped in <think> tags appear empty to users * Add TDD test case for DeepSeek R1 thinking tag termination issue - Test reproduces the exact failure scenario reported by user - Validates that reasoning_in_content=true fixes the issue - Demonstrates empty content problem and working solution * Add remaining TDD test changes for DeepSeek R1 thinking tag fix * Add debug output after upstream merge * Remove temporary benchmark and debug files - Remove tests/benchmark-progressive-parsing.cpp (development tool, not part of core functionality) - Remove tests/reproduce_bug.sh (debugging script, not needed for PR)
1 parent 41f0d2e commit fa7a0f3

File tree

5 files changed

+101
-19
lines changed

5 files changed

+101
-19
lines changed

common/chat-parser.cpp

Lines changed: 28 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -82,28 +82,38 @@ bool common_chat_msg_parser::try_consume_literal(const std::string & literal) {
8282
}
8383

8484
bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
85-
auto start_pos = input_.find(start_think, pos_);
86-
if (start_pos == std::string::npos) {
87-
return false;
88-
}
85+
auto handle_reasoning = [&](const std::string & reasoning, bool closed) {
86+
auto stripped_reasoning = string_strip(reasoning);
87+
if (stripped_reasoning.empty()) {
88+
return;
89+
}
90+
if (syntax_.reasoning_in_content) {
91+
add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "<think>" : start_think);
92+
add_content(stripped_reasoning);
93+
if (closed) {
94+
add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "</think>" : end_think);
95+
}
96+
} else {
97+
add_reasoning_content(stripped_reasoning);
98+
}
99+
};
89100

90-
auto end_pos = input_.find(end_think, start_pos + start_think.size());
91-
if (end_pos == std::string::npos) {
92-
if (is_partial_) {
93-
// Partial reasoning content
94-
auto reasoning = input_.substr(start_pos + start_think.size());
95-
add_reasoning_content(string_strip(reasoning));
96-
pos_ = input_.size();
101+
if (syntax_.reasoning_format != COMMON_REASONING_FORMAT_NONE) {
102+
if (syntax_.thinking_forced_open || try_consume_literal(start_think)) {
103+
if (auto res = try_find_literal(end_think)) {
104+
handle_reasoning(res->prelude, /* closed */ true);
105+
consume_spaces();
106+
return true;
107+
}
108+
auto rest = consume_rest();
109+
if (!rest.empty()) {
110+
handle_reasoning(rest, /* closed */ !is_partial());
111+
}
112+
// Allow unclosed thinking tags for now (following original llama.cpp)
97113
return true;
98114
}
99-
return false;
100115
}
101-
102-
// Extract reasoning content
103-
auto reasoning = input_.substr(start_pos + start_think.size(), end_pos - start_pos - start_think.size());
104-
add_reasoning_content(string_strip(reasoning));
105-
pos_ = end_pos + end_think.size();
106-
return true;
116+
return false;
107117
}
108118

109119
std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_literal_legacy(const std::string & literal) {

common/chat.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -278,6 +278,9 @@ void common_chat_parse_deepseek_r1(common_chat_msg_parser & builder) {
278278
throw; // Re-throw for partial mode
279279
}
280280
}
281+
282+
// Add any remaining content (critical for responses without tool calls)
283+
builder.add_content(builder.consume_rest());
281284
}
282285

283286
// Parse DeepSeek R1 tools array format following original llama.cpp parse_prefixed_json_tool_call_array pattern

common/chat.h

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -135,8 +135,18 @@ enum common_chat_format {
135135
COMMON_CHAT_FORMAT_KIMI_K2, // Our custom format (keep last for backward compatibility)
136136
};
137137

138+
enum common_reasoning_format {
139+
COMMON_REASONING_FORMAT_NONE,
140+
COMMON_REASONING_FORMAT_DEEPSEEK,
141+
COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY,
142+
};
143+
138144
struct common_chat_syntax {
139145
common_chat_format format = COMMON_CHAT_FORMAT_KIMI_K2;
146+
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE;
147+
// Whether reasoning_content should be inlined in the content (e.g. for reasoning_format=deepseek in stream mode)
148+
bool reasoning_in_content = false;
149+
bool thinking_forced_open = false;
140150
bool enable_thinking = false;
141151
bool enable_tool_calls = true;
142152
};

examples/server/function_calls.hpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -89,6 +89,8 @@ static ik_chat_msg parse_chat_message_incremental(const std::string& content, bo
8989
try {
9090
common_chat_syntax syntax;
9191
syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
92+
syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
93+
syntax.reasoning_in_content = true; // Fix for thinking tag termination issue
9294
syntax.enable_tool_calls = true;
9395

9496
common_chat_msg_parser parser(content, is_partial, syntax);

tests/test-function-calls.cpp

Lines changed: 58 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3298,11 +3298,68 @@ int main() {
32983298
std::cout << "✅ PASS: Qwen3 XML tool calls -> finish_reason='tool_calls'" << std::endl;
32993299

33003300
std::cout << "🎯 All streaming finish_reason tests passed!" << std::endl;
3301+
3302+
// TDD: Test for thinking tag termination issue - Reproduce user's exact complaint
3303+
std::cout << std::endl;
3304+
std::cout << "🧠 Testing DeepSeek R1 thinking tag termination issue..." << std::endl;
3305+
3306+
// Test case: Response wrapped entirely in think tags (reported issue)
3307+
std::string wrapped_response = "<think>This should be content but is wrapped in think tags</think>";
3308+
3309+
std::cout << "\n 1. REPRODUCING FAILURE - Without fix (reasoning_in_content=false):" << std::endl;
3310+
3311+
// First reproduce the failing behavior that user reported
3312+
common_chat_syntax broken_syntax;
3313+
broken_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
3314+
broken_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
3315+
broken_syntax.reasoning_in_content = false; // This causes the reported issue
3316+
broken_syntax.enable_tool_calls = false;
3317+
3318+
try {
3319+
auto broken_msg = common_chat_parse(wrapped_response, false, broken_syntax);
3320+
std::cout << " Content: '" << broken_msg.content << "'" << std::endl;
3321+
std::cout << " Reasoning: '" << broken_msg.reasoning_content << "'" << std::endl;
3322+
3323+
if (broken_msg.content.empty() && !broken_msg.reasoning_content.empty()) {
3324+
std::cout << " ❌ REPRODUCED USER BUG: Content disappears (thinking tags don't terminate properly)" << std::endl;
3325+
std::cout << " User sees: EMPTY CONTENT - this is exactly what was reported!" << std::endl;
3326+
}
3327+
} catch (const std::exception& e) {
3328+
std::cout << " ❌ Exception: " << e.what() << std::endl;
3329+
}
3330+
3331+
std::cout << "\n 2. DEMONSTRATING FIX - With fix (reasoning_in_content=true):" << std::endl;
3332+
3333+
// Now show the fix works
3334+
common_chat_syntax fixed_syntax;
3335+
fixed_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
3336+
fixed_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
3337+
fixed_syntax.reasoning_in_content = true; // Key fix: display thinking as content
3338+
fixed_syntax.enable_tool_calls = false;
3339+
3340+
try {
3341+
auto msg = common_chat_parse(wrapped_response, false, fixed_syntax);
3342+
std::cout << " Content: '" << msg.content << "'" << std::endl;
3343+
std::cout << " Reasoning: '" << msg.reasoning_content << "'" << std::endl;
3344+
3345+
if (msg.content.find("This should be content but is wrapped in think tags") != std::string::npos) {
3346+
std::cout << " ✅ PASS: Content properly preserved from think tags (with reasoning_in_content=true)" << std::endl;
3347+
std::cout << " User sees: Full content - this fixes the reported issue!" << std::endl;
3348+
} else if (msg.content.empty() && !msg.reasoning_content.empty()) {
3349+
std::cout << " ❌ FAILING TEST: Entire response treated as reasoning instead of content!" << std::endl;
3350+
std::cout << " Expected: Content should contain the text from within think tags" << std::endl;
3351+
} else {
3352+
std::cout << " ⚠️ PARTIAL: Some content found but may not contain expected text" << std::endl;
3353+
}
3354+
} catch (const std::exception& e) {
3355+
std::cout << " ❌ Exception in thinking tag test: " << e.what() << std::endl;
3356+
}
3357+
33013358
} catch (const std::exception& e) {
33023359
std::cout << std::endl;
33033360
std::cout << "❌ Test failed with exception: " << e.what() << std::endl;
33043361
return 1;
33053362
}
33063363

33073364
return 0;
3308-
}
3365+
}

0 commit comments

Comments
 (0)