
Conversation

iSevenDays (Contributor)

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function-call parsing with a fallback to the llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components
- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation
- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
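
The constant pattern, as a rough sketch (the marker strings here are illustrative placeholders, not necessarily the parser's real tokens):

```cpp
#include <cstddef>

namespace markers {
    // Placeholder markers for the sketch; the real constants live in the parser.
    constexpr const char TOOL_CALLS_BEGIN[] = "<|tool_calls_section_begin|>";
    constexpr const char TOOL_CALLS_END[]   = "<|tool_calls_section_end|>";

    // sizeof(array) - 1 yields the length at compile time: no manual counting.
    constexpr std::size_t TOOL_CALLS_BEGIN_LEN = sizeof(TOOL_CALLS_BEGIN) - 1;
}

// Usage: pos += markers::TOOL_CALLS_BEGIN_LEN; // instead of a magic number
```
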
- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse
- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures
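
A minimal sketch of that guarded access, assuming the nlohmann::json types used throughout llama.cpp (the helper name is hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
using json = nlohmann::json;

// Hypothetical helper illustrating the validation order described above.
static bool extract_tool_call(const json & call, std::string & name, std::string & args) {
    // 'function' must exist and be an object before touching nested keys.
    if (!call.contains("function") || !call["function"].is_object()) {
        return false;
    }
    const json & fn = call["function"];
    if (!fn.contains("name") || !fn["name"].is_string()) {
        return false;
    }
    name = fn["name"].get<std::string>();
    // A missing 'arguments' field degrades gracefully to an empty JSON object.
    args = fn.contains("arguments") ? fn["arguments"].dump() : "{}";
    return true;
}
```
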
- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
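
A simplified, non-streaming sketch of the extraction (the real parser also handles partial blocks during streaming; the function name is hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
#include <vector>
using json = nlohmann::json;

static std::vector<json> parse_qwen3_tool_calls(const std::string & text) {
    std::vector<json> calls;
    const std::string open = "<tool_call>", close = "</tool_call>";
    size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const size_t start = pos + open.size();
        const size_t end   = text.find(close, start);
        if (end == std::string::npos) break; // incomplete block: may still be streaming
        json j = json::parse(text.substr(start, end - start),
                             /*cb=*/nullptr, /*allow_exceptions=*/false);
        if (!j.is_discarded() && j.contains("name")) {
            calls.push_back(std::move(j));
        }
        pos = end + close.size();
    }
    return calls;
}
```
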
- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality
- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

The server was not passing the model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls
as content instead of a proper tool_calls array.

Non-streaming responses were hardcoded to use the Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead of a
proper tool_calls array. They now use the same model detection as the
streaming path for consistency.
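
A hedged sketch of that shared detection (enum and helper names are hypothetical; only the deepseek-r1/deepseek_r1 naming patterns are from this PR):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class tool_format { KIMI_K2, QWEN3, DEEPSEEK_R1 };

static tool_format detect_tool_format(std::string model) {
    std::transform(model.begin(), model.end(), model.begin(),
                   [](unsigned char c) { return (char) std::tolower(c); });
    if (model.find("qwen3") != std::string::npos) {
        return tool_format::QWEN3;
    }
    if (model.find("deepseek-r1") != std::string::npos ||
        model.find("deepseek_r1") != std::string::npos) {
        return tool_format::DEEPSEEK_R1;
    }
    return tool_format::KIMI_K2; // the old hardcoded default
}
```
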
- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support
- Integrated latest upstream changes from ikawrakow/ik_llama.cpp
- Resolved conflicts in test-function-calls.cpp
- Maintained local enhancements for Qwen3 function calling support
Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality
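
The resulting fallback chain, sketched with assumed signatures (the first two names appear in this PR; parse_deepseek_r1_native stands in for the pre-existing Format 1/2 parser):

```cpp
#include <nlohmann/json.hpp>
#include <string>
#include <vector>
using json = nlohmann::json;

// Assumed signatures for the sketch; only the ordering is taken from the docs.
bool parse_deepseek_r1_tools_array(const std::string & text, std::vector<json> & out);
bool parse_deepseek_r1_xml_wrapped(const std::string & text, std::vector<json> & out);
bool parse_deepseek_r1_native(const std::string & text, std::vector<json> & out);

static bool try_parse_deepseek_r1(const std::string & text, std::vector<json> & out) {
    if (parse_deepseek_r1_tools_array(text, out)) return true; // Format 3: {"tools": [...]}
    if (parse_deepseek_r1_xml_wrapped(text, out)) return true; // Format 4: <tool_call>...</tool_call>
    if (parse_deepseek_r1_native(text, out))      return true; // Formats 1/2: native token syntax
    return false;
}
```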

Resolves the tool_calls_count=0 issue where DeepSeek-R1 models generated valid
tool calls but the server failed to parse them.
- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers
Resolved merge conflict in tests/test-function-calls.cpp by combining:
- DeepSeek-R1 Format 4 XML wrapper tests
- Streaming finish_reason logic tests from origin/main

Both test suites now coexist and provide comprehensive coverage.
- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
Problem:
- qwen3::extract_content_during_parsing() used aggressive regex to collapse multiple newlines
- This broke proper code formatting (e.g., PEP 8's 2 empty lines between functions)
- Affected non-tool-call streaming output where formatting is critical

Solution:
- Replace aggressive std::regex_replace(R"(\n\s*\n)", "\n") with gentle string_strip()
- Follow original llama.cpp patterns: only trim leading/trailing whitespace
- Preserve internal formatting including multiple newlines
- Add proper include for common.h to access string_strip function
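
Illustrative before/after of the cleanup (the wrapper name is hypothetical; string_strip is the llama.cpp helper from common.h):

```cpp
#include <regex>
#include <string>
#include "common.h" // llama.cpp's string_strip

static std::string clean_content(const std::string & raw) {
    // Old: collapsed every blank-line run, breaking e.g. PEP 8's two empty
    // lines between functions.
    // return std::regex_replace(raw, std::regex(R"(\n\s*\n)"), "\n");

    // New: trim only leading/trailing whitespace; internal newlines survive.
    return string_strip(raw);
}
```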

Changes:
- examples/server/parsers/qwen3_parser.hpp: Replace whitespace cleanup with string_strip()
- tests/test-function-calls.cpp: Add test_qwen3_whitespace_preservation() to prevent regression

Testing:
- ✅ PEP 8 compliance: 2 empty lines between functions preserved
- ✅ Tool call parsing: All Qwen3 tests continue to pass
- ✅ No regressions: Existing functionality maintained
- ✅ Follows original llama.cpp whitespace handling patterns
- Allow empty content when tool_calls are present (OpenAI spec compliance)
- Return 400 Bad Request instead of 500 Server Error for missing content
- Prevent server hangs on empty/whitespace-only content
- Add proper content preprocessing and validation
- Fix exception handling in chat completions endpoint

This resolves issues where:
1. Missing content field returned 500 instead of 400
2. Empty content strings caused infinite server hangs
3. Single space content caused server timeouts
4. Assistant messages with empty content + tool_calls were rejected

The server now properly follows OpenAI API behavior where assistant
messages can have null/empty content when tool_calls are present.
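
A sketch of the relaxed check (function name hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
using json = nlohmann::json;

static bool validate_assistant_message(const json & msg, std::string & error) {
    const bool has_tool_calls = msg.contains("tool_calls") &&
        msg["tool_calls"].is_array() && !msg["tool_calls"].empty();
    const bool has_content = msg.contains("content") &&
        msg["content"].is_string() && !msg["content"].get<std::string>().empty();
    // Per the OpenAI spec, content may be null/empty when tool_calls exist.
    if (!has_content && !has_tool_calls) {
        error = "message must have either non-empty 'content' or 'tool_calls'";
        return false; // surfaced as 400 Bad Request, not a 500
    }
    return true;
}
```
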
Resolves issue where tool calls had empty IDs ("") instead of proper
generated IDs like "call_1", violating OpenAI API specification.

Root cause: Server double parsing bug where send_final_response() never
called update_chat_msg() for non-streaming requests, causing:
1. Tool calls parsed correctly during streaming (with IDs)
2. Final response discarded parsed data and re-parsed raw text (losing IDs)

Changes:
- examples/server/server.cpp:1689: Added missing slot.update_chat_msg() call
  following original llama.cpp pattern
- examples/server/server.cpp:2744: Modified format_final_response_oaicompat()
  to use provided tool calls instead of re-parsing
- tests/test-function-calls.cpp: Added comprehensive unit test coverage
- Fixed const-correctness by changing signature to match original

Testing:
- All 485 unit tests pass
- Manual curl test confirms tool call IDs now "call_1" (was "")
- Server logs show proper ID generation and preservation
- No regressions in existing functionality
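
The ID backfill, as a sketch (struct and function names are illustrative; the tests only pin the "call_1"-style values):

```cpp
#include <string>
#include <vector>

struct tool_call { std::string id, name, arguments; };

static void assign_tool_call_ids(std::vector<tool_call> & calls) {
    for (size_t i = 0; i < calls.size(); ++i) {
        if (calls[i].id.empty()) {
            calls[i].id = "call_" + std::to_string(i + 1); // "call_1", "call_2", ...
        }
    }
}
```
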
When the first message in a conversation has role 'assistant' instead of 'system',
the tool injection logic was incorrectly using chat.push_back() which placed the
system message with tools at the END of the conversation, making tools invisible
to the LLM.

Changes:
- Fix utils.hpp:195 to use chat.insert(chat.begin(), system_msg) instead of
  chat.push_back(system_msg) following original llama.cpp add_system() pattern
- Add comprehensive unit test test_kimi_k2_tool_injection_non_system_first() to
  verify correct behavior and prevent regression
- Update existing tests to use proper 32-character alphanumeric tool call IDs

This resolves the critical issue where:
- Tools were detected but not appearing in final prompt
- LLM was generating normal text instead of tool calls
- format_chat logic was failing when conversation didn't start with system message

All 435 unit tests passing.
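
The fix in context, with types simplified for the sketch:

```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

static void inject_tools_system_msg(std::vector<chat_msg> & chat, const chat_msg & system_msg) {
    // Wrong: appended the tools prompt to the END of the conversation,
    // where the model never saw it as a system instruction.
    // chat.push_back(system_msg);

    // Fix: prepend, following the original llama.cpp add_system() pattern.
    chat.insert(chat.begin(), system_msg);
}
```
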
Extends Kimi-K2 parser to handle the <function=FunctionName:id{json_args}
format that was causing tool calls to be parsed as normal text.

Changes:
- Add parse_function_call_format() parser for <function= syntax
- Integrate new parser with existing XML, token, and simple formats
- Update content extraction to remove <function= calls from text
- Add comprehensive unit tests covering single/multiple calls
- Add detailed OpenAI tool calls array logging for client responses

Fixes issue where model-generated function calls like:
<function=Bash:0{"command": "git status", "description": "..."}>
were not being recognized, resulting in:
DEBUG: final_parse_result: 0 tool_calls (normal_text)

Now properly parses these calls and returns tool_calls array.

Tests: 47/47 new format assertions pass, all existing tests pass
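
A rough sketch of the new parser (parse_function_call_format is named in this PR, but its real signature and scanning strategy may differ; the brace scan here ignores braces inside JSON strings):

```cpp
#include <regex>
#include <string>
#include <vector>

struct fn_call { std::string name, id, args; };

static std::vector<fn_call> parse_function_call_format(const std::string & text) {
    std::vector<fn_call> out;
    static const std::regex opener(R"(<function=(\w+):(\d+)\{)");
    for (auto it = std::sregex_iterator(text.begin(), text.end(), opener);
         it != std::sregex_iterator(); ++it) {
        const size_t json_start = it->position() + it->length() - 1; // at '{'
        size_t i = json_start;
        int depth = 0;
        for (; i < text.size(); ++i) {            // naive brace matching
            if (text[i] == '{') depth++;
            else if (text[i] == '}' && --depth == 0) break;
        }
        if (depth != 0) continue;                 // unbalanced: likely still streaming
        out.push_back({(*it)[1].str(), (*it)[2].str(),
                       text.substr(json_start, i - json_start + 1)});
    }
    return out;
}
```
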
- Add validation for OpenAI API conversation structure violations
- Detect and reject invalid patterns (system + assistant without user)
- Prevent infinite tool call loops caused by malformed conversations
- Clean up excessive debug logging while preserving essential functionality
- Maintain AnythingLLM format parsing support

Fixes issues where malformed conversation structure causes LLM to get
stuck generating tool call tokens instead of responding normally.

Validates against OpenAI spec requiring proper conversation flow:
- user → assistant OR system → user → assistant patterns
- Prevents system → assistant pattern that causes infinite loops
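
The rejected pattern, as a minimal sketch (helper name hypothetical):

```cpp
#include <string>
#include <vector>

// True for the malformed opening the fix rejects: a system message followed
// directly by an assistant message, with no user turn in between.
static bool violates_openai_flow(const std::vector<std::string> & roles) {
    return roles.size() >= 2 && roles[0] == "system" && roles[1] == "assistant";
}
```
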
Clean up excessive debug output for production readiness:
- Remove tool injection debugging for Kimi-K2 and Qwen3 models
- Keep essential functionality without verbose logging
- Maintain clean server output for production use
@gopinath87607

Will check this and update you.

@gopinath87607

```text
[299/300] Building CXX object examples...akeFiles/llama-server.dir/server.cpp.o
FAILED: examples/server/CMakeFiles/llama-server.dir/server.cpp.o
/usr/bin/c++ -DGGML_USE_CUDA -DSERVER_VERBOSE=1 -I/home/gopi/ik_llama.cpp/examples -I/home/gopi/ik_llama.cpp/examples/server -I/home/gopi/ik_llama.cpp/build/examples/server -I/home/gopi/ik_llama.cpp -I/home/gopi/ik_llama.cpp/common/. -I/home/gopi/ik_llama.cpp/src/. -I/home/gopi/ik_llama.cpp/src/../include -I/home/gopi/ik_llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -MD -MT examples/server/CMakeFiles/llama-server.dir/server.cpp.o -MF examples/server/CMakeFiles/llama-server.dir/server.cpp.o.d -o examples/server/CMakeFiles/llama-server.dir/server.cpp.o -c /home/gopi/ik_llama.cpp/examples/server/server.cpp
/home/gopi/ik_llama.cpp/examples/server/server.cpp: In member function ‘void server_context::send_final_response(server_slot&)’:
/home/gopi/ik_llama.cpp/examples/server/server.cpp:1773:23: error: expected primary-expression before ‘<<’ token
 1773 |   << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
      |   ^~
ninja: build stopped: subcommand failed.
gopi@gopi-X99:~/ik_llama.cpp$
```

I am getting this error.

Here is what Claude said:

This is a C++ compilation error in the llama.cpp server code. The error occurs on line 1773 of server.cpp, where there is a syntax problem with stream operations.

The error message indicates:

```text
error: expected primary-expression before '<<' token
```

This typically happens when the left-hand operand of the stream insertion operator (<<) is missing. Looking at the problematic line:

```cpp
<< " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

The line starts with <<, which means it should be continuing a stream operation from the previous line, but something is wrong with the syntax. To fix this, check the lines immediately before line 1773 for:

- A missing stream object: the previous line might be missing a stream object such as std::cout, std::cerr, or a logging macro
- A missing semicolon: the previous line might be missing a semicolon, causing this line to be interpreted incorrectly
- A broken line continuation: there might be a syntax issue in how the multi-line stream operation is structured

Common fixes:

Add the missing stream object, if the previous line doesn't have one:

```cpp
std::cout << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

Or, if it's part of a logging system, it might need something like:

```cpp
LOG() << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

Check the previous line to ensure it properly starts the stream operation. To see the exact context around the error:

```bash
sed -n '1770,1775p' /home/gopi/ik_llama.cpp/examples/server/server.cpp
```

- Added extensive debug output in common_chat_parse_anythingllm
- Added tool call logging in server response handling
- Added debug test case for DeepSeek R1 parser verification
@ikawrakow (Owner)

@iSevenDays This PR is still in draft. Do you intend to finish it?
