
Conversation

iSevenDays (Contributor)

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function-call parsing with a fallback to the llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components
- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation
- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
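
The constant pattern, as a rough sketch (the marker strings here are illustrative placeholders, not necessarily the parser's real tokens):

```cpp
#include <cstddef>

namespace markers {
    // Placeholder markers for the sketch; the real constants live in the parser.
    constexpr const char TOOL_CALLS_BEGIN[] = "<|tool_calls_section_begin|>";
    constexpr const char TOOL_CALLS_END[]   = "<|tool_calls_section_end|>";

    // sizeof(array) - 1 yields the length at compile time: no manual counting.
    constexpr std::size_t TOOL_CALLS_BEGIN_LEN = sizeof(TOOL_CALLS_BEGIN) - 1;
}

// Usage: pos += markers::TOOL_CALLS_BEGIN_LEN; // instead of a magic number
```
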
- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse
- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures
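
A minimal sketch of that guarded access, assuming the nlohmann::json types used throughout llama.cpp (the helper name is hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
using json = nlohmann::json;

// Hypothetical helper illustrating the validation order described above.
static bool extract_tool_call(const json & call, std::string & name, std::string & args) {
    // 'function' must exist and be an object before touching nested keys.
    if (!call.contains("function") || !call["function"].is_object()) {
        return false;
    }
    const json & fn = call["function"];
    if (!fn.contains("name") || !fn["name"].is_string()) {
        return false;
    }
    name = fn["name"].get<std::string>();
    // A missing 'arguments' field degrades gracefully to an empty JSON object.
    args = fn.contains("arguments") ? fn["arguments"].dump() : "{}";
    return true;
}
```
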
- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
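
A simplified, non-streaming sketch of the extraction (the real parser also handles partial blocks during streaming; the function name is hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
#include <vector>
using json = nlohmann::json;

static std::vector<json> parse_qwen3_tool_calls(const std::string & text) {
    std::vector<json> calls;
    const std::string open = "<tool_call>", close = "</tool_call>";
    size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const size_t start = pos + open.size();
        const size_t end   = text.find(close, start);
        if (end == std::string::npos) break; // incomplete block: may still be streaming
        json j = json::parse(text.substr(start, end - start),
                             /*cb=*/nullptr, /*allow_exceptions=*/false);
        if (!j.is_discarded() && j.contains("name")) {
            calls.push_back(std::move(j));
        }
        pos = end + close.size();
    }
    return calls;
}
```
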
- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality
- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

The server was not passing the model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls
as content instead of a proper tool_calls array.

Non-streaming responses were hardcoded to use the Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead of a
proper tool_calls array. They now use the same model detection as the
streaming path for consistency.
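
A hedged sketch of that shared detection (enum and helper names are hypothetical; only the deepseek-r1/deepseek_r1 naming patterns are from this PR):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class tool_format { KIMI_K2, QWEN3, DEEPSEEK_R1 };

static tool_format detect_tool_format(std::string model) {
    std::transform(model.begin(), model.end(), model.begin(),
                   [](unsigned char c) { return (char) std::tolower(c); });
    if (model.find("qwen3") != std::string::npos) {
        return tool_format::QWEN3;
    }
    if (model.find("deepseek-r1") != std::string::npos ||
        model.find("deepseek_r1") != std::string::npos) {
        return tool_format::DEEPSEEK_R1;
    }
    return tool_format::KIMI_K2; // the old hardcoded default
}
```
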
- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support
- Integrated latest upstream changes from ikawrakow/ik_llama.cpp
- Resolved conflicts in test-function-calls.cpp
- Maintained local enhancements for Qwen3 function calling support
Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality
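
The resulting fallback chain, sketched with assumed signatures (the first two names appear in this PR; parse_deepseek_r1_native stands in for the pre-existing Format 1/2 parser):

```cpp
#include <nlohmann/json.hpp>
#include <string>
#include <vector>
using json = nlohmann::json;

// Assumed signatures for the sketch; only the ordering is taken from the docs.
bool parse_deepseek_r1_tools_array(const std::string & text, std::vector<json> & out);
bool parse_deepseek_r1_xml_wrapped(const std::string & text, std::vector<json> & out);
bool parse_deepseek_r1_native(const std::string & text, std::vector<json> & out);

static bool try_parse_deepseek_r1(const std::string & text, std::vector<json> & out) {
    if (parse_deepseek_r1_tools_array(text, out)) return true; // Format 3: {"tools": [...]}
    if (parse_deepseek_r1_xml_wrapped(text, out)) return true; // Format 4: <tool_call>...</tool_call>
    if (parse_deepseek_r1_native(text, out))      return true; // Formats 1/2: native token syntax
    return false;
}
```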

Resolves the tool_calls_count=0 issue where DeepSeek-R1 models generated valid
tool calls but the server failed to parse them.
- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers
Resolved merge conflict in tests/test-function-calls.cpp by combining:
- DeepSeek-R1 Format 4 XML wrapper tests
- Streaming finish_reason logic tests from origin/main

Both test suites now coexist and provide comprehensive coverage.
- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
Problem:
- qwen3::extract_content_during_parsing() used aggressive regex to collapse multiple newlines
- This broke proper code formatting (e.g., PEP 8's 2 empty lines between functions)
- Affected non-tool-call streaming output where formatting is critical

Solution:
- Replace aggressive std::regex_replace(R"(\n\s*\n)", "\n") with gentle string_strip()
- Follow original llama.cpp patterns: only trim leading/trailing whitespace
- Preserve internal formatting including multiple newlines
- Add proper include for common.h to access string_strip function
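
Illustrative before/after of the cleanup (the wrapper name is hypothetical; string_strip is the llama.cpp helper from common.h):

```cpp
#include <regex>
#include <string>
#include "common.h" // llama.cpp's string_strip

static std::string clean_content(const std::string & raw) {
    // Old: collapsed every blank-line run, breaking e.g. PEP 8's two empty
    // lines between functions.
    // return std::regex_replace(raw, std::regex(R"(\n\s*\n)"), "\n");

    // New: trim only leading/trailing whitespace; internal newlines survive.
    return string_strip(raw);
}
```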

Changes:
- examples/server/parsers/qwen3_parser.hpp: Replace whitespace cleanup with string_strip()
- tests/test-function-calls.cpp: Add test_qwen3_whitespace_preservation() to prevent regression

Testing:
- ✅ PEP 8 compliance: 2 empty lines between functions preserved
- ✅ Tool call parsing: All Qwen3 tests continue to pass
- ✅ No regressions: Existing functionality maintained
- ✅ Follows original llama.cpp whitespace handling patterns
- Allow empty content when tool_calls are present (OpenAI spec compliance)
- Return 400 Bad Request instead of 500 Server Error for missing content
- Prevent server hangs on empty/whitespace-only content
- Add proper content preprocessing and validation
- Fix exception handling in chat completions endpoint

This resolves issues where:
1. Missing content field returned 500 instead of 400
2. Empty content strings caused infinite server hangs
3. Single space content caused server timeouts
4. Assistant messages with empty content + tool_calls were rejected

The server now properly follows OpenAI API behavior where assistant
messages can have null/empty content when tool_calls are present.
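
A sketch of the relaxed check (function name hypothetical):

```cpp
#include <nlohmann/json.hpp>
#include <string>
using json = nlohmann::json;

static bool validate_assistant_message(const json & msg, std::string & error) {
    const bool has_tool_calls = msg.contains("tool_calls") &&
        msg["tool_calls"].is_array() && !msg["tool_calls"].empty();
    const bool has_content = msg.contains("content") &&
        msg["content"].is_string() && !msg["content"].get<std::string>().empty();
    // Per the OpenAI spec, content may be null/empty when tool_calls exist.
    if (!has_content && !has_tool_calls) {
        error = "message must have either non-empty 'content' or 'tool_calls'";
        return false; // surfaced as 400 Bad Request, not a 500
    }
    return true;
}
```
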
Resolves issue where tool calls had empty IDs ("") instead of proper
generated IDs like "call_1", violating OpenAI API specification.

Root cause: Server double parsing bug where send_final_response() never
called update_chat_msg() for non-streaming requests, causing:
1. Tool calls parsed correctly during streaming (with IDs)
2. Final response discarded parsed data and re-parsed raw text (losing IDs)

Changes:
- examples/server/server.cpp:1689: Added missing slot.update_chat_msg() call
  following original llama.cpp pattern
- examples/server/server.cpp:2744: Modified format_final_response_oaicompat()
  to use provided tool calls instead of re-parsing
- tests/test-function-calls.cpp: Added comprehensive unit test coverage
- Fixed const-correctness by changing signature to match original

Testing:
- All 485 unit tests pass
- Manual curl test confirms tool call IDs now "call_1" (was "")
- Server logs show proper ID generation and preservation
- No regressions in existing functionality
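
The ID backfill, as a sketch (struct and function names are illustrative; the tests only pin the "call_1"-style values):

```cpp
#include <string>
#include <vector>

struct tool_call { std::string id, name, arguments; };

static void assign_tool_call_ids(std::vector<tool_call> & calls) {
    for (size_t i = 0; i < calls.size(); ++i) {
        if (calls[i].id.empty()) {
            calls[i].id = "call_" + std::to_string(i + 1); // "call_1", "call_2", ...
        }
    }
}
```
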
When the first message in a conversation has role 'assistant' instead of 'system',
the tool injection logic was incorrectly using chat.push_back() which placed the
system message with tools at the END of the conversation, making tools invisible
to the LLM.

Changes:
- Fix utils.hpp:195 to use chat.insert(chat.begin(), system_msg) instead of
  chat.push_back(system_msg) following original llama.cpp add_system() pattern
- Add comprehensive unit test test_kimi_k2_tool_injection_non_system_first() to
  verify correct behavior and prevent regression
- Update existing tests to use proper 32-character alphanumeric tool call IDs

This resolves the critical issue where:
- Tools were detected but not appearing in final prompt
- LLM was generating normal text instead of tool calls
- format_chat logic was failing when conversation didn't start with system message

All 435 unit tests passing.
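
The fix in context, with types simplified for the sketch:

```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

static void inject_tools_system_msg(std::vector<chat_msg> & chat, const chat_msg & system_msg) {
    // Wrong: appended the tools prompt to the END of the conversation,
    // where the model never saw it as a system instruction.
    // chat.push_back(system_msg);

    // Fix: prepend, following the original llama.cpp add_system() pattern.
    chat.insert(chat.begin(), system_msg);
}
```
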
Extends Kimi-K2 parser to handle the <function=FunctionName:id{json_args}
format that was causing tool calls to be parsed as normal text.

Changes:
- Add parse_function_call_format() parser for <function= syntax
- Integrate new parser with existing XML, token, and simple formats
- Update content extraction to remove <function= calls from text
- Add comprehensive unit tests covering single/multiple calls
- Add detailed OpenAI tool calls array logging for client responses

Fixes issue where model-generated function calls like:
<function=Bash:0{"command": "git status", "description": "..."}>
were not being recognized, resulting in:
DEBUG: final_parse_result: 0 tool_calls (normal_text)

Now properly parses these calls and returns tool_calls array.

Tests: 47/47 new format assertions pass, all existing tests pass
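
A rough sketch of the new parser (parse_function_call_format is named in this PR, but its real signature and scanning strategy may differ; the brace scan here ignores braces inside JSON strings):

```cpp
#include <regex>
#include <string>
#include <vector>

struct fn_call { std::string name, id, args; };

static std::vector<fn_call> parse_function_call_format(const std::string & text) {
    std::vector<fn_call> out;
    static const std::regex opener(R"(<function=(\w+):(\d+)\{)");
    for (auto it = std::sregex_iterator(text.begin(), text.end(), opener);
         it != std::sregex_iterator(); ++it) {
        const size_t json_start = it->position() + it->length() - 1; // at '{'
        size_t i = json_start;
        int depth = 0;
        for (; i < text.size(); ++i) {            // naive brace matching
            if (text[i] == '{') depth++;
            else if (text[i] == '}' && --depth == 0) break;
        }
        if (depth != 0) continue;                 // unbalanced: likely still streaming
        out.push_back({(*it)[1].str(), (*it)[2].str(),
                       text.substr(json_start, i - json_start + 1)});
    }
    return out;
}
```
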
- Add validation for OpenAI API conversation structure violations
- Detect and reject invalid patterns (system + assistant without user)
- Prevent infinite tool call loops caused by malformed conversations
- Clean up excessive debug logging while preserving essential functionality
- Maintain AnythingLLM format parsing support

Fixes issues where malformed conversation structure causes LLM to get
stuck generating tool call tokens instead of responding normally.

Validates against OpenAI spec requiring proper conversation flow:
- user → assistant OR system → user → assistant patterns
- Prevents system → assistant pattern that causes infinite loops
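
The rejected pattern, as a minimal sketch (helper name hypothetical):

```cpp
#include <string>
#include <vector>

// True for the malformed opening the fix rejects: a system message followed
// directly by an assistant message, with no user turn in between.
static bool violates_openai_flow(const std::vector<std::string> & roles) {
    return roles.size() >= 2 && roles[0] == "system" && roles[1] == "assistant";
}
```
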
Clean up excessive debug output for production readiness:
- Remove tool injection debugging for Kimi-K2 and Qwen3 models
- Keep essential functionality without verbose logging
- Maintain clean server output for production use
@gopinath87607

Will check this and update you.

@gopinath87607

```text
[299/300] Building CXX object examples...akeFiles/llama-server.dir/server.cpp.o
FAILED: examples/server/CMakeFiles/llama-server.dir/server.cpp.o
/usr/bin/c++ -DGGML_USE_CUDA -DSERVER_VERBOSE=1 -I/home/gopi/ik_llama.cpp/examples -I/home/gopi/ik_llama.cpp/examples/server -I/home/gopi/ik_llama.cpp/build/examples/server -I/home/gopi/ik_llama.cpp -I/home/gopi/ik_llama.cpp/common/. -I/home/gopi/ik_llama.cpp/src/. -I/home/gopi/ik_llama.cpp/src/../include -I/home/gopi/ik_llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -MD -MT examples/server/CMakeFiles/llama-server.dir/server.cpp.o -MF examples/server/CMakeFiles/llama-server.dir/server.cpp.o.d -o examples/server/CMakeFiles/llama-server.dir/server.cpp.o -c /home/gopi/ik_llama.cpp/examples/server/server.cpp
/home/gopi/ik_llama.cpp/examples/server/server.cpp: In member function ‘void server_context::send_final_response(server_slot&)’:
/home/gopi/ik_llama.cpp/examples/server/server.cpp:1773:23: error: expected primary-expression before ‘<<’ token
 1773 |   << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
      |   ^~
ninja: build stopped: subcommand failed.
gopi@gopi-X99:~/ik_llama.cpp$
```

I am getting this error.

Here is what Claude said:

This is a C++ compilation error in the llama.cpp server code. The error occurs on line 1773 of server.cpp, where there is a syntax problem with stream operations.

The error message indicates:

```text
error: expected primary-expression before '<<' token
```

This typically happens when the left-hand operand of the stream insertion operator (<<) is missing. Looking at the problematic line:

```cpp
<< " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

The line starts with <<, which means it should be continuing a stream operation from the previous line, but something is wrong with the syntax. To fix this, check the lines immediately before line 1773 for:

- A missing stream object: the previous line might be missing a stream object such as std::cout, std::cerr, or a logging macro
- A missing semicolon: the previous line might be missing a semicolon, causing this line to be interpreted incorrectly
- A broken line continuation: there might be a syntax issue in how the multi-line stream operation is structured

Common fixes:

Add the missing stream object, if the previous line doesn't have one:

```cpp
std::cout << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

Or, if it's part of a logging system, it might need something like:

```cpp
LOG() << " tool_calls, first_id='" << slot.current_msg.tool_calls[0].id << "'" << std::endl;
```

Check the previous line to ensure it properly starts the stream operation. To see the exact context around the error:

```bash
sed -n '1770,1775p' /home/gopi/ik_llama.cpp/examples/server/server.cpp
```

- Added extensive debug output in common_chat_parse_anythingllm
- Added tool call logging in server response handling
- Added debug test case for DeepSeek R1 parser verification
@ikawrakow (Owner)

@iSevenDays This PR is still in draft. Do you intend to finish it?
