
Conversation

@dkalinowski dkalinowski commented Aug 11, 2025

🛠 Summary

CVS-171565

{'multiple'}
{"accuracy": 0.89, "correct_count": 178, "total_count": 200}

@dkalinowski dkalinowski requested review from mzegla and dtrawins and removed request for mzegla August 11, 2025 12:58
@dkalinowski changed the title from "[WIP] Mistral tool calling unary" to "Mistral tool calling unary" Aug 12, 2025
@@ -0,0 +1,93 @@
{%- if messages[0]["role"] == "system" %}
@dtrawins (Collaborator) commented Aug 12, 2025:

We don't want to maintain this template here. Check PR #3565.

dkalinowski (Author):

Changed: now using vLLM's mistral_parallel.jinja, which gives worse accuracy.

Comment on lines 61 to 66
} else {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model");
}
} else {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model");
}
Collaborator:

Since we have isToolGenerated, maybe we could drop these elses and go with

if (!isToolGenerated) {
	SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model");
}

outside of this block?

dkalinowski (Author):

not needed anymore

if (begin != parsedOutput.content.end() && *begin == '[') {
begin = skipToFirstNonWhitespaceCharacter(begin + 1, parsedOutput.content.end());
if (begin != parsedOutput.content.end() && *begin == '{') {
// If the content starts with '[{', it indicates that tool calls are present.
Collaborator:

Did you make sure Mistral does not generate any BOT token that we might not see after decoding?

dkalinowski (Author):

BOT tokens appear when we detokenize with skip_special_tokens=false. Changed the parser to rely on that.
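For illustration, a minimal fragment of that approach, reusing the decode call that appears later in this diff (the hasToolCalls name exists only in this sketch):

// Keep special tokens when decoding so the BOT tag stays visible to the parser.
std::string decoded = tokenizer.decode(generatedTokens, {ov::genai::skip_special_tokens(false)});
// The Mistral BOT tag for tool calls is expected at the very start of the output.
bool hasToolCalls = decoded.rfind("[TOOL_CALLS]", 0) == 0;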

continue;
}
ToolCall toolCall;
toolCall.id = generateRandomId(); // Generate a random ID for the tool call
Collaborator:

We can have it at the end, right before pushing to toolCalls.

dkalinowski (Author):

done
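For illustration, the reordered fragment presumably looks roughly like this (a sketch; the name field and the local variable names are assumptions, only id, arguments and generateRandomId() appear in this review):

ToolCall toolCall;
toolCall.name = name;              // assumed field and variable names
toolCall.arguments = arguments;
toolCall.id = generateRandomId();  // ID generated last, right before pushing
parsedOutput.toolCalls.push_back(toolCall);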

@dkalinowski dkalinowski requested review from dtrawins and mzegla August 18, 2025 11:08
rapidjson::Document toolsDoc;
toolsDoc.Parse(toolsString.c_str());

if (!toolsDoc.HasParseError() && toolsDoc.IsArray() && !hasMoreSpecialTags) {
Collaborator:

What is going to happen when the JSON is incomplete, for example when max_tokens is too short? I guess we should still return the tool_call response even if it is incomplete or invalid.

dkalinowski (Author):

As with the other tool parsers: when max_tokens is reached and the message is not valid JSON, it will not be put into tool calls, but into content.
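A minimal sketch of that fallback rule, built on the rapidjson check from the diff above (the function name is illustrative, not from the PR):

#include <rapidjson/document.h>
#include <string>

// If the text after the [TOOL_CALLS] tag does not parse as a JSON array
// (e.g. generation was cut off by max_tokens), keep it as plain content
// instead of emitting tool calls.
static bool isCompleteToolCallArray(const std::string& toolsString) {
    rapidjson::Document toolsDoc;
    toolsDoc.Parse(toolsString.c_str());
    return !toolsDoc.HasParseError() && toolsDoc.IsArray();
}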

@@ -52,7 +52,7 @@ def add_common_arguments(parser):
'Not effective if target device is not NPU', dest='max_prompt_len')
parser_text.add_argument('--prompt_lookup_decoding', action='store_true', help='Set pipeline to use prompt lookup decoding', dest='prompt_lookup_decoding')
parser_text.add_argument('--reasoning_parser', choices=["qwen3"], help='Set the type of the reasoning parser for reasoning content extraction', dest='reasoning_parser')
parser_text.add_argument('--tool_parser', choices=["llama3","phi4","hermes3", "qwen3"], help='Set the type of the tool parser for tool calls extraction', dest='tool_parser')
parser_text.add_argument('--tool_parser', choices=["llama3","phi4","hermes3", "qwen3","mistral"], help='Set the type of the tool parser for tool calls extraction', dest='tool_parser')
Collaborator:

unrelated, but qwen3 should not be here

dkalinowski (Author):

This is already being changed in your PR, right? I'm rebased.

return;
}

std::string decoded = tokenizer.decode(generatedTokens, {ov::genai::skip_special_tokens(false)});
Collaborator:

With that line we do double decoding, since parsedOutput already contains decoded content. I suppose the skip_special_tokens option is the reason for doing it again. Did you check the performance impact? Can we afford that?

dkalinowski (Author):

There's no double decoding anymore; I simply check the first token for BOT and treat the entire response as JSON, as you requested, and it works.
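Roughly, that flow could be sketched as below (illustrative only: the ToolCall stand-in, its name field, and the helper are assumptions pieced together from fragments visible in this review, not the merged code):

#include <rapidjson/document.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/writer.h>
#include <string>
#include <vector>

struct ToolCallSketch {        // illustrative stand-in for the real ToolCall
    std::string id;
    std::string name;          // assumed field
    std::string arguments;
};

// If the decoded output (skip_special_tokens=false) starts with [TOOL_CALLS],
// treat everything after the tag as a JSON array of tool calls.
static std::vector<ToolCallSketch> parseMistralToolCalls(const std::string& decoded) {
    static const std::string botTag = "[TOOL_CALLS]";
    std::vector<ToolCallSketch> calls;
    if (decoded.rfind(botTag, 0) != 0)
        return calls;  // no BOT tag: plain content, no tool calls
    rapidjson::Document doc;
    doc.Parse(decoded.substr(botTag.size()).c_str());
    if (doc.HasParseError() || !doc.IsArray())
        return calls;  // truncated or invalid JSON: left as content by the caller
    for (const auto& call : doc.GetArray()) {
        if (!call.IsObject() || !call.HasMember("name") || !call["name"].IsString() || !call.HasMember("arguments"))
            continue;
        ToolCallSketch toolCall;
        toolCall.name = call["name"].GetString();
        rapidjson::StringBuffer sb;
        rapidjson::Writer<rapidjson::StringBuffer> writer(sb);
        call["arguments"].Accept(writer);   // serialize arguments back to a JSON string
        toolCall.arguments = sb.GetString();
        toolCall.id = "random-id";          // the real code uses generateRandomId()
        calls.push_back(toolCall);
    }
    return calls;
}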

const std::string toolsStartEnd = getParsingEndTag();

size_t toolsStartPos = decoded.find(toolsStartString);
size_t toolsEndPos = decoded.find(toolsStartEnd);
Collaborator:

Does Mistral have a dedicated tools end tag?

dkalinowski (Author):

Apparently not - the only end we saw was EOS.

std::string decoded = tokenizer.decode(generatedTokens, {ov::genai::skip_special_tokens(false)});

const std::string toolsStartString = getParsingStartTag();
const std::string toolsStartEnd = getParsingEndTag();
Collaborator:

StartEnd? The toolsStartEnd name reads oddly for an end tag.

Comment on lines 52 to 56
std::string remaining = decoded.substr(0, toolsStartPos) + decoded.substr(toolsEndPos + toolsStartEnd.length());

size_t toolsStartPos2 = remaining.find(toolsStartString);
size_t toolsEndPos2 = remaining.find(toolsStartEnd);
bool hasMoreSpecialTags = !(toolsStartPos2 == std::string::npos && toolsEndPos2 == std::string::npos);
Collaborator:

Does it make sense to do such an extraction? Did you see a real example where there are more special tokens in the generated output?

dkalinowski (Author):

Not relevant anymore with the new approach.

namespace ovms {
class MistralToolParser : public BaseOutputParser {
const std::string toolCallStartTag = "[TOOL_CALLS]";
const std::string toolCallEndTag = "</s>";
Collaborator:

Isn't it just a regular EOS token?

dkalinowski (Author):

yes, good catch

Comment on lines 47 to 50
// Tools calls are expected to be the last part of the content, so we do not specify an end tag.
const std::string& getParsingEndTag() const override {
return toolCallEndTag;
}
Collaborator:

The implementation does not match the comment. If the comment is true for Mistral, we should return an empty string here.

dkalinowski (Author):

done
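For reference, a sketch of what the corrected accessor presumably looks like after this change (based on the review exchange above, not the merged code):

// Tool calls are expected to be the last part of the content, so Mistral has
// no dedicated end tag (generation simply ends with EOS) and an empty string
// is returned here.
const std::string& getParsingEndTag() const override {
    static const std::string emptyEndTag = "";
    return emptyEndTag;
}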

std::unique_ptr<OutputParser> outputParser;

void SetUp() override {
// For Phi4 model there is only tool parser available
Collaborator:

Phi4?

dkalinowski (Author):

fixed

EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "{\"arg1\":\"value1\",\"arg2\":42}");
EXPECT_EQ(parsedOutput.toolCalls[0].id.empty(), false); // ID should be generated
}
TEST_F(MistralOutputParserTest, ParseToolCallOutputWithContentOnBothSidesAndSingleToolCall) {
Collaborator:

Is it a real example?

dkalinowski (Author):

Why not? The model can produce anything.

EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "{\"arg1\":\"value1\",\"arg2\":42}");
EXPECT_EQ(parsedOutput.toolCalls[0].id.empty(), false); // ID should be generated
}
TEST_F(MistralOutputParserTest, ParseToolCallOutputWithMultipleToolCallsReturnsContentOnly) {
Collaborator:

Is it a real example?

dkalinowski (Author):

Why not? The model can produce anything, so why shouldn't we unit test that?

@dkalinowski dkalinowski merged commit 17136c3 into main Aug 20, 2025
14 checks passed
@dkalinowski dkalinowski deleted the mistral branch August 20, 2025 07:54
przepeck pushed a commit that referenced this pull request Aug 21, 2025