-
Notifications
You must be signed in to change notification settings - Fork 218
Mistral tool calling unary #3567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1,93 @@ | |||
{%- if messages[0]["role"] == "system" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to maintain this template here. Check this PR #3565
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed: now using vllm's mistral_parallel.jinja which gives worse accuracy
} else { | ||
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model"); | ||
} | ||
} else { | ||
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have isToolGenerated, maybe we could drop these elses
and go with
if (!isToolGenerated) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool calls are not generated by the model");
}
outside of this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed anymore
if (begin != parsedOutput.content.end() && *begin == '[') { | ||
begin = skipToFirstNonWhitespaceCharacter(begin + 1, parsedOutput.content.end()); | ||
if (begin != parsedOutput.content.end() && *begin == '{') { | ||
// If the content starts with '[{', it indicates that tool calls are present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you make sure Mistral does not generate any BOT token that we might not see after decoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOT tokens appear when we detokenize with skip_special_tokens=false. changed parser to base on that
continue; | ||
} | ||
ToolCall toolCall; | ||
toolCall.id = generateRandomId(); // Generate a random ID for the tool call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have it at the end before pushing to toolCalls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rapidjson::Document toolsDoc; | ||
toolsDoc.Parse(toolsString.c_str()); | ||
|
||
if (!toolsDoc.HasParseError() && toolsDoc.IsArray() && !hasMoreSpecialTags) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is going to happen when json is incomplete for example with max_tokens too short? I guess we should still return the tool_call response even if incomplete or invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as with other tool parsers - when max tokens is reached and message is not valid JSON, it will not be put into tools, but content
@@ -52,7 +52,7 @@ def add_common_arguments(parser): | |||
'Not effective if target device is not NPU', dest='max_prompt_len') | |||
parser_text.add_argument('--prompt_lookup_decoding', action='store_true', help='Set pipeline to use prompt lookup decoding', dest='prompt_lookup_decoding') | |||
parser_text.add_argument('--reasoning_parser', choices=["qwen3"], help='Set the type of the reasoning parser for reasoning content extraction', dest='reasoning_parser') | |||
parser_text.add_argument('--tool_parser', choices=["llama3","phi4","hermes3", "qwen3"], help='Set the type of the tool parser for tool calls extraction', dest='tool_parser') | |||
parser_text.add_argument('--tool_parser', choices=["llama3","phi4","hermes3", "qwen3","mistral"], help='Set the type of the tool parser for tool calls extraction', dest='tool_parser') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated, but qwen3 should not be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already being changed in your PR, right? im rebased
return; | ||
} | ||
|
||
std::string decoded = tokenizer.decode(generatedTokens, {ov::genai::skip_special_tokens(false)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that line we do double decoding since parsedOutput already contains decoded content. I suppose it's the skip_special_tokens
option that is the reason of doing it again. Did you check performance impact? Can we afford that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theres no double decoding anymore, i simply check first token for BOT and treat entire response as JSON as you requested, and it works
const std::string toolsStartEnd = getParsingEndTag(); | ||
|
||
size_t toolsStartPos = decoded.find(toolsStartString); | ||
size_t toolsEndPos = decoded.find(toolsStartEnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do mistral have a dedicated tools end tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently no - the we saw was EOS
std::string decoded = tokenizer.decode(generatedTokens, {ov::genai::skip_special_tokens(false)}); | ||
|
||
const std::string toolsStartString = getParsingStartTag(); | ||
const std::string toolsStartEnd = getParsingEndTag(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StartEnd?
std::string remaining = decoded.substr(0, toolsStartPos) + decoded.substr(toolsEndPos + toolsStartEnd.length()); | ||
|
||
size_t toolsStartPos2 = remaining.find(toolsStartString); | ||
size_t toolsEndPos2 = remaining.find(toolsStartEnd); | ||
bool hasMoreSpecialTags = !(toolsStartPos2 == std::string::npos && toolsEndPos2 == std::string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to make such extraction? Did you see real example where there are more special tokens in the generated output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not relevant anymore with new approach
namespace ovms { | ||
class MistralToolParser : public BaseOutputParser { | ||
const std::string toolCallStartTag = "[TOOL_CALLS]"; | ||
const std::string toolCallEndTag = "</s>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it just a regular EOS token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good catch
// Tools calls are expected to be the last part of the content, so we do not specify an end tag. | ||
const std::string& getParsingEndTag() const override { | ||
return toolCallEndTag; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation does not match the comment. If the comment is true for mistral, we should return empty string here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
std::unique_ptr<OutputParser> outputParser; | ||
|
||
void SetUp() override { | ||
// For Phi4 model there is only tool parser available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phi4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "{\"arg1\":\"value1\",\"arg2\":42}"); | ||
EXPECT_EQ(parsedOutput.toolCalls[0].id.empty(), false); // ID should be generated | ||
} | ||
TEST_F(MistralOutputParserTest, ParseToolCallOutputWithContentOnBothSidesAndSingleToolCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a real example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? model can produce anything
EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "{\"arg1\":\"value1\",\"arg2\":42}"); | ||
EXPECT_EQ(parsedOutput.toolCalls[0].id.empty(), false); // ID should be generated | ||
} | ||
TEST_F(MistralOutputParserTest, ParseToolCallOutputWithMultipleToolCallsReturnsContentOnly) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a real example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? model can produce anything, why should we unit test that
🛠 Summary
CVS-171565