-
-
Notifications
You must be signed in to change notification settings - Fork 9.3k
Add EXAONE 4.0 reasoning parser #22617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add `request` parameter for ReasoningParser.extract_reasoning_content_streaming() Co-authored-by: Junwon Hwang <[email protected]> Co-authored-by: heyzude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new reasoning parser for the EXAONE 4.0 model and refactors the ReasoningParser
interface to support it. The core change is the addition of the request
parameter to extract_reasoning_content_streaming
to allow parsers to access request-specific information, like enable_thinking
. While the changes are generally well-implemented and include comprehensive tests, I've found a critical issue in the non-streaming implementation of the new parser where it fails to use this new request
parameter, leading to behavior that is inconsistent with its streaming counterpart and incorrect under certain conditions.
if self.end_token not in model_output: | ||
if model_output_parts[1]: | ||
return model_output, None | ||
return None, model_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for handling model output without an end token in non-streaming mode is inconsistent with the streaming implementation and doesn't correctly handle the enable_thinking
flag. When enable_thinking
is true, the output should be treated as reasoning content even if <think>
and </think>
tags are missing. The current implementation only checks for the presence of the <think>
tag, which can lead to incorrect parsing of the model's output.
You should use the request
object to check chat_template_kwargs.get("enable_thinking")
, similar to how it's done in extract_reasoning_content_streaming
, to ensure consistent behavior.
if self.end_token not in model_output: | |
if model_output_parts[1]: | |
return model_output, None | |
return None, model_output | |
if self.end_token not in model_output: | |
enable_thinking = (request is not None and | |
request.chat_template_kwargs is not None and | |
request.chat_template_kwargs.get("enable_thinking")) | |
if enable_thinking or model_output_parts[1]: | |
return model_output, None | |
return None, model_output |
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
request
parameter forReasoningParser.extract_reasoning_content_streaming()
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
EXAONE 4.0 currently uses
reasoning_parser=deepseek_r1
(Please refer #21718).However, it incorrectly treats all output as reasoning content instead of normal content when no
<think>
or</think>
tags are found andenable_thinking=False
.While this was easily fixed by modifying the output of
extract_reasoning_content()
, the issue persists in streaming mode.By adding a
request
parameter toextract_reasoning_content_streaming()
, we can figure out whether the streamed token is reasoning content or normal content.Test Plan
You can change the
"stream"
option for testing.content
)reasoning_content
)Test Result
(Optional) Documentation Update