
Add EXAONE 4.0 reasoning parser #22617


Open · wants to merge 1 commit into main

Conversation

@nuxlear commented Aug 11, 2025

  • Add EXAONE 4.0 reasoning parser
  • Add request parameter for ReasoningParser.extract_reasoning_content_streaming()


Purpose

EXAONE 4.0 currently uses reasoning_parser=deepseek_r1 (see #21718).
However, that parser incorrectly treats all output as reasoning content rather than normal content when no <think> or </think> tags appear and enable_thinking=False.

In non-streaming mode this is easily fixed by adjusting the output of extract_reasoning_content(), but the issue persists in streaming mode.
Adding a request parameter to extract_reasoning_content_streaming() lets the parser determine whether each streamed token is reasoning content or normal content; a minimal sketch of the streaming-side decision follows.
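
As a minimal sketch (not the PR's exact code), the decision the new request parameter enables could look like this. chat_template_kwargs is the request field the review below also references; the helper names here are illustrative only:

def is_thinking_enabled(request) -> bool:
    """Read enable_thinking from the request's chat_template_kwargs (default False)."""
    return bool(
        request is not None
        and getattr(request, "chat_template_kwargs", None)
        and request.chat_template_kwargs.get("enable_thinking")
    )

def classify_delta(seen_think_open: bool, seen_think_close: bool, request) -> str:
    """Label a streamed chunk as 'reasoning' or 'content'."""
    if seen_think_close:
        return "content"      # everything after </think> is the final answer
    if seen_think_open or is_thinking_enabled(request):
        return "reasoning"    # inside <think>, or the request asked to think
    return "content"          # no tags and thinking disabled: normal output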

Test Plan

You can toggle the "stream" option in the requests below to test both modes; a minimal streaming-client sketch follows the curl commands.

  1. Run the server (on port 8850, matching the requests below):
vllm serve LGAI-EXAONE/EXAONE-4.0.1-32B --port 8850 --tool-call-parser hermes --reasoning-parser exaone4
  2. Test a normal request (all output should arrive as content):
curl -X POST http://localhost:8850/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "LGAI-EXAONE/EXAONE-4.0.1-32B",
        "messages": [
            {"role": "user", "content": "Which is bigger, 3.7 or 3.11?"}
        ],
        "max_tokens": 1024,
        "stream": false
    }'
  3. Test a reasoning request (output should start with reasoning_content):
curl -X POST http://localhost:8850/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "LGAI-EXAONE/EXAONE-4.0.1-32B",
        "messages": [
            {"role": "user", "content": "Which is bigger, 3.7 or 3.11?"}
        ],
        "chat_template_kwargs": {"enable_thinking": true},
        "max_tokens": 4096,
        "stream": false
    }'
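
To exercise streaming mode ("stream": true), a minimal client sketch is shown below. It assumes the server started above and that vLLM's OpenAI-compatible streaming delta carries a reasoning_content field alongside content; the api_key value is a placeholder, since vLLM does not check it by default.

# Minimal streaming check: with enable_thinking=true, reasoning_content
# chunks should arrive first, followed by regular content chunks.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8850/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="LGAI-EXAONE/EXAONE-4.0.1-32B",
    messages=[{"role": "user", "content": "Which is bigger, 3.7 or 3.11?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    max_tokens=4096,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)      # reasoning stream
    elif delta.content:
        print(delta.content, end="", flush=True)  # answer stream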

Test Result

{"id":"chatcmpl-717caf53062e4b06ba51ad9e71c9512b","object":"chat.completion","created":1754885297,"model":"LGAI-EXAONE/EXAONE-4.0.1-32B","choices":[{"index":0,"message":{"role":"assistant","content":"To determine which number is bigger between **3.7** and **3.11**, follow these steps:\n\n1. **Compare the Whole Number Parts:**\n   - Both numbers have the same whole number part: **3**.\n\n2. **Compare the Decimal Parts:**\n   - **3.7** can be written as **3.70** to make the comparison easier.\n   - Now, compare **0.70** and **0.11**:\n     - **70** (from 0.70) is greater than **11** (from 0.11).\n\n3. **Conclusion:**\n   - Since **0.70 > 0.11**, it follows that **3.7 > 3.11**.\n\n\\[\n\\boxed{3.7}\n\\]","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":27,"total_tokens":219,"completion_tokens":192,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}
{"id":"chatcmpl-fbaf462004774a73a947f9ff99d1cfb3","object":"chat.completion","created":1754885833,"model":"LGAI-EXAONE/EXAONE-4.0.1-32B","choices":[{"index":0,"message":{"role":"assistant","content":"\n\nTo determine which is bigger between 3.7 and 3.11, compare the decimal values digit by digit, starting from the left.\n\n- Both numbers have the same whole number part (3).\n- In the tenths place, 3.7 has a 7, while 3.11 has a 1. Since 7 is greater than 1, 3.7 is larger at this point.\n- Even without further digits, the tenths place comparison is sufficient to conclude that 3.7 is bigger than 3.11.\n\nTo verify:\n- Rewrite 3.7 as 3.70 for easier comparison: 3.70 vs. 3.11.\n- Compare place values:\n  - Units: 3 = 3\n  - Tenths: 7 > 1\n  - Since the tenths differ, no need to check hundredths.\n- Alternatively, convert to fractions:\n  - 3.7 = 37/10 = 370/100\n  - 3.11 = 311/100\n  - Since 370/100 > 311/100, 3.7 is larger.\n\nSubtraction also confirms:  \n3.70 - 3.11 = 0.59, which is positive, so 3.7 is bigger.\n\nThus, **3.7 is bigger than 3.11**.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":"I need to determine which is bigger between 3.7 and 3.11. Both are decimals, so I should compare them digit by digit.\n\nFirst, look at the whole number part. Both have 3, so they're equal up to the units place. Now, I need to look at the tenths place.\n\nFor 3.7, the tenths digit is 7. For 3.11, the tenths digit is 1. Since 7 is greater than 1, 3.7 should be larger than 3.11.\n\nBut let me write them with the same number of decimal places to make it easier. 3.7 can be written as 3.70, and 3.11 is already 3.11. So, comparing 3.70 and 3.11:\n\n- Units: both 3\n\n- Tenths: 7 vs 1 → 7 > 1\n\nSince the tenths digit is higher for 3.7, it doesn't matter what comes after; 3.70 is greater than 3.11.\n\nI could also think in terms of fractions. 3.7 is 37/10, which is 370/100. And 3.11 is 311/100. Comparing 370/100 and 311/100, clearly 370 > 311, so 370/100 > 311/100, meaning 3.7 > 3.11.\n\n370/100 is 3.70, and 311/100 is 3.11, yes. So, 3.70 is indeed greater than 3.11.\n\nAnother way: 3.11 is three and eleven hundredths, while 3.7 is three and seven tenths. Seven tenths is seventy hundredths, and seventy hundredths is greater than eleven hundredths, so 3.7 > 3.11.\n\nI think I'm overcomplicating it. The simple comparison shows that since the tenths place is higher in 3.7, it's bigger.\n\nBut just to be thorough, let's consider if there's any trick here. Sometimes people might misread 3.7 as 3.07 or something, but no, 3.7 is clearly three point seven, which is 3.70.\n\nIn some contexts, decimals might be written differently, but standardly, 3.7 means 3.70.\n\nPerhaps the question is about numerical values only, so no tricks.\n\nSo, I think 3.7 is bigger than 3.11.\n\nBut let me confirm with subtraction: 3.7 - 3.11 = ? 3.7 minus 3.11.\n\nTo subtract, align decimals:\n\n  3.70\n\n- 3.11\n\n______\n\nStart from right: 0 - 1, can't do, borrow. 10 - 1 = 9, but since we borrowed, the 7 becomes 6 (because 70 becomes 69? Let's think carefully.\n\nActually, 3.70 minus 3.11:\n\nHundredths place: 0 < 1, so need to borrow from tenths. But tenths place has 7, which is 70 hundredths. So, take one from tenths, so tenths become 6, and hundredths become 10. 
Then 10 - 1 = 9.\n\nNow tenths place: 6 (after borrowing) minus 1 = 5.\n\nUnits place: 3 - 3 = 0.\n\nSo, 0.59, which is positive, meaning 3.70 > 3.11.\n\nYes, difference is 0.59, so 3.7 is bigger.\n\nIf I think on a number line, 3.11 is to the left of 3.7, so smaller.\n\nTherefore, 3.7 is bigger.\n\nThe question is \"which is bigger, 3.7 or 3.11?\" So, answer should be 3.7.\n\nBut just to be complete, is there any context where 3.11 could be larger? I don't think so. Unless it's a different base or something, but it's standard decimal.\n\nPerhaps someone might confuse it with fractions, but 3.7 is 37/10 = 3.7, 3.11 is 311/100 = 3.11, and 37/10 = 370/100 > 311/100.\n\n370/100 vs 311/100, yes.\n\nOr as mixed numbers: 3 7/10 vs 3 11/100. 7/10 = 70/100 > 11/100, so 3 70/100 > 3 11/100.\n\nAll ways confirm.\n\nSo, I think it's clear.\n"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":23,"total_tokens":1572,"completion_tokens":1549,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

(Optional) Documentation Update

- Add `request` parameter for ReasoningParser.extract_reasoning_content_streaming()

Co-authored-by: Junwon Hwang <[email protected]>
Co-authored-by: heyzude <[email protected]>
@nuxlear requested a review from aarnphm as a code owner — August 11, 2025 04:31
@mergify bot added labels: deepseek (Related to DeepSeek models), frontend, qwen (Related to Qwen models) — Aug 11, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new reasoning parser for the EXAONE 4.0 model and refactors the ReasoningParser interface to support it. The core change is the addition of the request parameter to extract_reasoning_content_streaming, which lets parsers access request-specific information such as enable_thinking. The changes are generally well implemented and include comprehensive tests, but there is a critical issue in the non-streaming implementation of the new parser: it fails to use the new request parameter, making its behavior inconsistent with its streaming counterpart and incorrect under certain conditions.

Comment on lines +153 to +156:

if self.end_token not in model_output:
    if model_output_parts[1]:
        return model_output, None
    return None, model_output

critical

The logic for handling model output without an end token in non-streaming mode is inconsistent with the streaming implementation and doesn't correctly handle the enable_thinking flag. When enable_thinking is true, the output should be treated as reasoning content even if <think> and </think> tags are missing. The current implementation only checks for the presence of the <think> tag, which can lead to incorrect parsing of the model's output.

You should use the request object to check chat_template_kwargs.get("enable_thinking"), similar to how it's done in extract_reasoning_content_streaming, to ensure consistent behavior.

Suggested change (before → after):

Before:
if self.end_token not in model_output:
    if model_output_parts[1]:
        return model_output, None
    return None, model_output

After:
if self.end_token not in model_output:
    enable_thinking = (request is not None and
                       request.chat_template_kwargs is not None and
                       request.chat_template_kwargs.get("enable_thinking"))
    if enable_thinking or model_output_parts[1]:
        return model_output, None
    return None, model_output
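
For reference, the suggested branch can be exercised in isolation. The sketch below is illustrative, not the PR's code: FakeRequest is a hypothetical stand-in exposing only the chat_template_kwargs field the suggestion reads, and the start-token membership check stands in for model_output_parts[1].

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FakeRequest:
    # Hypothetical stand-in for the real request object.
    chat_template_kwargs: Optional[dict] = None

def extract(model_output: str, request: Optional[FakeRequest],
            start_token: str = "<think>",
            end_token: str = "</think>") -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) following the suggested branch."""
    enable_thinking = bool(request is not None
                           and request.chat_template_kwargs is not None
                           and request.chat_template_kwargs.get("enable_thinking"))
    if end_token not in model_output:
        # Untagged output counts as reasoning only when thinking was
        # requested or a <think> block was opened.
        if enable_thinking or start_token in model_output:
            return model_output, None
        return None, model_output
    reasoning, _, content = model_output.partition(end_token)
    return reasoning.replace(start_token, "", 1), (content or None)

assert extract("3.7 is bigger.", FakeRequest()) == (None, "3.7 is bigger.")
assert extract("3.7 is bigger.", FakeRequest({"enable_thinking": True})) == ("3.7 is bigger.", None)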


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀
