@zhangxinyuehfad (Contributor) commented Sep 30, 2025

What this PR does / why we need it?

Fix the Ascend config for qwen3-next (related to #3291).

Does this PR introduce any user-facing change?

How was this patch tested?

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request aims to fix an issue with the Ascend configuration for qwen3-next models. The changes introduce a try-except block to safely retrieve the Ascend configuration, preventing a potential RuntimeError. It also updates how attention-related flags are passed. However, I've identified a critical issue where the logical relationship between use_sfa and use_mla appears to be broken by the changes, potentially leading to incorrect behavior. My review includes a comment with a suggested fix for this issue.
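The guarded lookup described in the review can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual diff: `AscendConfig`, `init_ascend_config`, `get_ascend_config`, and `get_ascend_config_or_none` are hypothetical stand-ins defined here, and the assumption is that the getter raises `RuntimeError` when the configuration has not been initialized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AscendConfig:
    """Hypothetical stand-in for the Ascend configuration object."""
    enable_feature: bool = False


# Module-level slot holding the config; None until initialized.
_ASCEND_CONFIG: Optional[AscendConfig] = None


def init_ascend_config(config: AscendConfig) -> None:
    """Store the config so later lookups succeed (hypothetical helper)."""
    global _ASCEND_CONFIG
    _ASCEND_CONFIG = config


def get_ascend_config() -> AscendConfig:
    """Return the config, raising RuntimeError if it was never initialized.

    Assumption: this mirrors the failure mode the review mentions.
    """
    if _ASCEND_CONFIG is None:
        raise RuntimeError("Ascend config is not initialized.")
    return _ASCEND_CONFIG


def get_ascend_config_or_none() -> Optional[AscendConfig]:
    """Guarded lookup: return None instead of propagating RuntimeError."""
    try:
        return get_ascend_config()
    except RuntimeError:
        return None
```

A caller can then branch on `None` and fall back to defaults rather than crashing. Catching only `RuntimeError`, rather than a bare `except`, keeps unrelated failures visible.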
