
v0.11.0rc3

Pre-release


@wangxiyuan wangxiyuan released this 03 Dec 03:54
· 15 commits to v0.11.0-dev since this release
b6d63bb

This is the third release candidate of v0.11.0 for vLLM Ascend. To ensure release quality, we are publishing a new RC before the official release. Thanks for all your feedback. Please follow the official doc to get started.

Highlights

  • torch-npu is upgraded to 2.7.1.post1. Please note that the package is published to a PyPI mirror, so it cannot be declared as an automatic dependency; please install it manually.
  • Disable the NZ weight loader to speed up dense models. Please note that this is a temporary solution; if you see a performance regression, please let us know. We'll keep improving it. #4495
  • Mooncake is now installed in the official Docker image, so you can use it directly inside the container. #4506
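Since torch-npu must be installed manually (see the first highlight above), the step can be sketched as below. The exact mirror index URL is an assumption here — check the official doc for the index that actually hosts 2.7.1.post1.

```shell
# torch-npu 2.7.1.post1 is published to a PyPI mirror, so vllm-ascend
# cannot pull it in as an automatic dependency; install it yourself.
# The --index-url value is a placeholder assumption -- consult the
# official vLLM Ascend doc for the real mirror address.
pip install torch-npu==2.7.1.post1 --index-url <pypi-mirror-url>
```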

Other

  • Fix an OOM issue for MoE models. #4367
  • Fix a hang issue with multimodal models when running with DP>1. #4393
  • Fix several bugs in EPLB. #4416
  • Fix a bug in the mtp>1 + lm_head_tp>1 case. #4360
  • Fix an accuracy issue when running vLLM serve for a long time. #4117
  • Fix a functional bug when running Qwen2.5-VL under high concurrency. #4553

Full Changelog: v0.11.0rc2...v0.11.0rc3