What's Changed
- Support single-node prefill-decode (P-D) disaggregation on CUDA.
- Support the Qwen3 model. Currently only dense models are supported; MoE model support is WIP.
- Support expert parallelism (EP) in the MoE op. Currently only BF16 and FP16 are supported.
- CPU support does not work in this release. Use a v2.1.x version for CPU support.
In Detail
- readme: add citation, update subproject description by @leefige in #70
- fix cugraph dnn moe kernel bug by @laiwenzh in #71
- support MOE EP by @yjc9696 in #73
- support disaggregated prefilling by @laiwenzh in #78
- fix prompt-related issues by @lddfym in #77
- Build: only build flash attention kernel once. by @kzjeef in #80
- support qwenV3 (Dense) by @yjc9696 in #81
- model: fix cpu compiler error by @kzjeef in #82
- Fix cpu compile by @kzjeef in #83
- Create build-check.yml by @kzjeef in #84
- ci: only trigger cuda release for current version. by @kzjeef in #85
New Contributors
Full Changelog: v2.1.0...v3.0.0-rc1
Existing Issue
- CPU support does not work in this release. Use a v2.1.x version for CPU support.