New Features
- Online weight updates now support enabling Prefix Caching
- Added deployment support for the GLM 4.5 Air model
What's Changed
- [docs] update best practice docs for release/2.2 by @zoooo0820 in #3970
- [Docs] release 2.2.0 by @ming1753 in #3991
- [docs] update readme by @yangjianfengo1 in #3996
- [Optimize]Error messages about Model api. by @AuferGachet in #3972
- [Cherry-Pick] get org_vocab_size from args by @zeroRains in #3984
- 【FIX】Change the name of sparse attn from moba to plas by @yangjianfengo1 in #4006
- Fix down projection weight shape in fused MOE layer by @yuanlehome in #4041
- [Fix] fix multi api server log dir by @ltd0924 in #3966
- Fixed the issue of metrics file conflicts between multiple instances … by @zhuangzhuang12 in #4010
- [Feature] Support mixed deployment with yiyan adapter in release22 by @rainyfly in #3974
- [CI] update paddlepaddle==3.2.0 in release/2.2 by @EmmonsCurse in #3997
- [setup optimize]Support git submodule (#4033) by @YuanRisheng in #4080
- [CP]Glm45 air 2.2 by @ckl117 in #4073
- [feat] support prefix cache clearing when `/clear_load_weight` is called by @liyonghua0910 in #4091
- [BugFix]fix tp/ep group gid by @gzy19990617 in #4038
- Support limit thinking lengths. by @K11OntheBoat in #4070
- Add assertion for ENABLE_V1_KVCACHE_SCHEDULER by @Jiang-Jia-Jun in #4146
- [fix] fix ep group all-reduce by @liyonghua0910 in #4140
- [Cherry-pick] fix MTP load with v1 loader by @zoooo0820 in #4153
- [CP2.2] Machete support group scale & wint8 & v1 loader by @Sunny-bot1 in #4166
- [Feature] support rdma IB transfer by @ltd0924 in #4123
- [BugFix]2.2 glm all reduce tp group by @ckl117 in #4188
- [Executor] Adjust signal sending order in RL training (#3773) (#4066) by @gongshaotian in #4178
- [fix] initialize available_gpu_block_num with max_gpu_block_num by @liyonghua0910 in #4193
- [fix]Modify follow-up push parameters and Modify the verification method for thinking length by @luukunn in #4177
- Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method by @ckl117 in #4115
- [Feature]CP support data clear by @ltd0924 in #4214
- [fix] fix clearing caches synchronization and add more logs by @liyonghua0910 in #4212
- fix ernie vl distributed attr. by @ZHUI in #4217
- [2.2]include_stop_str_in_output=False not return eos text by @ckl117 in #4231
- [fix]update apply_chat_template by @luukunn in #4249
- [fix]remove reasoning_max_tokens=max_toksns*0.8 in sampling_params by @luukunn in #4294
- 【fix】Remove the logic that assigns the default value of 80% to reasoning_max_tokens in the offline component of FastDeploy by @kxz2002 in #4304
- [feature]2.2 custom_allreduce support cudagraph recapture by @ckl117 in #4307
- [BUGFIX] clear request by @ltd0924 in #4320
Full Changelog: v2.2.0...v2.2.1