Release v2.2.1 · PaddlePaddle/FastDeploy

New Features

Online weight updates now support enabling Prefix Caching
Added deployment support for the GLM 4.5 Air model

What's Changed

[docs] update best practice docs for release/2.2 by @zoooo0820 in #3970
[Docs] release 2.2.0 by @ming1753 in #3991
[docs] update readme by @yangjianfengo1 in #3996
[Optimize]Error messages about Model api. by @AuferGachet in #3972
[Cherry-Pick] get org_vocab_size from args by @zeroRains in #3984
【FIX】Change the name of sparse attn from moba to plas by @yangjianfengo1 in #4006
Fix down projection weight shape in fused MOE layer by @yuanlehome in #4041
[Fix] fix multi api server log dir by @ltd0924 in #3966
Fixed the issue of metrics file conflicts between multiple instances … by @zhuangzhuang12 in #4010
[Feature] Support mixed deployment with yiyan adapter in release22 by @rainyfly in #3974
[CI] update paddlepaddle==3.2.0 in release/2.2 by @EmmonsCurse in #3997
[setup optimize]Support git submodule (#4033) by @YuanRisheng in #4080
[CP]Glm45 air 2.2 by @ckl117 in #4073
[feat] support prefix cache clearing when /clear_load_weight is called by @liyonghua0910 in #4091
[BugFix]fix tp/ep group gid by @gzy19990617 in #4038
Support limit thinking lengths. by @K11OntheBoat in #4070
Add assertion for ENABLE_V1_KVCACHE_SCHEDULER by @Jiang-Jia-Jun in #4146
[fix] fix ep group all-reduce by @liyonghua0910 in #4140
[Cherry-pick] fix MTP load with v1 loader by @zoooo0820 in #4153
[CP2.2] Machete support group scale & wint8 & v1 loader by @Sunny-bot1 in #4166
[Feature] support rdma IB transfer by @ltd0924 in #4123
[BugFix]2.2 glm all reduce tp group by @ckl117 in #4188
[Executor] Adjust signal sending order in RL training (#3773) (#4066) by @gongshaotian in #4178
[fix] initialize available_gpu_block_num with max_gpu_block_num by @liyonghua0910 in #4193
[fix]Modify follow-up push parameters and Modify the verification method for thinking length by @luukunn in #4177
Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method by @ckl117 in #4115
[Feature]CP support data clear by @ltd0924 in #4214
[fix] fix clearing caches synchronization and add more logs by @liyonghua0910 in #4212
fix ernie vl distributed attr. by @ZHUI in #4217
[2.2]include_stop_str_in_output=False not return eos text by @ckl117 in #4231
[fix]update apply_chat_template by @luukunn in #4249
[fix]remove reasoning_max_tokens=max_tokens*0.8 in sampling_params by @luukunn in #4294
【fix】Remove the logic that assigns the default value of 80% to reasoning_max_tokens in the offline component of FastDeploy by @kxz2002 in #4304
[feature]2.2 custom_allreduce support cudagraph recapture by @ckl117 in #4307
[BUGFIX] clear request by @ltd0924 in #4320

Full Changelog: v2.2.0...v2.2.1
