Commit 01a0957

Merge branch 'develop' into append_attn_pr
2 parents b977c0e + f516421 commit 01a0957

63 files changed, 1033 additions and 543 deletions

.github/workflows/_base_test.yml

Lines changed: 8 additions & 4 deletions
@@ -121,9 +121,8 @@ jobs:
 # python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 python -m pip install paddlepaddle-gpu==3.0.0.dev20250729 -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 
-pip config set global.index-url http://pip.baidu.com/root/baidu/+simple/
-pip config set install.trusted-host pip.baidu.com
-pip config set global.extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
 python -m pip install ${fastdeploy_wheel_url}
 python -m pip install pytest
 
@@ -150,7 +149,12 @@ jobs:
 export URL=http://localhost:${FD_API_PORT}/v1/chat/completions
 export TEMPLATE=TOKEN_LOGPROB
 TEST_EXIT_CODE=0
-python -m pytest -sv . || TEST_EXIT_CODE=$?
+python -m pytest -sv test_base_chat.py test_compare_top_logprobs.py test_logprobs.py test_params_boundary.py test_seed_usage.py test_stream.py test_evil_cases.py || TEST_EXIT_CODE=1
+curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
+-H "Content-Type: application/json" \
+-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\", \"--early-stop-config\": \"{\\\"enable_early_stop\\\":true, \\\"window_size\\\":6, \\\"threshold\\\":0.93}\"}"
+curl -X POST http://localhost:${FLASK_PORT}/wait_for_infer?timeout=90
+python -m pytest -sv test_repetition_early_stop.py || TEST_EXIT_CODE=1
 popd
 echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> /workspace/FastDeploy/exit_code.env
 '
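
The `/switch` request in the hunk above carries the early-stop settings as a JSON string nested inside the outer JSON body, which is why the shell command needs several layers of backslash escaping. The following is a minimal Python sketch of how such a body could be built, assuming only the keys and values visible in the diff. The helper name `build_switch_payload` is hypothetical and not part of FastDeploy, and the whitespace of the result may differ from the hand-written string in the workflow.

```python
import json


def build_switch_payload(model_path: str, early_stop: dict) -> str:
    """Return a JSON body shaped like the one passed to curl -d in the diff.

    Hypothetical helper for illustration only. The early-stop settings are
    serialized first, so the outer object holds them as a string value
    rather than as a nested object.
    """
    inner = json.dumps(early_stop, separators=(",", ":"))
    return json.dumps({"--model": model_path, "--early-stop-config": inner})


# Values copied from the workflow change.
payload = build_switch_payload(
    "/MODELDATA/ERNIE-4.5-0.3B-Paddle",
    {"enable_early_stop": True, "window_size": 6, "threshold": 0.93},
)
print(payload)
```

Serializing the inner object with `json.dumps` before embedding it avoids hand-written escapes, which are easy to get wrong when the command is itself wrapped in a quoted shell string.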

.github/workflows/_build_linux.yml

Lines changed: 1 addition & 3 deletions
@@ -125,9 +125,7 @@ jobs:
 export FASTDEPLOY_VERSION="${FASTDEPLOY_VERSION}.dev${DATE_ONLY}"
 fi
 python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
-pip config set global.index-url http://pip.baidu.com/root/baidu/+simple/
-pip config set install.trusted-host pip.baidu.com
-pip config set global.extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
 
 python -m pip install --upgrade pip
 python -m pip install -r requirements.txt
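
This hunk replaces three `pip config set` calls with one: the Baidu index and its `trusted-host` entry are dropped, and the TUNA mirror moves from `extra-index-url` to `index-url`. pip stores such settings in an INI-style file, with `global.index-url` mapping to the `index-url` key of the `[global]` section. As a rough sketch of what this command contributes to that file, assuming no other pip settings are present, the standard-library `configparser` can render the equivalent content. The function name `render_pip_conf` is hypothetical, and the file location pip actually writes to depends on the platform and scope.

```python
import configparser
import io

# Index URL taken from the added line in the diff.
TUNA_INDEX = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"


def render_pip_conf(index_url: str) -> str:
    """Render an INI document with a single [global] index-url entry.

    Hypothetical helper for illustration; it does not call pip or touch
    any real configuration file.
    """
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_pip_conf(TUNA_INDEX))
```

With a single HTTPS index and no extra index, the `install.trusted-host` override from the removed lines is no longer needed.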

.github/workflows/_logprob_test_linux.yml

Lines changed: 2 additions & 3 deletions
@@ -114,9 +114,8 @@ jobs:
 # python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 python -m pip install paddlepaddle-gpu==3.0.0.dev20250729 -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 
-pip config set global.index-url http://pip.baidu.com/root/baidu/+simple/
-pip config set install.trusted-host pip.baidu.com
-pip config set global.extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
 python -m pip install ${fastdeploy_wheel_url}
 
 wget https://paddle-qa.bj.bcebos.com/zhengtianyu/tools/llm-deploy-linux-amd64

.github/workflows/_unit_test_coverage.yml

Lines changed: 2 additions & 3 deletions
@@ -96,9 +96,8 @@ jobs:
 # python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 python -m pip install paddlepaddle-gpu==3.0.0.dev20250729 -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
 
-pip config set global.index-url http://pip.baidu.com/root/baidu/+simple/
-pip config set install.trusted-host pip.baidu.com
-pip config set global.extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
+
 
 python -m pip install coverage
 python -m pip install diff-cover

.github/workflows/gh-pages.yml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ jobs:
 - uses: actions/setup-python@v5
 with:
 python-version: 3.x
-- run: pip install mkdocs-material mkdocs-get-deps mkdocs-material-extensions mkdocs-multilang
+- run: pip install mkdocs-material mkdocs-get-deps mkdocs-material-extensions mkdocs-multilang mkdocs-static-i18n
 - name: Deploy to GitHub Pages
 env:
 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

README.md

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
+English | [简体中文](README_CN.md)
 <p align="center">
 <a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://github.com/user-attachments/assets/42b0039f-39e3-4279-afda-6d1865dfbffb" width="500"></a>
 </p>
@@ -68,7 +69,7 @@ Learn how to use FastDeploy through our documentation:
 - [Offline Inference Development](./docs/offline_inference.md)
 - [Online Service Deployment](./docs/online_serving/README.md)
 - [Full Supported Models List](./docs/supported_models.md)
-- [Optimal Deployment](./docs/optimal_deployment/README.md)
+- [Best Practices](./docs/best_practices/README.md)
 
 ## Supported Models
 

README_CN.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+[English](README.md) | 简体中文
+<p align="center">
+<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://github.com/user-attachments/assets/42b0039f-39e3-4279-afda-6d1865dfbffb" width="500"></a>
+</p>
+<p align="center">
+<a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a>
+<a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a>
+<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
+<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
+<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
+<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>
+
+</p>
+
+<p align="center">
+<a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a></br>
+<a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/installation/nvidia_gpu/"><b> Installation </b></a>
+|
+<a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/quick_start"><b> Quick Start </b></a>
+|
+<a href="https://paddlepaddle.github.io/FastDeploy/zh/supported_models/"><b> Supported Models </b></a>
+
+</p>
+
+--------------------------------------------------------------------------------
+# FastDeploy 2.0: Inference and Deployment Toolkit for LLMs and VLMs Based on PaddlePaddle
+
+## News
+
+**[2025-07] The "FastDeploy 2.0 Inference Deployment Hands-On" event is now live!** Complete tasks such as inference deployment of the open-source ERNIE 4.5 series models to win official FastDeploy 2.0 merchandise, such as bone-china mugs, along with generous cash prizes! 🎁 You are welcome to try it out and share feedback. 📌[Sign up](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)
+
+## About
+
+**FastDeploy** is an inference and deployment toolkit for large language models (LLMs) and visual language models (VLMs) based on PaddlePaddle. It provides **production-ready, out-of-the-box deployment solutions**, with core technical features including:
+
+- 🚀 **Load-Balanced PD Disaggregation**: an industrial-grade solution that supports context caching and dynamic instance role switching, optimizing resource utilization while meeting SLO targets and sustaining throughput
+- 🔄 **Unified KV Cache Transmission**: a lightweight, high-performance transport library with intelligent NVLink/RDMA selection
+- 🤝 **OpenAI API Server and vLLM Compatibility**: single-command deployment, compatible with the [vLLM](https://github.com/vllm-project/vllm/) interface
+- 🧮 **Comprehensive Quantization Format Support**: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more
+- **Advanced Acceleration Techniques**: speculative decoding, multi-token prediction (MTP), and chunked prefill
+- 🖥️ **Multi-Hardware Support**: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, and more
+
+
+## Requirements
+
+- OS: Linux
+- Python: 3.10 ~ 3.12
+
+## Installation
+
+FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, and other hardware. Detailed installation instructions:
+
+- [NVIDIA GPU](./docs/zh/get_started/installation/nvidia_gpu.md)
+- [Kunlunxin XPU](./docs/zh/get_started/installation/kunlunxin_xpu.md)
+- [Iluvatar CoreX](./docs/zh/get_started/installation/iluvatar_gpu.md)
+- [Enflame S60](./docs/zh/get_started/installation/Enflame_gcu.md)
+
+**Note:** We are actively expanding hardware support. Additional platforms, including Ascend NPU, Hygon DCU, and MetaX GPU, are currently under development and testing. Stay tuned for updates!
+
+## Get Started
+
+Learn how to use FastDeploy through our documentation:
+- [10-Minute Quick Deployment](./docs/zh/get_started/quick_start.md)
+- [ERNIE-4.5 Deployment](./docs/zh/get_started/ernie-4.5.md)
+- [ERNIE-4.5-VL Deployment](./docs/zh/get_started/ernie-4.5-vl.md)
+- [Offline Inference](./docs/zh/offline_inference.md)
+- [Online Serving](./docs/zh/online_serving/README.md)
+- [Supported Models List](./docs/zh/supported_models.md)
+- [Best Practices](./docs/zh/best_practices/README.md)
+
+## Supported Models
+
+| Model | Data Type | PD Disaggregation | Chunked Prefill | Prefix Caching | MTP | CUDA Graph | Maximum Context Length |
+|:--- | :------- | :---------- | :-------- | :-------- | :----- | :----- | :----- |
+|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 ||||✅(WINT4)| WIP |128K |
+|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 ||||✅(WINT4)| WIP | 128K |
+|ERNIE-4.5-VL-424B-A47B | BF16/WINT4/WINT8 | WIP || WIP || WIP |128K |
+|ERNIE-4.5-VL-28B-A3B | BF16/WINT4/WINT8 ||| WIP || WIP |128K |
+|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 |||| WIP ||128K |
+|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 |||| WIP ||128K |
+|ERNIE-4.5-0.3B | BF16/WINT8/FP8 |||||| 128K |
+
+## Advanced Usage
+
+- [Quantization](./docs/zh/quantization/README.md)
+- [Disaggregated Deployment](./docs/zh/features/disaggregated.md)
+- [Speculative Decoding](./docs/zh/features/speculative_decoding.md)
+- [Prefix Caching](./docs/zh/features/prefix_caching.md)
+- [Chunked Prefill](./docs/zh/features/chunked_prefill.md)
+
+## Acknowledgement
+
+FastDeploy is licensed under the [Apache-2.0 open-source license](./LICENSE). During development, we referenced and incorporated portions of the [vLLM](https://github.com/vllm-project/vllm) code to maintain interface compatibility, for which we express our sincere gratitude.

docs/optimal_deployment/ERNIE-4.5-0.3B-Paddle.md renamed to docs/best_practices/ERNIE-4.5-0.3B-Paddle.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@
 ## Environmental Preparation
 ### 1.1 Hardware requirements
 The minimum number of GPUs required to deploy `ERNIE-4.5-0.3B` on the following hardware for each quantization is as follows:
-| | WINT8 | WINT4 | FP8 |
+
+| | WINT8 | WINT4 | FP8 |
 |-----|-----|-----|-----|
 |H800 80GB| 1 | 1 | 1 |
 |A800 80GB| 1 | 1 | / |

docs/optimal_deployment/ERNIE-4.5-21B-A3B-Paddle.md renamed to docs/best_practices/ERNIE-4.5-21B-A3B-Paddle.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@
 ## Environmental Preparation
 ### 1.1 Hardware requirements
 The minimum number of GPUs required to deploy `ERNIE-4.5-21B-A3B` on the following hardware for each quantization is as follows:
-| | WINT8 | WINT4 | FP8 |
+
+| | WINT8 | WINT4 | FP8 |
 |-----|-----|-----|-----|
 |H800 80GB| 1 | 1 | 1 |
 |A800 80GB| 1 | 1 | / |

docs/optimal_deployment/ERNIE-4.5-300B-A47B-Paddle.md renamed to docs/best_practices/ERNIE-4.5-300B-A47B-Paddle.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@
 ## Environmental Preparation
 ### 1.1 Hardware requirements
 The minimum number of GPUs required to deploy `ERNIE-4.5-300B-A47B` on the following hardware for each quantization is as follows:
-| | WINT8 | WINT4 | FP8 | WINT2 | W4A8 |
+
+| | WINT8 | WINT4 | FP8 | WINT2 | W4A8 |
 |-----|-----|-----|-----|-----|-----|
 |H800 80GB| 8 | 4 | 8 | 2 | 4 |
 |A800 80GB| 8 | 4 | / | 2 | 4 |
