Commit c5d00f9

【Inference Optimize】Use jit.inference to convert qwen2-5-vl and deepseek-vl2 language models into static graphs (#1200)
Use jit.inference to convert qwen2-5-vl and deepseek-vl2 language models into static graphs
1 parent 469f27e commit c5d00f9

9 files changed: +234 -394 lines changed

deploy/deepseek_vl2/README.md

Lines changed: 16 additions & 2 deletions
@@ -8,7 +8,7 @@
 | deepseek-ai/deepseek-vl2-small |
 
 ## Environment Setup
-[Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
+1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
 - **python >= 3.10**
 - **paddlepaddle-gpu: the develop version is required**
 ```bash
@@ -29,14 +29,16 @@ git clone --depth=1 https://github.com/PaddlePaddle/PaddleNLP.git
 cd PaddleNLP
 pip install -e .
 
-# Install the pre-built paddlenlp_ops wheel
+# Install the pre-built paddlenlp_ops wheel
 pip install https://paddlenlp.bj.bcebos.com/ops/cu118/paddlenlp_ops-3.0.0b4-py3-none-any.whl
 ```
 
 ## 3 High-Performance Inference
 
 ### a. fp16 High-Performance Inference
 
+cd PaddleMIX
+
 ```
 export CUDA_VISIBLE_DEVICES=0
 export FLAGS_mla_use_tensorcore=0
@@ -121,3 +123,15 @@ sh deploy/deepseek_vl2/shell/run.sh
 | ------------------ | -------------- |
 | min_length | 128 |
 | min_length | 128 |
+
+## Performance measured on an NVIDIA A800-SXM4-80GB:
+
+#### Input and output sizes for the results in the table below.
+| parameter | Value |
+| ------------------ | -------------- |
+| input_tokens_len | 1428 tokens |
+| output_tokens_len | 128 tokens |
+
+| model | Paddle Inference wint8 | Paddle Inference | PyTorch | VLLM |
+| ----------------------------- | --------------------- | --------------- | -------------- | -------------- |
+| deepseek-ai/deepseek-vl2-small | 1.52 s | 1.77 s | 4.92 s | 1.39 s |
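
To put the figures in the table above side by side, the sketch below computes each backend's speedup relative to PyTorch. It assumes the reported values are directly comparable latencies in seconds for the same 1428-token input and 128-token output; the dictionary and the `speedup_over` helper are illustrative names, not part of the commit.

```python
# Latencies in seconds, copied from the table above
# (deepseek-ai/deepseek-vl2-small on an NVIDIA A800-SXM4-80GB).
latency_s = {
    "Paddle Inference wint8": 1.52,
    "Paddle Inference": 1.77,
    "PyTorch": 4.92,
    "VLLM": 1.39,
}


def speedup_over(baseline: str, latencies: dict) -> dict:
    """Return baseline latency divided by each backend's latency (2 decimals).

    Hypothetical helper for illustration; values above 1.0 mean the backend
    is faster than the baseline.
    """
    base = latencies[baseline]
    return {name: round(base / t, 2) for name, t in latencies.items()}


speedups = speedup_over("PyTorch", latency_s)
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x vs PyTorch")
```

Under that assumption, the wint8 Paddle Inference path is roughly 3.2x faster than PyTorch, while VLLM remains slightly faster than both Paddle Inference variants.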
