Commit e502a28

Authored by jinyouzhi, CSWYF3634076, vasqu, and DarkLight1337
Porting ERNIE 4.5 and ERNIE 4.5 MoE model support (#1779)
Enable the ERNIE 4.5 series of text generation models, including the Dense and MoE variants. This PR contains:

- cherry-pick of vllm-project#20220
- bugfix for vllm-project#21735 (name mistake)

Validated on:

- [ERNIE-4.5-0.3B-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-PT)
- [ERNIE-4.5-21B-A3B-PT](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-PT)

@Wei-Lin-Intel

Signed-off-by: wangyafeng <[email protected]>
Signed-off-by: vasqu <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: CSWYF3634076 <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
1 parent 8b03c85 commit e502a28

File tree

5 files changed: +638 / -5 lines changed


docs/models/supported_models.md

Lines changed: 6 additions & 5 deletions
```diff
@@ -314,6 +314,8 @@ Specified using `--task generate`.
 | `DeepseekForCausalLM` | DeepSeek | `deepseek-ai/deepseek-llm-67b-base`, `deepseek-ai/deepseek-llm-7b-chat` etc. | | ✅︎ |
 | `DeepseekV2ForCausalLM` | DeepSeek-V2 | `deepseek-ai/DeepSeek-V2`, `deepseek-ai/DeepSeek-V2-Chat` etc. | | ✅︎ |
 | `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3-Base`, `deepseek-ai/DeepSeek-V3` etc. | | ✅︎ |
+| `Ernie4_5ForCausalLM` | Ernie4.5 | `baidu/ERNIE-4.5-0.3B-PT`, etc. | | ✅︎ | ✅︎ |
+| `Ernie4_5_MoeForCausalLM` | Ernie4.5MoE | `baidu/ERNIE-4.5-21B-A3B-PT`, `baidu/ERNIE-4.5-300B-A47B-PT`, etc. | | ✅︎ | ✅︎ |
 | `ExaoneForCausalLM` | EXAONE-3 | `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc. | ✅︎ | ✅︎ |
 | `FalconForCausalLM` | Falcon | `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc. | | ✅︎ |
 | `FalconMambaForCausalLM` | FalconMamba | `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc. | | ✅︎ |
@@ -371,7 +373,6 @@ Specified using `--task generate`.
 | `XverseForCausalLM` | XVERSE | `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc. | ✅︎ | ✅︎ |
 | `MiniMaxText01ForCausalLM` | MiniMax-Text | `MiniMaxAI/MiniMax-Text-01`, etc. | | |
 | `Zamba2ForCausalLM` | Zamba2 | `Zyphra/Zamba2-7B-instruct`, `Zyphra/Zamba2-2.7B-instruct`, `Zyphra/Zamba2-1.2B-instruct`, etc. | | |
-
 !!! note
     Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.

@@ -556,10 +557,10 @@ Specified using `--task generate`.
 | `SmolVLMForConditionalGeneration` | SmolVLM2 | T + I | `SmolVLM2-2.2B-Instruct` | ✅︎ | | ✅︎ |
 | `TarsierForConditionalGeneration` | Tarsier | T + I<sup>E+</sup> | `omni-search/Tarsier-7b`,`omni-search/Tarsier-34b` | | ✅︎ | ✅︎ |

-<sup>^</sup> You need to set the architecture name via `--hf-overrides` to match the one in vLLM.
-&nbsp;&nbsp;&nbsp;&nbsp;• For example, to use DeepSeek-VL2 series models:
-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`--hf-overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'`
-<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
+<sup>^</sup> You need to set the architecture name via `--hf-overrides` to match the one in vLLM.
+&nbsp;&nbsp;&nbsp;&nbsp;• For example, to use DeepSeek-VL2 series models:
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`--hf-overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'`
+<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
 <sup>+</sup> Multiple items can be inputted per text prompt for this modality.

 !!! warning
```
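The `--hf-overrides` footnote above passes a JSON object on the command line. As a small illustration of how that argument pair can be constructed programmatically (the `hf_overrides_flag` helper below is hypothetical, not part of vLLM), `json.dumps` produces exactly the payload shown in the docs:

```python
import json


def hf_overrides_flag(architectures: list[str]) -> list[str]:
    # Serialize the override object in the shape the docs footnote shows,
    # e.g. --hf-overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'
    payload = json.dumps({"architectures": architectures})
    return ["--hf-overrides", payload]


flag = hf_overrides_flag(["DeepseekVLV2ForCausalLM"])
```

The two-element list form is convenient for passing straight to `subprocess`-style argument vectors, since the JSON payload stays a single argument without shell quoting.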

tests/models/registry.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -156,6 +156,10 @@ def check_available_online(
                                              trust_remote_code=True),
     "DeepseekV3ForCausalLM": _HfExamplesInfo("deepseek-ai/DeepSeek-V3",  # noqa: E501
                                              trust_remote_code=True),
+    "Ernie4_5ForCausalLM": _HfExamplesInfo("baidu/ERNIE-4.5-0.3B-PT",
+                                           min_transformers_version="4.54"),
+    "Ernie4_5_MoeForCausalLM": _HfExamplesInfo("baidu/ERNIE-4.5-21B-A3B-PT",
+                                               trust_remote_code=True),
     "ExaoneForCausalLM": _HfExamplesInfo("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"),  # noqa: E501
     "Fairseq2LlamaForCausalLM": _HfExamplesInfo("mgleize/fairseq2-dummy-Llama-3.2-1B"),  # noqa: E501
     "FalconForCausalLM": _HfExamplesInfo("tiiuae/falcon-7b"),
```
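The `min_transformers_version="4.54"` field in the registry entry above gates the dense Ernie example on the installed `transformers` release. A minimal sketch of how such a gate can be evaluated — the `ExampleInfo` record and its `is_available` method below are hypothetical simplifications, not vLLM's actual `_HfExamplesInfo`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleInfo:
    """Simplified stand-in for a registry record like _HfExamplesInfo."""
    model_id: str
    min_transformers_version: Optional[str] = None
    trust_remote_code: bool = False

    def is_available(self, installed_version: str) -> bool:
        # Compare dotted version strings field by field; Python compares
        # the resulting integer lists lexicographically.
        if self.min_transformers_version is None:
            return True
        installed = [int(p) for p in installed_version.split(".")]
        required = [int(p) for p in self.min_transformers_version.split(".")]
        return installed >= required


REGISTRY = {
    "Ernie4_5ForCausalLM": ExampleInfo("baidu/ERNIE-4.5-0.3B-PT",
                                       min_transformers_version="4.54"),
    "Ernie4_5_MoeForCausalLM": ExampleInfo("baidu/ERNIE-4.5-21B-A3B-PT",
                                           trust_remote_code=True),
}
```

With this sketch, `REGISTRY["Ernie4_5ForCausalLM"].is_available("4.53.2")` is false, while any 4.54+ install passes the gate.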

vllm/model_executor/models/ernie45.py

Lines changed: 43 additions & 0 deletions
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

# Copyright 2025 The Baidu team.
# Copyright 2023 The vLLM team.
# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
#
# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
# and OPT implementations in this library. It has been modified from its
# original forms to accommodate minor architectural differences compared
# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Inference-only Ernie model compatible with HuggingFace weights."""
from vllm.config import VllmConfig
from vllm.model_executor.models.llama import LlamaForCausalLM

from .utils import PPMissingLayer


class Ernie4_5ForCausalLM(LlamaForCausalLM):

    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        super().__init__(vllm_config=vllm_config, prefix=prefix)
        # Adapt the Llama model to fit the HF-format Ernie 4.5 dense
        # implementation. Attention differences between Ernie and Llama:
        # 1. rotary_dim and non-Neox rotary style.
        # 2. No bias for o_proj in attention.
        for layer in self.model.layers:
            if not isinstance(layer, PPMissingLayer):
                layer.self_attn.rotary_emb.is_neox_style = False
                layer.self_attn.o_proj.bias = None
                layer.self_attn.o_proj.skip_bias_add = True
```
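The adaptation pattern above — reuse the full Llama stack via `super().__init__`, then patch per-layer attention attributes after construction — can be illustrated standalone. The `ToyLlama`/`ToyErnie` classes below are hypothetical stand-ins for vLLM's real modules, kept only to show the subclass-and-patch idea:

```python
class RotaryEmb:
    def __init__(self):
        self.is_neox_style = True  # Llama-style default


class OutProj:
    def __init__(self):
        self.bias = [0.0]          # pretend bias tensor
        self.skip_bias_add = False


class SelfAttn:
    def __init__(self):
        self.rotary_emb = RotaryEmb()
        self.o_proj = OutProj()


class Layer:
    def __init__(self):
        self.self_attn = SelfAttn()


class ToyLlama:
    """Stand-in for LlamaForCausalLM: builds a stack of decoder layers."""

    def __init__(self, num_layers: int = 2):
        self.layers = [Layer() for _ in range(num_layers)]


class ToyErnie(ToyLlama):
    """Reuse the Llama stack, then patch attention to match Ernie 4.5:
    non-Neox rotary style and no o_proj bias."""

    def __init__(self, num_layers: int = 2):
        super().__init__(num_layers)
        for layer in self.layers:
            layer.self_attn.rotary_emb.is_neox_style = False
            layer.self_attn.o_proj.bias = None
            layer.self_attn.o_proj.skip_bias_add = True
```

The upside of this design is that weight loading, parallelism, and the forward pass all stay inherited; only the two genuine architectural deltas are touched.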

0 commit comments
