Skip to content

Commit 759ef49

Browse files
authored
Remove V0 Encoder-Decoder Support (vllm-project#24907)
Signed-off-by: Woosuk Kwon <[email protected]>
1 parent 5206ab2 commit 759ef49

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

47 files changed

+13
-9661
lines changed

.buildkite/scripts/hardware_ci/run-cpu-test.sh

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,6 @@ function cpu_tests() {
6666
6767
pytest -x -v -s tests/models/language/pooling -m cpu_model
6868
pytest -x -v -s tests/models/multimodal/generation \
69-
--ignore=tests/models/multimodal/generation/test_mllama.py \
7069
--ignore=tests/models/multimodal/generation/test_pixtral.py \
7170
-m cpu_model"
7271

.buildkite/test-pipeline.yaml

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -549,15 +549,6 @@ steps:
549549
commands: # LMEval+Transcription WER check
550550
- pytest -s entrypoints/openai/correctness/
551551

552-
- label: Encoder Decoder tests # 12min
553-
timeout_in_minutes: 20
554-
mirror_hardwares: [amdexperimental]
555-
source_file_dependencies:
556-
- vllm/
557-
- tests/encoder_decoder
558-
commands:
559-
- pytest -v -s encoder_decoder
560-
561552
- label: OpenAI-Compatible Tool Use # 23 min
562553
timeout_in_minutes: 35
563554
mirror_hardwares: [amdexperimental]

docs/contributing/model/multimodal.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -840,7 +840,6 @@ Some HF processors directly insert feature tokens without replacing anything in
840840
Examples:
841841

842842
- BLIP-2 (insert at start of prompt): <gh-file:vllm/model_executor/models/blip2.py>
843-
- Florence2 (insert at start of prompt): <gh-file:vllm/model_executor/models/florence2.py>
844843
- Molmo (insert after `<|endoftext|>` token): <gh-file:vllm/model_executor/models/molmo.py>
845844

846845
### Handling prompt updates unrelated to multi-modal data

docs/models/supported_models.md

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -331,8 +331,6 @@ th {
331331
| `BailingMoeV2ForCausalLM` | Ling | `inclusionAI/Ling-mini-2.0`, etc. | ✅︎ | ✅︎ | ✅︎ |
332332
| `BambaForCausalLM` | Bamba | `ibm-ai-platform/Bamba-9B-fp8`, `ibm-ai-platform/Bamba-9B` | ✅︎ | ✅︎ | ✅︎ |
333333
| `BloomForCausalLM` | BLOOM, BLOOMZ, BLOOMChat | `bigscience/bloom`, `bigscience/bloomz`, etc. | | ✅︎ | ✅︎ |
334-
| `BartForConditionalGeneration` | BART | `facebook/bart-base`, `facebook/bart-large-cnn`, etc. | | | |
335-
| `MBartForConditionalGeneration` | mBART | `facebook/mbart-large-en-ro`, `facebook/mbart-large-50`, etc. | | | |
336334
| `ChatGLMModel`, `ChatGLMForConditionalGeneration` | ChatGLM | `zai-org/chatglm2-6b`, `zai-org/chatglm3-6b`, `ShieldLM-6B-chatglm3`, etc. | ✅︎ | ✅︎ | ✅︎ |
337335
| `CohereForCausalLM`, `Cohere2ForCausalLM` | Command-R, Command-A | `CohereLabs/c4ai-command-r-v01`, `CohereLabs/c4ai-command-r7b-12-2024`, `CohereLabs/c4ai-command-a-03-2025`, `CohereLabs/command-a-reasoning-08-2025`, etc. | ✅︎ | ✅︎ | ✅︎ |
338336
| `DbrxForCausalLM` | DBRX | `databricks/dbrx-base`, `databricks/dbrx-instruct`, etc. | | ✅︎ | ✅︎ |
@@ -426,9 +424,6 @@ Some models are supported only via the [Transformers backend](#transformers). Th
426424
!!! note
427425
Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
428426

429-
!!! note
430-
Some mBART models' config files do not have an `architecture` defined. Therefore, you need to use `--hf-overrides '{"architectures": ["MBartForConditionalGeneration"]}'` to explicitly specify the use of the `MBartForConditionalGeneration` architecture.
431-
432427
### Pooling Models
433428

434429
See [this page](./pooling_models.md) for more information on how to use pooling models.
@@ -625,9 +620,7 @@ These models primarily accept the [`LLM.generate`](./generative_models.md#llmgen
625620
| `ChameleonForConditionalGeneration` | Chameleon | T + I | `facebook/chameleon-7b`, etc. | | ✅︎ | ✅︎ |
626621
| `Cohere2VisionForConditionalGeneration` | Command A Vision | T + I<sup>+</sup> | `CohereLabs/command-a-vision-07-2025`, etc. | | ✅︎ | ✅︎ |
627622
| `DeepseekVLV2ForCausalLM`<sup>^</sup> | DeepSeek-VL2 | T + I<sup>+</sup> | `deepseek-ai/deepseek-vl2-tiny`, `deepseek-ai/deepseek-vl2-small`, `deepseek-ai/deepseek-vl2`, etc. | | ✅︎ | ✅︎ |
628-
| `DonutForConditionalGeneration`<sup>^</sup> | Donut | T + I | `ByteDance/Dolphin`, `naver-clova-ix/donut-base-finetuned-docvqa`, etc. | | | |
629623
| `Ernie4_5_VLMoeForConditionalGeneration` | Ernie4.5-VL | T + I<sup>+</sup>/ V<sup>+</sup> | `baidu/ERNIE-4.5-VL-28B-A3B-PT`, `baidu/ERNIE-4.5-VL-424B-A47B-PT` | | ✅︎ | ✅︎ |
630-
| `Florence2ForConditionalGeneration` | Florence-2 | T + I | `microsoft/Florence-2-base`, `microsoft/Florence-2-large`, etc. | | | |
631624
| `FuyuForCausalLM` | Fuyu | T + I | `adept/fuyu-8b`, etc. | | ✅︎ | ✅︎ |
632625
| `Gemma3ForConditionalGeneration` | Gemma 3 | T + I<sup>+</sup> | `google/gemma-3-4b-it`, `google/gemma-3-27b-it`, etc. | ✅︎ | ✅︎ | ⚠️ |
633626
| `Gemma3nForConditionalGeneration` | Gemma 3n | T + I + A | `google/gemma-3n-E2B-it`, `google/gemma-3n-E4B-it`, etc. | | | ✅︎ |
@@ -654,7 +647,6 @@ These models primarily accept the [`LLM.generate`](./generative_models.md#llmgen
654647
| `MiniCPMV` | MiniCPM-V | T + I<sup>E+</sup> + V<sup>E+</sup> | `openbmb/MiniCPM-V-2` (see note), `openbmb/MiniCPM-Llama3-V-2_5`, `openbmb/MiniCPM-V-2_6`, `openbmb/MiniCPM-V-4`, `openbmb/MiniCPM-V-4_5`, etc. | ✅︎ | | ✅︎ |
655648
| `MiniMaxVL01ForConditionalGeneration` | MiniMax-VL | T + I<sup>E+</sup> | `MiniMaxAI/MiniMax-VL-01`, etc. | | ✅︎ | ✅︎ |
656649
| `Mistral3ForConditionalGeneration` | Mistral3 (HF Transformers) | T + I<sup>+</sup> | `mistralai/Mistral-Small-3.1-24B-Instruct-2503`, etc. | ✅︎ | ✅︎ | ✅︎ |
657-
| `MllamaForConditionalGeneration` | Llama 3.2 | T + I<sup>+</sup> | `meta-llama/Llama-3.2-90B-Vision-Instruct`, `meta-llama/Llama-3.2-11B-Vision`, etc. | | | |
658650
| `MolmoForCausalLM` | Molmo | T + I<sup>+</sup> | `allenai/Molmo-7B-D-0924`, `allenai/Molmo-7B-O-0924`, etc. | ✅︎ | ✅︎ | ✅︎ |
659651
| `NVLM_D_Model` | NVLM-D 1.0 | T + I<sup>+</sup> | `nvidia/NVLM-D-72B`, etc. | | ✅︎ | ✅︎ |
660652
| `Ovis` | Ovis2, Ovis1.6 | T + I<sup>+</sup> | `AIDC-AI/Ovis2-1B`, `AIDC-AI/Ovis1.6-Llama3.2-3B`, etc. | | ✅︎ | ✅︎ |

docs/usage/v1_guide.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ Please note that prefix caching is not yet supported for any of the above models
120120

121121
Whisper is supported. Other models requiring cross-attention between separate
122122
encoder and decoder (e.g., `BartForConditionalGeneration`,
123-
`MllamaForConditionalGeneration`) are not yet supported.
123+
`MllamaForConditionalGeneration`) are not supported.
124124

125125
### Features
126126

examples/offline_inference/dolphin.py

Lines changed: 0 additions & 311 deletions
This file was deleted.

0 commit comments

Comments
 (0)