@@ -320,7 +320,7 @@
 }
 </style>

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `AquilaForCausalLM` | Aquila, Aquila2 | `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `ArceeForCausalLM` | Arcee (AFM) | `arcee-ai/AFM-4.5B-Base`, etc. | ✅︎ | ✅︎ | ✅︎ |
@@ -426,7 +426,7 @@ See [this page](./pooling_models.md) for more information on how to use pooling

 These models primarily support the [`LLM.embed`](./pooling_models.md#llmembed) API.

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `BertModel`<sup>C</sup> | BERT-based | `BAAI/bge-base-en-v1.5`, `Snowflake/snowflake-arctic-embed-xs`, etc. | | | |
 | `Gemma2Model`<sup>C</sup> | Gemma 2-based | `BAAI/bge-multilingual-gemma2`, etc. | ✅︎ | | ✅︎ |
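A minimal sketch of the `LLM.embed` API this hunk refers to, using a model name from the table above; the `task` argument spelling ("embed" vs. "embedding") may differ across vLLM versions:

```python
from vllm import LLM

# Sketch only: embedding with a BERT-based pooling model from the table above.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")

outputs = llm.embed(["Hello, world!", "vLLM also serves pooling models."])
for out in outputs:
    print(len(out.outputs.embedding))  # one dense vector per prompt
```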
@@ -466,7 +466,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding

 These models primarily support the [`LLM.classify`](./pooling_models.md#llmclassify) API.

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `JambaForSequenceClassification` | Jamba | `ai21labs/Jamba-tiny-reward-dev`, etc. | ✅︎ | ✅︎ | |
 | `GPT2ForSequenceClassification` | GPT2 | `nie3e/sentiment-polish-gpt2-small` | | | ✅︎ |
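A hedged sketch of the `LLM.classify` API referenced here, with a model taken from the table above (the `task` argument name is an assumption and may vary by version):

```python
from vllm import LLM

# Sketch only: sequence classification with a model from the table above.
llm = LLM(model="nie3e/sentiment-polish-gpt2-small", task="classify")

outputs = llm.classify(["Ten film był świetny!"])
for out in outputs:
    print(out.outputs.probs)  # per-class probabilities
```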
@@ -483,7 +483,7 @@ If your model is not in the above list, we will try to automatically convert the
 Cross-encoder and reranker models are a subset of classification models that accept two prompts as input.
 These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) API.

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `BertForSequenceClassification` | BERT-based | `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc. | | | |
 | `GemmaForSequenceClassification` | Gemma-based | `BAAI/bge-reranker-v2-gemma` (see note), etc. | ✅︎ | ✅︎ | ✅︎ |
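A minimal sketch of the two-prompt `LLM.score` API this hunk mentions, using a reranker from the table above:

```python
from vllm import LLM

# Sketch only: cross-encoder scoring with a reranker from the table above.
llm = LLM(model="cross-encoder/ms-marco-MiniLM-L-6-v2", task="score")

# The first argument is the query; the second is one or more documents.
outputs = llm.score(
    "What is the capital of France?",
    ["Paris is the capital of France.", "The Moon orbits the Earth."],
)
for out in outputs:
    print(out.outputs.score)  # relevance score per (query, document) pair
```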
@@ -521,7 +521,7 @@ These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) A

 These models primarily support the [`LLM.reward`](./pooling_models.md#llmreward) API.

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `InternLM2ForRewardModel` | InternLM2-based | `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `LlamaForCausalLM`<sup>C</sup> | Llama-based | `peiyi9979/math-shepherd-mistral-7b-prm`, etc. | ✅︎ | ✅︎ | ✅︎ |
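A hedged sketch of the `LLM.reward` entrypoint linked above; the `task` name and the exact shape of the returned pooling output are assumptions that may vary across vLLM versions:

```python
from vllm import LLM

# Sketch only: reward scoring with a model from the table above.
llm = LLM(model="internlm/internlm2-1_8b-reward", task="reward",
          trust_remote_code=True)

outputs = llm.reward(["The assistant's answer was accurate and polite."])
for out in outputs:
    print(out.outputs)  # pooled reward output for the prompt
```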
@@ -594,7 +594,7 @@ See [this page](generative_models.md) for more information on how to use generat

 These models primarily accept the [`LLM.generate`](./generative_models.md#llmgenerate) API. Chat/Instruct models additionally support the [`LLM.chat`](./generative_models.md#llmchat) API.

-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `AriaForConditionalGeneration` | Aria | T + I<sup>+</sup> | `rhymes-ai/Aria` | | | ✅︎ |
 | `AyaVisionForConditionalGeneration` | Aya Vision | T + I<sup>+</sup> | `CohereForAI/aya-vision-8b`, `CohereForAI/aya-vision-32b`, etc. | | ✅︎ | ✅︎ |
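A minimal sketch of the `LLM.generate` and `LLM.chat` APIs this hunk references, using a text-only chat model from the generative tables:

```python
from vllm import LLM, SamplingParams

# Sketch only: a chat model from the generative tables above.
llm = LLM(model="BAAI/AquilaChat-7B", trust_remote_code=True)
params = SamplingParams(temperature=0.8, max_tokens=64)

# LLM.generate consumes raw prompts...
for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)

# ...while chat/instruct models additionally accept an OpenAI-style message list.
for out in llm.chat([{"role": "user", "content": "Name three primes."}], params):
    print(out.outputs[0].text)
```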
@@ -647,7 +647,7 @@ These models primarily accept the [`LLM.generate`](./generative_models.md#llmgen

 Some models are supported only via the [Transformers backend](#transformers). The purpose of the table below is to acknowledge models which we officially support in this way. The logs will say that the Transformers backend is being used, and you will see no warning that this is fallback behaviour. This means that, if you have issues with any of the models listed below, please [make an issue](https://github.com/vllm-project/vllm/issues/new/choose) and we'll do our best to fix it!

-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|-----------------------------|-----------------------------------------|---------------------|
 | `Emu3ForConditionalGeneration` | Emu3 | T + I | `BAAI/Emu3-Chat-hf` | ✅︎ | ✅︎ | ✅︎ |

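A hedged sketch of selecting the Transformers backend explicitly; the `model_impl` values below are an assumption about the current release, so check `vllm serve --help` for your version:

```python
from vllm import LLM

# Sketch only: force the Transformers backend. `model_impl` is assumed to
# accept "auto" / "vllm" / "transformers".
llm = LLM(model="BAAI/Emu3-Chat-hf", model_impl="transformers")
# The startup logs should then report that the Transformers backend is in use.
```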
@@ -726,7 +726,7 @@ Some models are supported only via the [Transformers backend](#transformers). Th

 Speech2Text models trained specifically for Automatic Speech Recognition.

-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `WhisperForConditionalGeneration` | Whisper | `openai/whisper-small`, `openai/whisper-large-v3-turbo`, etc. | | | |
 | `VoxtralForConditionalGeneration` | Voxtral (Mistral format) | `mistralai/Voxtral-Mini-3B-2507`, `mistralai/Voxtral-Small-24B-2507`, etc. | | ✅︎ | ✅︎ |
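A hedged sketch of transcribing with one of these ASR models through vLLM's OpenAI-compatible server; the server URL and audio file name are assumptions for illustration:

```python
from openai import OpenAI

# Sketch only: assumes a vLLM server started with a Whisper model, e.g.
#   vllm serve openai/whisper-large-v3-turbo
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="openai/whisper-large-v3-turbo",
        file=audio,
    )
print(result.text)
```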
@@ -744,7 +744,7 @@ These models primarily support the [`LLM.embed`](./pooling_models.md#llmembed) A

 The following table lists those that are tested in vLLM.

-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `LlavaNextForConditionalGeneration`<sup>C</sup> | LLaVA-NeXT-based | T / I | `royokong/e5-v` | | | |
 | `Phi3VForCausalLM`<sup>C</sup> | Phi-3-Vision-based | T + I | `TIGER-Lab/VLM2Vec-Full` | 🚧 | ✅︎ | |
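A hedged sketch of multimodal `LLM.embed` with an e5-v style model from the table above; the prompt template (placeholder token and wording) is model-specific and the one below is an illustrative assumption:

```python
from PIL import Image
from vllm import LLM

# Sketch only: image embedding via a dict prompt with multi_modal_data.
llm = LLM(model="royokong/e5-v", task="embed")

outputs = llm.embed({
    "prompt": "<image>\nSummarize the above image in one word:",
    "multi_modal_data": {"image": Image.open("example.jpg")},
})
print(len(outputs[0].outputs.embedding))
```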
@@ -760,7 +760,7 @@ The following table lists those that are tested in vLLM.
 Cross-encoder and reranker models are a subset of classification models that accept two prompts as input.
 These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) API, illustrated earlier.

-| Architecture | Models | Inputs | Example HF Models | [LoRA][lora-adapter] | [PP][distributed-serving] | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA][lora-adapter] | [PP][parallelism-scaling] | [V1](gh-issue:8779) |
 |-------------------------------------|--------------------|----------|--------------------------|------------------------|-----------------------------|-----------------------|
 | `JinaVLForSequenceClassification` | JinaVL-based | T + I<sup>E+</sup> | `jinaai/jina-reranker-m0`, etc. | | | ✅︎ |
