
Commit 44b9af5 (parent: 7cd95dc)

[Benchmark] Enable MM Embedding benchmarks (#26310)

Signed-off-by: DarkLight1337 <[email protected]>

5 files changed (+358, -107 lines)

docs/contributing/benchmarks.md

Lines changed: 104 additions & 7 deletions
@@ -67,13 +67,13 @@ Legend:
 <details class="admonition abstract" markdown="1">
 <summary>Show more</summary>
 
-First start serving your model
+First start serving your model:
 
 ```bash
 vllm serve NousResearch/Hermes-3-Llama-3.1-8B
 ```
 
-Then run the benchmarking script
+Then run the benchmarking script:
 
 ```bash
 # download dataset
@@ -87,7 +87,7 @@ vllm bench serve \
   --num-prompts 10
 ```
 
-If successful, you will see the following output
+If successful, you will see the following output:
 
 ```text
 ============ Serving Benchmark Result ============
@@ -125,7 +125,7 @@ If the dataset you want to benchmark is not supported yet in vLLM, even then you
 
 ```bash
 # start server
-VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct
+vllm serve meta-llama/Llama-3.1-8B-Instruct
 ```
 
 ```bash
@@ -167,7 +167,7 @@ vllm bench serve \
 ##### InstructCoder Benchmark with Speculative Decoding
 
 ``` bash
-VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
+vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
   --speculative-config $'{"method": "ngram",
   "num_speculative_tokens": 5, "prompt_lookup_max": 5,
   "prompt_lookup_min": 2}'
@@ -184,7 +184,7 @@ vllm bench serve \
 ##### Spec Bench Benchmark with Speculative Decoding
 
 ``` bash
-VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
+vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
   --speculative-config $'{"method": "ngram",
   "num_speculative_tokens": 5, "prompt_lookup_max": 5,
   "prompt_lookup_min": 2}'
@@ -366,7 +366,6 @@ Total num output tokens: 1280
 
 ``` bash
 VLLM_WORKER_MULTIPROC_METHOD=spawn \
-VLLM_USE_V1=1 \
 vllm bench throughput \
   --dataset-name=hf \
   --dataset-path=likaixin/InstructCoder \
@@ -781,6 +780,104 @@ This should be seen as an edge case, and if this behavior can be avoided by sett
 
 </details>
 
+#### Embedding Benchmark
+
+Benchmark the performance of embedding requests in vLLM.
+
+<details class="admonition abstract" markdown="1">
+<summary>Show more</summary>
+
+##### Text Embeddings
+
+Unlike generative models which use Completions API or Chat Completions API,
+you should set `--backend openai-embeddings` and `--endpoint /v1/embeddings` to use the Embeddings API.
+
+You can use any text dataset to benchmark the model, such as ShareGPT.
+
+Start the server:
+
+```bash
+vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
+```
+
+Run the benchmark:
+
+```bash
+# download dataset
+# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+vllm bench serve \
+  --model jinaai/jina-embeddings-v3 \
+  --backend openai-embeddings \
+  --endpoint /v1/embeddings \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json
+```
+
+##### Multi-modal Embeddings
+
+Unlike generative models which use Completions API or Chat Completions API,
+you should set `--endpoint /v1/embeddings` to use the Embeddings API. The backend to use depends on the model:
+
+- CLIP: `--backend openai-embeddings-clip`
+- VLM2Vec: `--backend openai-embeddings-vlm2vec`
+
+For other models, please add your own implementation inside <gh-file:vllm/benchmarks/lib/endpoint_request_func.py> to match the expected instruction format.
+
+You can use any text or multi-modal dataset to benchmark the model, as long as the model supports it.
+For example, you can use ShareGPT and VisionArena to benchmark vision-language embeddings.
+
+Serve and benchmark CLIP:
+
+```bash
+# Run this in another process
+vllm serve openai/clip-vit-base-patch32
+
+# Run these one by one after the server is up
+# download dataset
+# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+vllm bench serve \
+  --model openai/clip-vit-base-patch32 \
+  --backend openai-embeddings-clip \
+  --endpoint /v1/embeddings \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json
+
+vllm bench serve \
+  --model openai/clip-vit-base-patch32 \
+  --backend openai-embeddings-clip \
+  --endpoint /v1/embeddings \
+  --dataset-name hf \
+  --dataset-path lmarena-ai/VisionArena-Chat
+```
+
+Serve and benchmark VLM2Vec:
+
+```bash
+# Run this in another process
+vllm serve TIGER-Lab/VLM2Vec-Full --runner pooling \
+  --trust-remote-code \
+  --chat-template examples/template_vlm2vec_phi3v.jinja
+
+# Run these one by one after the server is up
+# download dataset
+# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+vllm bench serve \
+  --model TIGER-Lab/VLM2Vec-Full \
+  --backend openai-embeddings-vlm2vec \
+  --endpoint /v1/embeddings \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json
+
+vllm bench serve \
+  --model TIGER-Lab/VLM2Vec-Full \
+  --backend openai-embeddings-vlm2vec \
+  --endpoint /v1/embeddings \
+  --dataset-name hf \
+  --dataset-path lmarena-ai/VisionArena-Chat
+```
+
+</details>
+
 [](){ #performance-benchmarks }
 
 ## Performance Benchmarks
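Before kicking off a full `vllm bench serve` run against an embedding server, it can help to fire a single request at the Embeddings API to confirm the server is up and returning vectors. The following is a minimal sketch (not part of this commit) using the `openai` client, assuming the `jinaai/jina-embeddings-v3` server from the text-embeddings example above is listening on the default `localhost:8000`:

```python
# Minimal smoke test for the OpenAI-compatible Embeddings API exposed by `vllm serve`.
# Assumes the jinaai/jina-embeddings-v3 server from the example above is running
# on the default port; adjust base_url/model for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM only checks it if the server sets --api-key
)

response = client.embeddings.create(
    model="jinaai/jina-embeddings-v3",
    input=["vLLM is a fast and easy-to-use library for LLM inference."],
)

# One embedding vector is returned per input string.
print(len(response.data), len(response.data[0].embedding))
```

For the multi-modal models the request format differs per backend (hence the dedicated `openai-embeddings-clip` and `openai-embeddings-vlm2vec` backends), so this plain-text check only applies to the text-embeddings setup.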

vllm/benchmarks/datasets.py

Lines changed: 4 additions & 4 deletions
@@ -1582,10 +1582,10 @@ def get_samples(args, tokenizer) -> list[SampleRequest]:
             "like to add support for additional dataset formats."
         )
 
-    if dataset_class.IS_MULTIMODAL and args.backend not in [
-        "openai-chat",
-        "openai-audio",
-    ]:
+    if dataset_class.IS_MULTIMODAL and not (
+        args.backend in ("openai-chat", "openai-audio")
+        or "openai-embeddings-" in args.backend
+    ):
         # multi-modal benchmark is only available on OpenAI Chat
         # endpoint-type.
         raise ValueError(
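The effect of this change: multi-modal datasets were previously accepted only with the `openai-chat` and `openai-audio` backends, and now any backend whose name contains `openai-embeddings-` (such as `openai-embeddings-clip` or `openai-embeddings-vlm2vec`) passes the check as well. Below is a standalone sketch of the same predicate; the helper name is hypothetical and not a function in `vllm/benchmarks/datasets.py`:

```python
# Standalone illustration of the backend check above.
# `backend_allows_multimodal` is a hypothetical helper, not part of vLLM.
def backend_allows_multimodal(backend: str) -> bool:
    return backend in ("openai-chat", "openai-audio") or "openai-embeddings-" in backend


# Accepted for multi-modal datasets:
assert backend_allows_multimodal("openai-chat")
assert backend_allows_multimodal("openai-audio")
assert backend_allows_multimodal("openai-embeddings-clip")
assert backend_allows_multimodal("openai-embeddings-vlm2vec")

# Still rejected: the plain Completions backend and the text-only embeddings backend
# (its name has no trailing "-", so the substring test does not match).
assert not backend_allows_multimodal("openai")
assert not backend_allows_multimodal("openai-embeddings")
```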
