[Benchmark] Enable MM Embedding benchmarks #26310
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
Code Review
This pull request introduces support for multi-modal embedding benchmarks, refactors the benchmark request functions for better code reuse, and fixes a few bugs related to sampling parameters and log spam. The changes are well-structured and improve the codebase. However, I found several copy-paste errors in the new documentation for embedding benchmarks, where example commands are syntactically incorrect due to missing or extra line continuation characters. These should be fixed to ensure users can run the examples without errors.
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Code Review
This pull request introduces benchmarking support for multimodal embeddings, which is a great addition. The code is well-refactored, with common logic extracted into helper functions, improving readability and maintainability. I've identified a couple of high-severity issues related to consistency and determinism in the benchmark logic that should be addressed to ensure the benchmarks are reliable and correct. The documentation updates are clear and helpful.
💡 Codex Review
Here are some automated review suggestions for this pull request.
backend = args.backend
task_type = TaskType.EMBEDDING if "embeddings" in backend else TaskType.GENERATION
Detect embeddings from endpoint, not just backend string
The new logic determines task_type solely via args.backend containing the substring "embeddings". The request function selection in benchmark() also now follows endpoint_type without inspecting the URL. A command that previously worked, e.g. vllm bench serve --backend vllm --endpoint /v1/embeddings, will now classify the run as TaskType.GENERATION, call the completions request handler, and immediately raise because the handler validates that the URL ends with /completions. Before this change, embeddings were detected from api_url.endswith("/v1/embeddings"), so the same command successfully used the embeddings request path. Consider deriving task_type (and the request handler) from args.endpoint as well, to keep embeddings benchmarks working for non openai-embeddings* backends.
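A minimal sketch of the direction this suggestion points in, assuming the TaskType enum and the args.backend / args.endpoint attributes referenced above; the actual benchmark code may organize this differently:

# Sketch only: also inspect the endpoint when deciding the task type, so that
# e.g. "--backend vllm --endpoint /v1/embeddings" still takes the embeddings path.
backend = args.backend
endpoint = args.endpoint or ""
is_embeddings = "embeddings" in backend or endpoint.rstrip("/").endswith("/embeddings")
task_type = TaskType.EMBEDDING if is_embeddings else TaskType.GENERATION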
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Signed-off-by: DarkLight1337 <[email protected]>
Code Review
This pull request is a great addition, enabling benchmarking for multi-modal embedding models. The refactoring in endpoint_request_func.py to consolidate common code improves maintainability. The introduction of new backends for CLIP and VLM2Vec is well-structured, and the documentation updates in benchmarks.md are clear and helpful. The fixes for sampling_params in embedding backends and the log spam are also valuable improvements.
I've found one potential issue regarding model name resolution in the openai-embeddings backend which could cause benchmarks to fail in certain configurations. Please see my detailed comment.
Purpose
- Follow-up to #26010
- Refactor common request logic in endpoint_request_func.py
- Add openai-embeddings-chat backend
- Add openai-embeddings-clip and openai-embeddings-vlm2vec backends
- Fix sampling_params being passed to the openai-embeddings backend (see the sketch after this list)
- Fix "Loading chat template fallback" log spam
- Document the new embedding benchmarks in benchmarks.md
- Remove outdated VLLM_USE_V1 usage from benchmarks.md
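Regarding the sampling_params fix listed above: an embeddings request does not take generation-time sampling parameters, so the payload sent by the embeddings backends should only carry the model and the inputs. A rough sketch with hypothetical variable names, not the PR's actual code:

# Hypothetical sketch: a /v1/embeddings request body carries the model and inputs only;
# generation settings such as temperature or max_tokens do not apply to embedding
# requests and should not be forwarded by the embeddings backends.
payload = {
    "model": model_name,
    "input": inputs,
}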
cc @maxdebayser @noooop @ZJY0516
Test Plan
Run each newly added example
Test Result
CLIP example:
VLM2Vec example:
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.