[Benchmark] Enable MM Embedding benchmarks #26310
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
Code Review
This pull request introduces support for multi-modal embedding benchmarks, refactors the benchmark request functions for better code reuse, and fixes a few bugs related to sampling parameters and log spam. The changes are well-structured and improve the codebase. However, I found several copy-paste errors in the new documentation for embedding benchmarks, where example commands are syntactically incorrect due to missing or extra line continuation characters. These should be fixed to ensure users can run the examples without errors.
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Code Review
This pull request introduces benchmarking support for multimodal embeddings, which is a great addition. The code is well-refactored, with common logic extracted into helper functions, improving readability and maintainability. I've identified a couple of high-severity issues related to consistency and determinism in the benchmark logic that should be addressed to ensure the benchmarks are reliable and correct. The documentation updates are clear and helpful.
💡 Codex Review
Here are some automated review suggestions for this pull request.
backend = args.backend
task_type = TaskType.EMBEDDING if "embeddings" in backend else TaskType.GENERATION
Detect embeddings from endpoint, not just backend string
The new logic determines task_type solely via args.backend containing the substring "embeddings". The request function selection in benchmark() also now follows endpoint_type without inspecting the URL. A command that previously worked, e.g. vllm bench serve --backend vllm --endpoint /v1/embeddings, will now classify the run as TaskType.GENERATION, call the completions request handler, and immediately raise because the handler validates that the URL ends with /completions. Before this change, embeddings were detected from api_url.endswith("/v1/embeddings"), so the same command successfully used the embeddings request path. Consider deriving task_type (and the request handler) from args.endpoint as well, to keep embeddings benchmarks working for non openai-embeddings* backends.
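A minimal sketch of the direction this suggestion points in, assuming the TaskType enum and the args.backend / args.endpoint attributes referenced above; the actual benchmark code may organize this differently:

# Sketch only: also inspect the endpoint when deciding the task type, so that
# e.g. "--backend vllm --endpoint /v1/embeddings" still takes the embeddings path.
backend = args.backend
endpoint = args.endpoint or ""
is_embeddings = "embeddings" in backend or endpoint.rstrip("/").endswith("/embeddings")
task_type = TaskType.EMBEDDING if is_embeddings else TaskType.GENERATION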
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Signed-off-by: DarkLight1337 <[email protected]>
Code Review
This pull request is a great addition, enabling benchmarking for multi-modal embedding models. The refactoring in endpoint_request_func.py to consolidate common code improves maintainability. The introduction of new backends for CLIP and VLM2Vec is well-structured, and the documentation updates in benchmarks.md are clear and helpful. The fixes for sampling_params in embedding backends and the log spam are also valuable improvements.
I've found one potential issue regarding model name resolution in the openai-embeddings backend which could cause benchmarks to fail in certain configurations. Please see my detailed comment.
Purpose
- Follow-up to #26010
- Refactor common request logic in endpoint_request_func.py
- Add openai-embeddings-chat backend
- Add openai-embeddings-clip and openai-embeddings-vlm2vec backends
- Fix sampling_params being passed to the openai-embeddings backend (see the sketch after this list)
- Fix "Loading chat template fallback" log spam
- Document the new embedding benchmarks in benchmarks.md
- Remove outdated VLLM_USE_V1 usage from benchmarks.md
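Regarding the sampling_params fix listed above: an embeddings request does not take generation-time sampling parameters, so the payload sent by the embeddings backends should only carry the model and the inputs. A rough sketch with hypothetical variable names, not the PR's actual code:

# Hypothetical sketch: a /v1/embeddings request body carries the model and inputs only;
# generation settings such as temperature or max_tokens do not apply to embedding
# requests and should not be forwarded by the embeddings backends.
payload = {
    "model": model_name,
    "input": inputs,
}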
cc @maxdebayser @noooop @ZJY0516
Test Plan
Run each newly added example
Test Result
CLIP example:
VLM2Vec example:
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.