[Bugfix] Fix Dense module loading for sentence-transformers embedding models v3 #22614
Conversation
Code Review
This pull request adds support for Sentence-Transformers Dense projection layers, which is a great enhancement for handling a wider range of embedding models. The implementation looks solid, with good test coverage for both unit and integration scenarios. My main feedback concerns error handling in the new file loading and weight parsing logic. Using broad `except Exception` clauses can mask important errors and make debugging difficult. I've suggested replacing them with more specific exception handling and logging to improve robustness and maintainability.
```python
try:
    with torch.no_grad():
        # Ensure weights are float32 for numerical stability
        linear.weight.copy_(weight.to(torch.float32))
        if use_bias and bias is not None and linear.bias is not None:
            linear.bias.copy_(bias.to(torch.float32))
    return True
except Exception:
    return False
```
The broad `except Exception` can mask errors like shape mismatches when copying weights, which would be better to raise to the user. Silently returning `False` can lead to models with partially or incorrectly loaded weights without any warning.
Suggested change:
```python
try:
    with torch.no_grad():
        # Ensure weights are float32 for numerical stability
        linear.weight.copy_(weight.to(torch.float32))
        if use_bias and bias is not None and linear.bias is not None:
            linear.bias.copy_(bias.to(torch.float32))
    return True
except RuntimeError as e:
    logger.warning("Failed to load weights into linear layer: %s", e)
    return False
```
```python
try:
    b = get_hf_file_bytes(f"{folder}/model.safetensors", model_path,
                          revision)
    if b is not None:
        import io

        from safetensors.torch import load as st_load
        sd = st_load(io.BytesIO(b))
        return _load_weights_from_state_dict(sd, linear, use_bias)
except Exception:
    pass
return False
```
The broad `except Exception: pass` in `_load_from_safetensors` (and similarly in `_load_from_pytorch_bin`) can hide important errors during weight loading, making debugging difficult. It's better to catch specific exceptions related to file I/O or parsing and log them.
Suggested change:
```python
try:
    b = get_hf_file_bytes(f"{folder}/model.safetensors", model_path,
                          revision)
    if b is not None:
        import io
        from safetensors.torch import load as st_load
        sd = st_load(io.BytesIO(b))
        return _load_weights_from_state_dict(sd, linear, use_bias)
except (IOError, ValueError, ImportError) as e:
    logger.debug("Failed to load safetensors from %s: %s", folder, e)
return False
```
```python
try:
    b = get_hf_file_bytes(f"{folder}/pytorch_model.bin", model_path,
                          revision)
    if b is not None:
        import io
        sd = torch.load(io.BytesIO(b), map_location="cpu")
        return _load_weights_from_state_dict(sd, linear, use_bias)
except Exception:
    pass
return False
```
The broad `except Exception: pass` can hide important errors during weight loading, making debugging difficult. It's better to catch specific exceptions related to file I/O or parsing and log them. For example, `torch.load` can raise `pickle.UnpicklingError`.
Suggested change:
```python
import pickle  # imported up front so the except clause can reference UnpicklingError

try:
    b = get_hf_file_bytes(f"{folder}/pytorch_model.bin", model_path,
                          revision)
    if b is not None:
        import io
        sd = torch.load(io.BytesIO(b), map_location="cpu")
        return _load_weights_from_state_dict(sd, linear, use_bias)
except (IOError, pickle.UnpicklingError, ValueError) as e:
    logger.debug("Failed to load pytorch_model.bin from %s: %s", folder, e)
return False
```
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This pull request has merge conflicts that must be resolved before it can be merged.
FIX #15509, #16898, #22117
Could you please take a look at this PR? I have already revised it based on the previous review. Thanks a lot!
Thanks for your contribution.
You need to fix the conflicts and pre-commit.

To fix the conflicts: I'm not very familiar with git commands, so I prefer to first roll back the conflicting files, merge main into this PR, and then add the changes back. (The conflicts are solved anyway, LOL.)
```python
@pytest.mark.parametrize("model_info", ST_PROJECTOR_MODELS)
@pytest.mark.skip(reason="Projector loading and single-sentence inference verified. MTEB batch processing optimization pending.")
def test_st_projector_models_mteb(hf_runner, vllm_runner,
                                  model_info: EmbedModelInfo) -> None:
    """Test ST models with projectors using MTEB."""
    if not model_info.enable_test:
        pytest.skip("Skipping test.")
    vllm_extra_kwargs: dict[str, Any] = {}
    if model_info.architecture == "GteNewModel":
        vllm_extra_kwargs["hf_overrides"] = {"architectures": ["GteNewModel"]}
    mteb_test_embed_models(hf_runner, vllm_runner, model_info,
                           vllm_extra_kwargs)
```
I think just one test is enough
```python
# Test models without ST projectors (for regression testing)
NON_PROJECTOR_MODELS = [
    EmbedModelInfo("thenlper/gte-large",
                   architecture="BertModel",
                   enable_test=True),
    EmbedModelInfo("Alibaba-NLP/gte-base-en-v1.5",
                   architecture="GteNewModel",
                   enable_test=True),
    EmbedModelInfo("Qwen/Qwen3-Embedding-0.6B",
                   architecture="Qwen3ForCausalLM",
                   dtype="float32",
                   enable_test=True),
]
```
These models have already been tested in other files
```
@@ -0,0 +1,156 @@
# SPDX-License-Identifier: Apache-2.0
```
The mteb_test_embed_models test is sufficient, there is no need for this test.
Got it! I'll sort out the conflicts and push the updates soon. Appreciate it! @noooop
Force-pushed from 0428a14 to 0e3d5f7
Hi @noooop @DarkLight1337, I've completed the updates for this PR.
vllm/transformers_utils/config.py
Outdated
```
@@ -38,9 +38,14 @@
                          RWConfig, SpeculatorsConfig,
                          Step3TextConfig, Step3VLConfig,
                          UltravoxConfig)

try:
    from vllm.transformers_utils.configs.mllama import MllamaConfig
```
We have removed both `MllamaConfig` and `NVLM_D_Config`, so you can remove them from this file.
Got it. Will fix it! Thanks a lot @noooop @DarkLight1337
I have one more question. I had run pre-commit locally multiple times before submitting this PR, but the CI's automated pre-commit still catches minor formatting issues (like extra spaces or blank lines) and fails after making its modifications. Is there a way to avoid this, or is there a specific pre-commit configuration I should run to match exactly what CI expects?

How did you install pre-commit for this repo?
I used `pip install pre-commit` and `pre-commit install`. Should I run `pre-commit run --all-files --hook-stage manual` to catch these CI-only checks locally? Or is there another command?
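For reference, a typical local sequence looks like the following (assuming the repository's standard pre-commit configuration; the exact hook set may differ):
```bash
pip install pre-commit
pre-commit install                               # run hooks automatically on each commit
pre-commit run --all-files                       # run every hook against the full tree, as CI does
pre-commit run --all-files --hook-stage manual   # also run hooks marked with stages: [manual]
```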
Actually, it looks like you did modify this file in your PR, so it's normal that pre-commit would format it, even though those lines are not from your PR.
Does your local pre-commit remove the line that is added in CI?
Got it! I gave it a try following your instructions, and it seems pre-commit works this time! Waiting for the remaining pending checks.
Force-pushed from 349c111 to 572b147
Force-pushed from 572b147 to 77ed950
Purpose:
This PR adds automatic support for Sentence-Transformers Dense projection layers in vLLM, enabling proper handling of models that require dimension transformation (e.g., 1024→1792) during embedding generation.
Resolves the following issues: #15509, #16898, #22117
Key Modifications:
Test Plan:
python -m pytest tests/models/language/pooling/test_st_projector.py -v
python -m pytest tests/model_executor/test_st_projector_unit.py -v
Test Result:
tests/models/language/pooling/test_st_projector.py::test_st_projector_loading PASSED
tests/models/language/pooling/test_st_projector.py::test_compare_with_hf_dimensions PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_numerical_similarity PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_quality_checks PASSED
tests/models/language/pooling/test_st_projector.py::test_non_projector_models_mteb PASSED
Average cosine similarity: 1.000000 (100% match with HuggingFace)
Embedding dimensions match: 1792
✓ All numerical similarity tests passed!
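For reference, a tiny self-contained sketch of the kind of cosine-similarity check behind these numbers (the tensors below are synthetic stand-ins, not the actual test data):
```python
import torch
import torch.nn.functional as F

def avg_cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Mean cosine similarity between two batches of embeddings of shape (N, D)."""
    assert a.shape == b.shape, "embedding dimensions must match (e.g. 1792)"
    return F.cosine_similarity(a.float(), b.float(), dim=-1).mean().item()

# Stand-ins for HuggingFace vs. vLLM embeddings of the same sentences.
hf_embeds = torch.randn(4, 1792)
vllm_embeds = hf_embeds + 1e-6 * torch.randn(4, 1792)  # nearly identical

print(avg_cosine_similarity(hf_embeds, vllm_embeds))    # ≈ 1.0
```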