[Bugfix] Fix Dense module loading for sentence-transformers embedding models v3 #22614


Open · wants to merge 18 commits into main

Conversation

@FFFfff1FFFfff commented on Aug 11, 2025

Purpose:
This PR adds automatic support for Sentence-Transformers Dense projection layers in vLLM, enabling proper handling of models that require dimension transformation (e.g., 1024→1792) during embedding generation.

Resolves the following issues:

  • Missing Dense projection functionality for ST models in vLLM
  • Incorrect output dimensions (1024 instead of expected 1792 for models like TencentBAC/Conan-embedding-v1)
  • Numerical inconsistency with the HuggingFace Sentence-Transformers implementation

Key Modifications:

  • vllm/transformers_utils/config.py: Added get_hf_file_bytes() for reading model files
  • vllm/model_executor/models/adapters.py: Core ST projector detection and loading logic (sketched below)
  • vllm/model_executor/layers/pooler.py: Integration of projector into embedding pipeline
  • vllm/model_executor/models/bert.py: Model-specific projector initialization
  • tests/models/language/pooling/test_st_projector.py: Comprehensive test suite (merged and optimized)
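
For illustration, the core loading idea can be sketched as follows. This is a minimal sketch, not the PR's exact code: it assumes the standard Sentence-Transformers repository layout, where each Dense module lives in a folder such as 2_Dense containing a config.json (with in_features, out_features, and bias) and a model.safetensors whose tensors are stored under linear.weight / linear.bias; the helper name build_dense_from_st_folder is hypothetical.

import json

import torch
import torch.nn as nn
from safetensors.torch import load_file


def build_dense_from_st_folder(folder: str) -> nn.Linear:
    """Build an nn.Linear from a Sentence-Transformers Dense module folder.

    Assumes the standard ST layout: config.json holds in_features,
    out_features, and bias; the activation_function field is ignored
    in this sketch.
    """
    with open(f"{folder}/config.json") as f:
        cfg = json.load(f)

    linear = nn.Linear(cfg["in_features"], cfg["out_features"],
                       bias=cfg.get("bias", True))

    # ST stores the projection under "linear.weight" / "linear.bias".
    sd = load_file(f"{folder}/model.safetensors")
    with torch.no_grad():
        # Keep float32 for numerical stability, matching the PR's approach.
        linear.weight.copy_(sd["linear.weight"].to(torch.float32))
        if linear.bias is not None and "linear.bias" in sd:
            linear.bias.copy_(sd["linear.bias"].to(torch.float32))
    return linear

Applied after pooling, a layer like this performs the 1024→1792 projection that models such as Conan-embedding-v1 expect.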

Test Plan:
python -m pytest tests/models/language/pooling/test_st_projector.py -v
python -m pytest tests/model_executor/test_st_projector_unit.py -v

Test Result:
tests/models/language/pooling/test_st_projector.py::test_st_projector_loading PASSED
tests/models/language/pooling/test_st_projector.py::test_compare_with_hf_dimensions PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_numerical_similarity PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_quality_checks PASSED
tests/models/language/pooling/test_st_projector.py::test_non_projector_models_mteb PASSED

Average cosine similarity: 1.000000 (100% match with HuggingFace)
Embedding dimensions match: 1792
✓ All numerical similarity tests passed!
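
For context, the similarity numbers above come from comparing vLLM's embeddings against the HuggingFace Sentence-Transformers output for the same inputs. Below is a minimal standalone sketch of that cross-check; it is illustrative only, not the test suite's exact code, and assumes vLLM's embed task API and the sentence-transformers package.

import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

texts = ["The quick brown fox", "jumps over the lazy dog"]

# HuggingFace reference: SentenceTransformer applies the Dense module,
# so these vectors already have the projected (1792) dimensionality.
st_model = SentenceTransformer("TencentBAC/Conan-embedding-v1")
hf_embeds = st_model.encode(texts)

# vLLM side: load the same model for the embedding task.
llm = LLM(model="TencentBAC/Conan-embedding-v1", task="embed")
vllm_embeds = np.array([o.outputs.embedding for o in llm.embed(texts)])

assert hf_embeds.shape[-1] == vllm_embeds.shape[-1] == 1792

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cosine(h, v) for h, v in zip(hf_embeds, vllm_embeds)]
print(f"Average cosine similarity: {np.mean(sims):.6f}")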

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for Sentence-Transformers Dense projection layers, which is a great enhancement for handling a wider range of embedding models. The implementation looks solid, with good test coverage for both unit and integration scenarios. My main feedback concerns error handling in the new file loading and weight parsing logic. Using broad except Exception clauses can mask important errors and make debugging difficult. I've suggested replacing them with more specific exception handling and logging to improve robustness and maintainability.

Comment on lines +50 to +58
try:
    with torch.no_grad():
        # Ensure weights are float32 for numerical stability
        linear.weight.copy_(weight.to(torch.float32))
        if use_bias and bias is not None and linear.bias is not None:
            linear.bias.copy_(bias.to(torch.float32))
    return True
except Exception:
    return False
Severity: high

The broad except Exception can mask errors like shape mismatches when copying weights, which would be better surfaced to the user. Silently returning False can leave models with partially or incorrectly loaded weights and no warning.

Suggested change:

try:
    with torch.no_grad():
        # Ensure weights are float32 for numerical stability
        linear.weight.copy_(weight.to(torch.float32))
        if use_bias and bias is not None and linear.bias is not None:
            linear.bias.copy_(bias.to(torch.float32))
    return True
except RuntimeError as e:
    logger.warning("Failed to load weights into linear layer: %s", e)
    return False

Comment on lines +66 to +77
try:
    b = get_hf_file_bytes(f"{folder}/model.safetensors", model_path,
                          revision)
    if b is not None:
        import io

        from safetensors.torch import load as st_load
        sd = st_load(io.BytesIO(b))
        return _load_weights_from_state_dict(sd, linear, use_bias)
except Exception:
    pass
return False
Severity: high

The broad except Exception: pass in _load_from_safetensors (and similarly in _load_from_pytorch_bin) can hide important errors during weight loading, making debugging difficult. It's better to catch specific exceptions related to file I/O or parsing and log them.

Suggested change (the redundant trailing pass is dropped here):

try:
    b = get_hf_file_bytes(f"{folder}/model.safetensors", model_path,
                          revision)
    if b is not None:
        import io

        from safetensors.torch import load as st_load
        sd = st_load(io.BytesIO(b))
        return _load_weights_from_state_dict(sd, linear, use_bias)
except (IOError, ValueError, ImportError) as e:
    logger.debug("Failed to load safetensors from %s: %s", folder, e)
return False

Comment on lines +85 to +94
try:
    b = get_hf_file_bytes(f"{folder}/pytorch_model.bin", model_path,
                          revision)
    if b is not None:
        import io
        sd = torch.load(io.BytesIO(b), map_location="cpu")
        return _load_weights_from_state_dict(sd, linear, use_bias)
except Exception:
    pass
return False
Severity: high

The broad except Exception: pass can hide important errors during weight loading, making debugging difficult. It's better to catch specific exceptions related to file I/O or parsing and log them. For example, torch.load can raise pickle.UnpicklingError.

Suggested change (pickle is imported before the try block so the except clause can always reference pickle.UnpicklingError):

import pickle

try:
    b = get_hf_file_bytes(f"{folder}/pytorch_model.bin", model_path,
                          revision)
    if b is not None:
        import io
        sd = torch.load(io.BytesIO(b), map_location="cpu")
        return _load_weights_from_state_dict(sd, linear, use_bias)
except (IOError, pickle.UnpicklingError, ValueError) as e:
    logger.debug("Failed to load pytorch_model.bin from %s: %s", folder, e)
return False


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Aug 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @FFFfff1FFFfff.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify (bot) added the needs-rebase label on Aug 11, 2025
@FFFfff1FFFfff (Author)
FIX #15509, #16898, #22117
@DarkLight1337 @noooop

Could you please take a look at this PR? I have already revised it based on the previous review. Thanks a lot!

@noooop (Contributor) left a comment

Thanks for your contribution

You need to fix the conflicts and pre-commit.

To fix the conflicts: I'm not very familiar with git commands, so I prefer to first roll back the conflicting files, merge main into this PR, and then add the changes back. (The conflicts get solved anyway. LOL)

Comment on lines 194 to 205
@pytest.mark.parametrize("model_info", ST_PROJECTOR_MODELS)
@pytest.mark.skip(reason="Projector loading and single-sentence inference verified. MTEB batch processing optimization pending.")
def test_st_projector_models_mteb(hf_runner, vllm_runner,
                                  model_info: EmbedModelInfo) -> None:
    """Test ST models with projectors using MTEB."""
    if not model_info.enable_test:
        pytest.skip("Skipping test.")

    vllm_extra_kwargs: dict[str, Any] = {}
    if model_info.architecture == "GteNewModel":
        vllm_extra_kwargs["hf_overrides"] = {"architectures": ["GteNewModel"]}

    mteb_test_embed_models(hf_runner, vllm_runner, model_info,
                           vllm_extra_kwargs)
Contributor

I think just one test is enough

Comment on lines 54 to 66
# Test models without ST projectors (for regression testing)
NON_PROJECTOR_MODELS = [
    EmbedModelInfo("thenlper/gte-large",
                   architecture="BertModel",
                   enable_test=True),
    EmbedModelInfo("Alibaba-NLP/gte-base-en-v1.5",
                   architecture="GteNewModel",
                   enable_test=True),
    EmbedModelInfo("Qwen/Qwen3-Embedding-0.6B",
                   architecture="Qwen3ForCausalLM",
                   dtype="float32",
                   enable_test=True),
]
Contributor

These models have already been tested in other files

@@ -0,0 +1,156 @@
# SPDX-License-Identifier: Apache-2.0
Contributor

The mteb_test_embed_models test is sufficient, there is no need for this test.

@FFFfff1FFFfff (Author)

Got it! I'll sort out the conflicts and push the updates soon. Appreciate it! @noooop

@FFFfff1FFFfff (Author)

Hi @noooop @DarkLight1337 , I've completed the updates for this PR.
CI checks are still running, but the changes are ready for your review. Thanks a lot!

@@ -38,9 +38,14 @@
                                         RWConfig, SpeculatorsConfig,
                                         Step3TextConfig, Step3VLConfig,
                                         UltravoxConfig)

try:
    from vllm.transformers_utils.configs.mllama import MllamaConfig
Member
We have removed both MllamaConfig and NVLM_D_Config; you can remove them from this file.

@FFFfff1FFFfff (Author) commented on Aug 12, 2025
Got it. Will fix it! Thanks a lot @noooop @DarkLight1337
I have one more question. I ran pre-commit locally multiple times before submitting this PR, but the CI's automated pre-commit still catches minor formatting issues (like extra spaces or blank lines), applies fixes, and then fails. Is there a way to avoid this, or is there a specific pre-commit configuration I should run to match exactly what CI expects?

Member

How did you install pre-commit for this repo?

Author

I used pip install pre-commit and pre-commit install. Should I run pre-commit run --all-files --hook-stage manual to catch these CI-only checks locally? Or any other command?

Member

Actually, it looks like you did modify this file in your PR, so it's normal that pre-commit would format it, even though those lines are not from your PR.

Member

Does your local pre-commit remove the line that is added in CI?

Author

Got it! I gave it a try following your instructions, and pre-commit seems to work this time! Waiting for the remaining pending checks.
