feat(model): add embed_sparse task for BGE-M3 server-side sparse aggregation #35001

Open
joeqzzuo wants to merge 3 commits into vllm-project:main from joeqzzuo:feat/bge-m3-embed-sparse

Conversation


@joeqzzuo joeqzzuo commented Feb 20, 2026

Purpose

Add embed_sparse pooling task for BgeM3EmbeddingModel to enable server-side sparse vector aggregation.

Currently, BGE-M3 sparse retrieval requires a cumbersome 2-step client workflow:

  1. Call /tokenize to get token IDs
  2. Call /pooling with task=token_classify to get per-position scores, then manually aggregate via scatter_reduce on the client side

This PR adds a BgeM3SparsePooler that performs scatter_reduce(index=input_ids, reduce="amax") aggregation server-side, producing vocabulary-sized sparse vectors directly usable by vector databases (Qdrant, Milvus, Vespa, etc.) — in a single API call.

This follows the same pattern as SPLADESparsePooler in bert.py, adapted for BGE-M3's architecture where sparse_linear maps hidden states to a single scalar per position (rather than SPLADE's MLM head which maps to vocab-sized logits).
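The core of the server-side aggregation can be sketched like this (names and shapes are illustrative, not the actual pooler code; it assumes the per-position scores from sparse_linear have already been squeezed to a [seq_len] tensor):

```python
import torch


def aggregate_sparse(
    token_scores: torch.Tensor,    # [seq_len], one scalar score per position
    input_ids: torch.Tensor,       # [seq_len], int64 token ids
    vocab_size: int,
    special_token_ids: list[int],
) -> torch.Tensor:
    # ReLU first: negative scores contribute zero weight
    weights = torch.relu(token_scores)
    sparse_vec = torch.zeros(vocab_size, dtype=weights.dtype)
    # "amax" keeps the maximum weight when a token id occurs more than once
    sparse_vec.scatter_reduce_(0, input_ids, weights, reduce="amax")
    # special tokens (BOS/EOS/PAD) carry no lexical signal
    if special_token_ids:
        sparse_vec[special_token_ids] = 0.0
    return sparse_vec
```

The resulting [vocab_size] vector is mostly zeros, which is the shape sparse indexes in Qdrant, Milvus, and Vespa expect.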

Related issues: #13609 #15384 #18469

Test Plan

pytest tests/models/language/pooling/test_bge_m3.py -v -k "embed_sparse"

Three new integration tests:

  • test_bge_m3_embed_sparse_matches_token_classify — verifies embed_sparse output matches token_classify + client-side aggregation
  • test_bge_m3_embed_sparse_lexical_scores — verifies lexical matching scores match reference values
  • test_bge_m3_embed_sparse_corner_case — verifies short input ("Hi") produces correct sparse output
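Once both query and document are embedded this way, the lexical matching check in the second test reduces to a sparse dot product: only token ids with nonzero weight on both sides contribute. A sketch under that assumption (not the test's actual scoring code):

```python
import torch


def lexical_score(q_vec: torch.Tensor, d_vec: torch.Tensor) -> float:
    # both inputs are [vocab_size] sparse vectors; token ids absent
    # from either side contribute zero to the product
    return float(torch.dot(q_vec, d_vec))
```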

Test Result

Unit tests (standalone, validating core scatter_reduce logic):

BgeM3SparsePooler Unit Tests
Test 1: scatter_reduce matches dict aggregation... PASSED
Test 2: duplicate tokens take max weight... PASSED
Test 3: special tokens zeroed out... PASSED
Test 4: single token (BOS only) → all zeros... PASSED
Test 5: output shape = [vocab_size] and mostly sparse... PASSED
Test 6: batch independence... PASSED
Test 7: lexical matching score via sparse vectors... PASSED
Test 8: ReLU zeros negative weights... PASSED
ALL 8 TESTS PASSED
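The dict-based reference that Test 1 checks against amounts to roughly the following (a plain-Python sketch with hypothetical names, not the actual test code):

```python
def dict_aggregate(
    input_ids: list[int],
    scores: list[float],
    special_token_ids: set[int],
) -> dict[int, float]:
    # keep the max ReLU'd score per token id, skipping special tokens
    agg: dict[int, float] = {}
    for tid, s in zip(input_ids, scores):
        if tid in special_token_ids:
            continue
        w = max(s, 0.0)  # ReLU
        agg[tid] = max(agg.get(tid, 0.0), w)
    return agg
```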

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@joeqzzuo joeqzzuo requested a review from noooop as a code owner February 20, 2026 23:33

dosubot bot commented Feb 20, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, a small and essential subset of tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new embed_sparse pooling task for the BgeM3EmbeddingModel, enabling server-side sparse vector aggregation. This is a significant improvement, streamlining the process of generating vocabulary-sized sparse vectors directly usable by vector databases. The changes include adding a BgeM3SparsePooler class, updating the BgeM3EmbeddingModel to incorporate this new pooler, and modifying vllm/tasks.py to include embed_sparse as a PoolingTask. New integration tests have also been added to verify the functionality and correctness of the embed_sparse task. The code is generally well-structured and follows existing patterns within the codebase. I've identified a few areas for improvement related to type hinting and potential clarity in the BgeM3SparsePooler initialization.


import itertools
from collections.abc import Iterable
from collections.abc import Iterable, Set

high

The Set import is used in BgeM3SparsePooler's get_supported_tasks method. It's good practice to import Set directly from typing rather than collections.abc for better compatibility and type checking across Python versions, especially when dealing with type hints.

Suggested change
from collections.abc import Iterable, Set
from collections.abc import Iterable
from typing import Set

self,
sparse_linear: nn.Module,
vocab_size: int,
special_token_ids: list[int],

high

The special_token_ids parameter is typed as list[int]. While this is technically correct, Set[int] might be more appropriate here given that the order of special tokens doesn't matter and checking for membership (in self.special_token_ids) would be more efficient with a set. This also clearly communicates the intent that special_token_ids is a collection of unique IDs.

Suggested change
special_token_ids: list[int],
special_token_ids: Set[int],

Comment on lines +246 to +247
if self.special_token_ids:
sparse_vec[self.special_token_ids] = 0.0

high

The special_token_ids attribute is initialized as a list but used in a context where Set operations would be more efficient (checking if self.special_token_ids). Converting it to a set during initialization would improve lookup performance, especially if the list of special tokens grows large.

Suggested change
if self.special_token_ids:
sparse_vec[self.special_token_ids] = 0.0
if self.special_token_ids:
sparse_vec[list(self.special_token_ids)] = 0.0
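A quick illustration of why the conversion is needed (toy tensor; assumes special_token_ids has been switched to a set as suggested): PyTorch advanced indexing accepts a list of indices but not a set, so the set must be converted to a list before masking.

```python
import torch

sparse_vec = torch.tensor([0.2, 0.5, 0.1, 0.9])
special_token_ids = {0, 3}  # a set gives O(1) membership checks

# a set is not a valid tensor index, so convert before fancy indexing
sparse_vec[list(special_token_ids)] = 0.0
```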

self.bos_token_id,
self.eos_token_id,
self.pad_token_id,
] if tid is not None and tid >= 0

high

The special_token_ids list comprehension currently filters out None values. However, getattr(hf_config, "pad_token_id", 1) ensures pad_token_id is always an int. If bos_token_id or eos_token_id can truly be None, then the type hint for special_token_ids in BgeM3SparsePooler should reflect list[int | None] or the filtering logic should be more explicit about handling None if tid is not None is intended to handle more than just the pad_token_id default. Given the BgeM3SparsePooler expects list[int], it's safer to ensure all elements are ints before passing them.

Suggested change
] if tid is not None and tid >= 0
] if tid is not None and tid >= 0 and isinstance(tid, int)
]


mergify bot commented Feb 20, 2026

Hi @joeqzzuo, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@DarkLight1337 DarkLight1337 removed labels documentation, performance, new-model, rocm, speculative-decoding, ci/build, v1, multi-modality (#4194), llama, qwen, gpt-oss, kv-connector, nvidia on Feb 21, 2026
@DarkLight1337 DarkLight1337 removed this from NVIDIA Feb 21, 2026
@DarkLight1337 DarkLight1337 removed this from AMD Feb 21, 2026