feat(model): add embed_sparse task for BGE-M3 server-side sparse aggregation #35001

Open
joeqzzuo wants to merge 3 commits into vllm-project:main from joeqzzuo:feat/bge-m3-embed-sparse

Conversation


@joeqzzuo joeqzzuo commented Feb 20, 2026

Purpose

Add embed_sparse pooling task for BgeM3EmbeddingModel to enable server-side sparse vector aggregation.

Currently, BGE-M3 sparse retrieval requires a cumbersome 2-step client workflow:

  1. Call /tokenize to get token IDs
  2. Call /pooling with task=token_classify to get per-position scores, then manually aggregate via scatter_reduce on the client side

This PR adds a BgeM3SparsePooler that performs scatter_reduce(index=input_ids, reduce="amax") aggregation server-side, producing vocabulary-sized sparse vectors directly usable by vector databases (Qdrant, Milvus, Vespa, etc.) — in a single API call.

This follows the same pattern as SPLADESparsePooler in bert.py, adapted for BGE-M3's architecture where sparse_linear maps hidden states to a single scalar per position (rather than SPLADE's MLM head which maps to vocab-sized logits).
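The core of the server-side aggregation can be sketched like this (names and shapes are illustrative, not the actual pooler code; it assumes the per-position scores from sparse_linear have already been squeezed to a [seq_len] tensor):

```python
import torch


def aggregate_sparse(
    token_scores: torch.Tensor,    # [seq_len], one scalar score per position
    input_ids: torch.Tensor,       # [seq_len], int64 token ids
    vocab_size: int,
    special_token_ids: list[int],
) -> torch.Tensor:
    # ReLU first: negative scores contribute zero weight
    weights = torch.relu(token_scores)
    sparse_vec = torch.zeros(vocab_size, dtype=weights.dtype)
    # "amax" keeps the maximum weight when a token id occurs more than once
    sparse_vec.scatter_reduce_(0, input_ids, weights, reduce="amax")
    # special tokens (BOS/EOS/PAD) carry no lexical signal
    if special_token_ids:
        sparse_vec[special_token_ids] = 0.0
    return sparse_vec
```

The resulting [vocab_size] vector is mostly zeros, which is the shape sparse indexes in Qdrant, Milvus, and Vespa expect.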

Related issues: #13609 #15384 #18469

Test Plan

pytest tests/models/language/pooling/test_bge_m3.py -v -k "embed_sparse"

Three new integration tests:

  • test_bge_m3_embed_sparse_matches_token_classify — verifies embed_sparse output matches token_classify + client-side aggregation
  • test_bge_m3_embed_sparse_lexical_scores — verifies lexical matching scores match reference values
  • test_bge_m3_embed_sparse_corner_case — verifies short input ("Hi") produces correct sparse output
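Once both query and document are embedded this way, the lexical matching check in the second test reduces to a sparse dot product: only token ids with nonzero weight on both sides contribute. A sketch under that assumption (not the test's actual scoring code):

```python
import torch


def lexical_score(q_vec: torch.Tensor, d_vec: torch.Tensor) -> float:
    # both inputs are [vocab_size] sparse vectors; token ids absent
    # from either side contribute zero to the product
    return float(torch.dot(q_vec, d_vec))
```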

Test Result

Unit tests (standalone, validating core scatter_reduce logic):

BgeM3SparsePooler Unit Tests
Test 1: scatter_reduce matches dict aggregation... PASSED
Test 2: duplicate tokens take max weight... PASSED
Test 3: special tokens zeroed out... PASSED
Test 4: single token (BOS only) → all zeros... PASSED
Test 5: output shape = [vocab_size] and mostly sparse... PASSED
Test 6: batch independence... PASSED
Test 7: lexical matching score via sparse vectors... PASSED
Test 8: ReLU zeros negative weights... PASSED
ALL 8 TESTS PASSED
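The dict-based reference that Test 1 checks against amounts to roughly the following (a plain-Python sketch with hypothetical names, not the actual test code):

```python
def dict_aggregate(
    input_ids: list[int],
    scores: list[float],
    special_token_ids: set[int],
) -> dict[int, float]:
    # keep the max ReLU'd score per token id, skipping special tokens
    agg: dict[int, float] = {}
    for tid, s in zip(input_ids, scores):
        if tid in special_token_ids:
            continue
        w = max(s, 0.0)  # ReLU
        agg[tid] = max(agg.get(tid, 0.0), w)
    return agg
```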

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@joeqzzuo joeqzzuo requested a review from noooop as a code owner February 20, 2026 23:33

dosubot bot commented Feb 20, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, a small and essential subset of tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new embed_sparse pooling task for the BgeM3EmbeddingModel, enabling server-side sparse vector aggregation. This is a significant improvement, streamlining the process of generating vocabulary-sized sparse vectors directly usable by vector databases. The changes include adding a BgeM3SparsePooler class, updating the BgeM3EmbeddingModel to incorporate this new pooler, and modifying vllm/tasks.py to include embed_sparse as a PoolingTask. New integration tests have also been added to verify the functionality and correctness of the embed_sparse task. The code is generally well-structured and follows existing patterns within the codebase. I've identified a few areas for improvement related to type hinting and potential clarity in the BgeM3SparsePooler initialization.


import itertools
from collections.abc import Iterable
from collections.abc import Iterable, Set

high

The Set import is used in BgeM3SparsePooler's get_supported_tasks method. It's good practice to import Set directly from typing rather than collections.abc for better compatibility and type checking across Python versions, especially when dealing with type hints.

Suggested change
from collections.abc import Iterable, Set
from collections.abc import Iterable
from typing import Set

self,
sparse_linear: nn.Module,
vocab_size: int,
special_token_ids: list[int],

high

The special_token_ids parameter is typed as list[int]. While this is technically correct, Set[int] might be more appropriate here given that the order of special tokens doesn't matter and checking for membership (in self.special_token_ids) would be more efficient with a set. This also clearly communicates the intent that special_token_ids is a collection of unique IDs.

Suggested change
special_token_ids: list[int],
special_token_ids: Set[int],

Comment on lines +246 to +247
if self.special_token_ids:
sparse_vec[self.special_token_ids] = 0.0

high

The special_token_ids attribute is initialized as a list but used in a context where Set operations would be more efficient (checking if self.special_token_ids). Converting it to a set during initialization would improve lookup performance, especially if the list of special tokens grows large.

Suggested change
if self.special_token_ids:
sparse_vec[self.special_token_ids] = 0.0
if self.special_token_ids:
sparse_vec[list(self.special_token_ids)] = 0.0
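A quick illustration of why the conversion is needed (toy tensor; assumes special_token_ids has been switched to a set as suggested): PyTorch advanced indexing accepts a list of indices but not a set, so the set must be converted to a list before masking.

```python
import torch

sparse_vec = torch.tensor([0.2, 0.5, 0.1, 0.9])
special_token_ids = {0, 3}  # a set gives O(1) membership checks

# a set is not a valid tensor index, so convert before fancy indexing
sparse_vec[list(special_token_ids)] = 0.0
```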

self.bos_token_id,
self.eos_token_id,
self.pad_token_id,
] if tid is not None and tid >= 0

high

The special_token_ids list comprehension currently filters out None values. However, getattr(hf_config, "pad_token_id", 1) ensures pad_token_id is always an int. If bos_token_id or eos_token_id can truly be None, then the type hint for special_token_ids in BgeM3SparsePooler should reflect list[int | None] or the filtering logic should be more explicit about handling None if tid is not None is intended to handle more than just the pad_token_id default. Given the BgeM3SparsePooler expects list[int], it's safer to ensure all elements are ints before passing them.

Suggested change
] if tid is not None and tid >= 0
] if tid is not None and tid >= 0 and isinstance(tid, int)
]


mergify bot commented Feb 20, 2026

Hi @joeqzzuo, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@DarkLight1337 DarkLight1337 removed labels documentation, performance, new-model, rocm, speculative-decoding, ci/build, v1, multi-modality (#4194), llama, qwen, gpt-oss, kv-connector, nvidia on Feb 21, 2026
@DarkLight1337 DarkLight1337 removed this from NVIDIA Feb 21, 2026
@DarkLight1337 DarkLight1337 removed this from AMD Feb 21, 2026