[model, ops] feat: add Qwen3 sequence classification model and loss for embedding classification. #322
Conversation
Code Review
This pull request introduces support for embedding classification by adding new data collators and a sequence classification head for the Qwen3 model. The changes are well-structured, but I've identified significant code duplication in the new data collators in veomni/data/data_collator.py. My review includes suggestions to refactor these classes using inheritance to improve maintainability and reduce redundancy. I also found some dead code that should be removed for clarity.
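The inheritance-based refactor suggested here could look roughly like the following sketch. The class names, fields, and padding logic are illustrative assumptions, not the repo's actual `DataCollator` API: shared collation lives in the base class, and the classification variant overrides only the label handling.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DataCollator:
    """Base collator: pads input_ids; subclasses add their own fields."""

    pad_token_id: int = 0

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        batch = self._pad(features)
        batch.update(self._extra_fields(features))
        return batch

    def _pad(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        max_len = max(len(f["input_ids"]) for f in features)
        return {
            "input_ids": [
                f["input_ids"] + [self.pad_token_id] * (max_len - len(f["input_ids"]))
                for f in features
            ]
        }

    def _extra_fields(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {}


@dataclass
class ClassificationDataCollator(DataCollator):
    """Adds one sequence-level label per example; padding is inherited."""

    def _extra_fields(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {"labels": [f["label"] for f in features]}
```

With this shape, each new collator only overrides the hook it needs, which is the duplication reduction the review asks for.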
veomni/data/data_collator.py
Outdated
@dataclass
class ClassificationDataCollatorWithPositionIDs(DataCollator):
Split this out into another MR.
Moved to #376
veomni/models/loader.py
Outdated
arch_name = get_model_arch_from_config(model_config)
model_type = model_config.model_type
if not force_use_huggingface:
rebase this?
Already rebased.
veomni/ops/loss.py
Outdated
    **kwargs,
) -> torch.Tensor:
    # We don't use shift_labels
    assert shift_labels is None
`assert` statements can be skipped (e.g. under `python -O`); do not use them for validation in production code.
resolved.
veomni/ops/loss.py
Outdated
loss = None
logits = None

if labels is None:
Raise an exception if `labels` is None.
A ValueError is now raised when `labels` is None.
veomni/ops/loss.py
Outdated
    return loss, logits


def seqcls_token_loss_function(
This file is no longer there. Can we follow the new pattern defined in https://github.com/ByteDance-Seed/VeOmni/blob/main/veomni/ops/fused_cross_entropy/__init__.py?
Yes, we can. Implemented following the new pattern.
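For illustration, a registry-style loss-function module like the one referenced above might look like this generic sketch. The names (`register_loss`, `get_loss_function`) and signatures are assumptions for the example, not the actual `fused_cross_entropy` API:

```python
import torch
import torch.nn.functional as F

# Simple name -> loss-function registry (illustrative, not the real API).
_LOSS_REGISTRY = {}


def register_loss(name: str):
    def wrap(fn):
        _LOSS_REGISTRY[name] = fn
        return fn
    return wrap


@register_loss("seqcls_cross_entropy")
def seqcls_cross_entropy(
    logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100
) -> torch.Tensor:
    # Flatten (batch, seq, classes) -> (tokens, classes); positions whose
    # label equals ignore_index do not contribute to the mean.
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=ignore_index
    )


def get_loss_function(name: str):
    return _LOSS_REGISTRY[name]
```

The advantage of this pattern is that models can select a loss by name at construction time instead of importing a specific function from a module that may move.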
)

hidden_states = transformer_outputs.last_hidden_state
logits = self.score(hidden_states)
Remove this; it's no longer needed. We can just use the logits from the loss_function.
removed.
    **kwargs: Unpack[FlashAttentionKwargs],
) -> SequenceClassifierOutputWithPast:
    transformer: Qwen3Model = getattr(self, self.base_model_prefix)
    transformer_outputs: BaseModelOutputWithPast = transformer(
self.model(...)?
It has been revised to a simpler version as suggested.
veomni/ops/loss.py
Outdated
    labels: torch.Tensor,
    num_items_in_batch: Optional[int] = None,
    ignore_index: int = -100,
    shift_labels: Optional[torch.Tensor] = None,
Remove `shift_labels`; let it be absorbed by `**kwargs`.
removed.
tests/data/test_seqcls_loss.py
Outdated
loss, logits = m.seqcls_token_loss_function(hidden_states, weight, labels=labels, ignore_index=-100)

assert loss is not None
The unit tests for the loss function have been adjusted according to this document.
veomni/ops/loss.py
Outdated
| ignore_index: int = -100, | ||
| shift_labels: Optional[torch.Tensor] = None, | ||
| **kwargs, | ||
| ) -> torch.Tensor: |
Add a docstring.
added.
        weights=self.score.weight,
        **kwargs,
    )
else:
What about the inference case, where no labels are provided?
Added logic to calculate logits when no labels are provided, for compatibility with inference tasks.
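The labels-optional behavior described above can be sketched as follows. The class and attribute names (`SeqClsHead`, `score`) are hypothetical, loosely following the snippets in this thread; the real model wires the loss through its `loss_function` instead of calling `F.cross_entropy` directly.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqClsHead(nn.Module):
    """Minimal sequence-classification head with a labels-optional forward."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, num_labels, bias=False)

    def forward(
        self, hidden_states: torch.Tensor, labels: Optional[torch.Tensor] = None
    ) -> Tuple[Optional[torch.Tensor], torch.Tensor]:
        logits = self.score(hidden_states)
        if labels is None:
            # Inference path: no labels, so return logits only (loss is None).
            return None, logits
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
        )
        return loss, logits
```

Callers can then dispatch on whether `loss` is None, which keeps the same forward usable for both training and inference.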
tests/ops/test_seqcls_loss.py
Outdated
@@ -0,0 +1,210 @@
import math
TBH, I don't understand what this test does.
It verifies that the sequence-classification loss dispatches to the right function, handles masking and sequence parallelism (SP) correctly, and produces the exact cross-entropy value expected from a manual calculation. For now this only covers manually constructed cases that check the loss value; in the future we could add a real end-to-end test inside the trainer.
piyifan123
left a comment
@Coach257 do you want to take another look?
hidden_states = kwargs.pop("hidden_states", None)
weights = kwargs.pop("weights", None)

assert hidden_states is not None or logits is not None, "hidden_states or logits must be provided."
Replaced asserts with explicit ValueError.
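The explicit validation could look like this sketch (hypothetical helper name; it mirrors the `assert` shown in the snippet above but raises a ValueError instead, so the check survives `python -O`):

```python
import torch


def resolve_logits(logits=None, hidden_states=None, weights=None):
    # Explicit ValueError instead of `assert`, so the check cannot be
    # stripped by running Python with optimizations enabled.
    if logits is None and hidden_states is None:
        raise ValueError("hidden_states or logits must be provided.")
    if logits is None:
        # Project hidden states with the classification-head weight matrix
        # (weights has shape [num_classes, hidden]).
        logits = hidden_states @ weights.T
    return logits
```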
tests/ops/test_seqcls_loss.py
Outdated
| """ | ||
| device = torch.device("cuda") | ||
| monkeypatch.setattr(m, "get_parallel_state", lambda: _FakePS(sp_enabled=False)) | ||
| ignore = -100 |
Import the veomni constant IGNORE_INDEX instead of hard-coding -100.
Replaced.
tests/ops/test_seqcls_loss.py
Outdated
    hidden_states=hidden_states,
    weights=weights,
)
expected = math.log(float(3))
Can we just write the one-line torch command that does the matmul + softmax + cross-entropy?
Updated.
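The reviewer's suggested one-liner amounts to something like the following (illustrative shapes; `F.cross_entropy` fuses the log-softmax and NLL steps, so no separate softmax call is needed):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden_states = torch.randn(4, 5)  # (tokens, hidden) -- illustrative shapes
weights = torch.randn(3, 5)        # (num_classes, hidden)
labels = torch.tensor([0, 2, 1, 2])

# matmul + softmax + cross-entropy in one expression:
expected = F.cross_entropy(hidden_states @ weights.T, labels)
```

Computing the expected value this way keeps the test self-documenting instead of relying on a hand-derived constant like `math.log(3)`.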
tests/ops/test_seqcls_loss.py
Outdated
logits = torch.zeros((1, 2, 3), device=device)
labels = torch.tensor([[ignore, 1]], device=device)
hidden_states = torch.zeros((1, 2, 5), device=device)
weights = torch.zeros((3, 5), device=device)
Can we make it a matrix?
Yes, sure. Updated.
tests/ops/test_seqcls_loss.py
Outdated
@@ -0,0 +1,210 @@
import math
Check whether we need to add the tests/ops folder to https://github.com/ByteDance-Seed/VeOmni/blob/main/.github/workflows/gpu_unit_tests.yml.
Yes, we need it. I added the tests/ops directory.
* [docs] feat: add async doc in ulysses.md (#388)
* [model] fix: Fused operator fix for qwen3vl (#378)
* [perf, dist] feat: add zero2 in fsdp1 and use_orig_params configurable (#382)
* [data,ci,docs] feat: add torchcodec-based video processing with ffmpeg support and comprehensive testing (#221)
* [data,ci] test: enhance video_utils test suite with robust validation and benchmarks (#375)
* [data, model] feat: support Qwen3-VL textual token-based time encoding (#386)
* [config] feat: add MFU calculation for qwen3_vl_moe (#385)
* [docs] fix: Optimize document links in Markdown rendering (#380)
* [model, ops] feat: add Qwen3 sequence classification model and loss for embedding classification. (#322)
* [dist, data] fix: init parallel state in data collator post init to avoid worker processing getting single process state (#383)

See merge request: !78
feat: Add a data collator to support embedding classification.