[Example] Add fused_linear_cross_entropy example and unit test #342

yf225 · 2025-07-21T07:02:28Z

Stacked PRs:

[Example] Add fused_linear_cross_entropy example and unit test

stack-info: PR: #342, branch: yf225/stack/37

yf225 · 2025-07-21T17:59:18Z

examples/fused_linear_cross_entropy.py

+    """Fused linear + cross entropy."""
+    loss, grad_input, grad_weight, grad_bias = fused_linear_cross_entropy_forward(
+        input_tensor, weight, labels, bias
+    )


Other implementations of fused_linear_cross_entropy in tritonbench also include both the Python-level chunking code and the Triton kernel (per-chunk processing) in the benchmark timing, so we do the same here for a fair comparison.
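To make "Python-level chunking" concrete, here is a minimal pure-Python sketch of a chunked fused linear + cross entropy with mean reduction. It uses only the standard library, so it is not the PyTorch/Helion code in this PR; the function name `fused_linear_cross_entropy_chunked` and the list-of-lists layout are hypothetical. The outer loop over row chunks is the Python-level part that the benchmark timing would also cover. Only one chunk's logits (`chunk_size x V`) exist at a time, and the loss plus all three gradients come out of the same pass.

```python
import math


def fused_linear_cross_entropy_chunked(x, weight, labels, bias, chunk_size):
    """Hypothetical pure-Python reference for fused linear + cross entropy.

    x: N rows of length H; weight: V rows of length H; bias: length V.
    Returns (loss, grad_input, grad_weight, grad_bias) for a mean-reduced loss.
    """
    n, h, v = len(x), len(x[0]), len(weight)
    loss = 0.0
    grad_input = [[0.0] * h for _ in range(n)]
    grad_weight = [[0.0] * h for _ in range(v)]
    grad_bias = [0.0] * v
    for start in range(0, n, chunk_size):
        rows = range(start, min(start + chunk_size, n))
        # Only this chunk's logits are materialized (chunk_size x V, not N x V).
        chunk_logits = [
            [sum(a * b for a, b in zip(x[i], weight[j])) + bias[j] for j in range(v)]
            for i in rows
        ]
        for i, logits in zip(rows, chunk_logits):
            m = max(logits)
            lse = m + math.log(sum(math.exp(z - m) for z in logits))
            loss += (lse - logits[labels[i]]) / n
            for j in range(v):
                # dLoss/dlogit = (softmax - one_hot) / N for mean reduction.
                g = (math.exp(logits[j] - lse) - (1.0 if j == labels[i] else 0.0)) / n
                grad_bias[j] += g
                for k in range(h):
                    grad_input[i][k] += g * weight[j][k]
                    grad_weight[j][k] += g * x[i][k]
    return loss, grad_input, grad_weight, grad_bias
```

Because the gradients are computed while a chunk's logits are still live, no separate backward pass over an `N x V` logits tensor is needed; avoiding that tensor is the memory saving that motivates the fusion.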

jansel · 2025-07-22T01:16:18Z

examples/fused_linear_cross_entropy.py

+
+            # Masked load of block
+            mask = block_offsets < vocab_size
+            logits_block = torch.where(


This masking should be added automatically in helion.
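A pure-Python model of the point above. It is conceptual only and assumes, per the reviewer's comment, that Helion handles bounds for accesses made through a tile's indices. `tile_indices` is a hypothetical stand-in for `hl.tile`: it yields only in-bounds indices, so the final partial tile is simply shorter. The explicit-mask version pads every block to full width and fills out-of-bounds lanes, which is the kind of masking the quoted `torch.where` appears to do by hand (the `-inf` fill value is an assumption). Both give the same row maximum.

```python
import math


def tile_indices(n, block_size):
    """Hypothetical stand-in for hl.tile(n): yields the in-bounds indices of each tile."""
    for start in range(0, n, block_size):
        yield range(start, min(start + block_size, n))


def row_max_manual_mask(row, block_size):
    # Explicit masking: full-width blocks, out-of-bounds lanes replaced by -inf.
    n = len(row)
    best = -math.inf
    for start in range(0, n, block_size):
        offsets = range(start, start + block_size)
        block = [row[i] if i < n else -math.inf for i in offsets]
        best = max(best, max(block))
    return best


def row_max_tiled(row, block_size):
    # Tile-based: the tile never covers positions >= n, so no mask is written.
    best = -math.inf
    for idx in tile_indices(len(row), block_size):
        best = max(best, max(row[i] for i in idx))
    return best
```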

jansel · 2025-07-22T01:18:23Z

examples/fused_linear_cross_entropy.py

+        # Process in blocks like Liger
+        for vocab_tile in hl.tile(vocab_size):
+            # Create block offsets (like tl.arange in Triton)
+            block_offsets = vocab_tile.index


Is the alias needed after you remove the extra masking?
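The quoted loop processes the vocabulary in blocks "like Liger". Below is a minimal pure-Python sketch of the running-max / running-sum (online softmax) recurrence that block-wise cross-entropy loops commonly use; whether this PR's kernel follows exactly this recurrence is an assumption, and `blocked_cross_entropy` is a hypothetical name. Each block is addressed directly by its range, with no separate offsets alias.

```python
import math


def blocked_cross_entropy(logits, label, block_size):
    """Hypothetical sketch: cross entropy for one sample, scanning V in blocks.

    Keeps a running max `m` and a running sum `s` of exp(logit - m), rescaling
    `s` whenever a later block raises the max (online softmax).
    """
    m, s = -math.inf, 0.0
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]  # slicing clips the last block
        new_m = max(m, max(block))
        # On the first block m is -inf, so exp(m - new_m) is 0.0 and s starts fresh.
        s = s * math.exp(m - new_m) + sum(math.exp(z - new_m) for z in block)
        m = new_m
    # loss = logsumexp(logits) - logits[label]
    return m + math.log(s) - logits[label]
```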

jansel · 2025-07-22T01:18:51Z

examples/fused_linear_cross_entropy.py

+    n_total_samples: int,  # Total number of samples for mean reduction
+) -> torch.Tensor:
+    # Grid over samples - each program handles one sample
+    for program_id in hl.grid(chunk_size):


Could this be a hl.tile loop to allow tiling this dimension with block_size>1?
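A pure-Python model of the suggestion: with an `hl.grid`-style loop each program owns one sample, while an `hl.tile`-style loop lets each program own a tile of up to `block_size` samples, and `block_size=1` recovers the grid case. Both helpers are hypothetical and run sequentially; in a real kernel the rows inside a tile would be processed together as a 2-D block. Each sample's loss is divided by `n_total_samples`, mirroring the mean-reduction parameter in the quoted signature.

```python
import math


def row_loss(logits, label):
    # Cross entropy for one row: logsumexp(logits) - logits[label].
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]


def losses_grid(rows, labels, n_total_samples):
    # hl.grid-style: one "program" per sample.
    out = [0.0] * len(labels)
    for program_id in range(len(labels)):
        out[program_id] = row_loss(rows[program_id], labels[program_id]) / n_total_samples
    return out


def losses_tiled(rows, labels, n_total_samples, block_size):
    # hl.tile-style: each "program" handles a tile of up to block_size samples.
    out = [0.0] * len(labels)
    for start in range(0, len(labels), block_size):
        tile = range(start, min(start + block_size, len(labels)))
        for i in tile:
            out[i] = row_loss(rows[i], labels[i]) / n_total_samples
    return out
```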

jansel · 2025-07-22T01:19:22Z

examples/fused_linear_cross_entropy.py

+            mask = block_offsets < vocab_size
+
+            # Load block
+            logits_block = torch.where(


Masking should be automatic. Are you sure this is needed?

meta-cla bot added the CLA Signed label Jul 21, 2025

yf225 force-pushed the yf225/stack/37 branch from 452205e to 58a9633 July 21, 2025 07:02

yf225 mentioned this pull request Jul 21, 2025

[Benchmark] Add fused_linear_cross_entropy to tritonbench integration #343

Open

[Example] Add fused_linear_cross_entropy example and unit test

4a910ae

stack-info: PR: #342, branch: yf225/stack/37

yf225 force-pushed the yf225/stack/37 branch from 58a9633 to 4a910ae July 21, 2025 08:16

yf225 commented Jul 21, 2025


yf225 requested a review from jansel July 21, 2025 17:59

jansel requested changes Jul 22, 2025
