
@LoserCheems (Collaborator)

Summary

  • Refactor the block min/max calculations to use Triton's tl.minimum and tl.maximum functions.

Root Cause

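  • The original implementation used Python's built-in min and max inside the Triton JIT functions. Triton's own element-wise operations for this are tl.minimum and tl.maximum, and the built-ins may not be optimal for performance in that context.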

Changes

  • Replaced the built-in min and max calls with tl.minimum and tl.maximum in the block min/max calculation functions.
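Triton's tl.minimum and tl.maximum are element-wise operations, much like NumPy's `np.minimum` and `np.maximum`. The snippet below is a host-side analogy only: it uses NumPy as a stand-in (it does not run Triton) to show element-wise clamping of block bounds, and why Python's built-in `min` does not fit array values, since NumPy raises `ValueError` when asked to reduce a multi-element comparison to one bool. The arrays, the block count, and the helper `builtin_min_fails` are made up for illustration.

```python
import numpy as np

# Made-up, unclamped block bounds for four query blocks. In a kernel these
# would be values in a Triton tensor rather than a NumPy array.
n_block_hi = np.array([3, 7, 12, 5])   # exclusive upper bounds
n_block_lo = np.array([-2, 1, 6, -1])  # inclusive lower bounds, may go negative
n_blocks_total = 8                     # hypothetical number of key blocks

# Element-wise clamping: the NumPy counterparts of
# tl.minimum(n_block_hi, n_blocks_total) and tl.maximum(n_block_lo, 0).
n_block_max = np.minimum(n_block_hi, n_blocks_total)
n_block_min = np.maximum(n_block_lo, 0)
print(n_block_max)  # [3 7 8 5]
print(n_block_min)  # [0 1 6 0]


def builtin_min_fails(a, b):
    """Return True if Python's built-in min cannot compare two arrays.

    min(a, b) evaluates `b < a`, which yields a boolean array; converting
    that to a single bool raises ValueError for multi-element arrays.
    """
    try:
        min(a, b)
    except ValueError:
        return True
    return False


print(builtin_min_fails(n_block_hi, np.full(4, n_blocks_total)))  # True
```

The element-wise functions return one clamped value per lane, which is the behavior block-bound arithmetic needs when the bounds are tensors.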

Reproduction

  • The issue can be reproduced by running the block min/max calculations in Triton kernels with varying input sizes.

Tests

  • Validated the changes by running the existing tests that cover block min/max calculations.

Compatibility

  • No backward compatibility issues identified.

Checklist

  • Linked issue provided
  • Adds or updates tests
  • Updates docs if needed
  • No perf regressions

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors block min/max calculations in Triton kernels to use Triton's native tl.minimum and tl.maximum functions instead of Python's built-in min and max functions, improving performance within the Triton compilation context.

Changes:

  • Replaced all instances of Python's min/max with tl.minimum/tl.maximum in block calculation functions
  • Applied changes consistently across all four Triton JIT-compiled functions in the file


Improves masking flexibility by supporting seqlen, causal, and local windows while guarding unsupported configurations for swapped dimensions and packed GQA.
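The key-block range that these masks imply can be checked on the host. Below is a minimal pure-Python sketch, assuming FlashAttention-style conventions that this PR does not confirm: the causal diagonal is aligned to the bottom-right corner (offset `seqlen_k - seqlen_q`), and a local window `(left, right)` lets query row `i` see keys `j` with `i + offset - left <= j <= i + offset + right`. The names `cdiv`, `block_min_max`, and `needed_blocks` are hypothetical. In a Triton kernel, the `min`/`max` calls marked below would be the tl.minimum/tl.maximum calls this PR introduces.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division that also rounds correctly for negative numerators."""
    return -(-a // b)


def block_min_max(m_block, seqlen_q, seqlen_k, block_m, block_n,
                  causal=False, window=None):
    """Half-open range [n_block_min, n_block_max) of key blocks that query
    block `m_block` must visit.

    Assumed (hypothetical) conventions:
      * bottom-right alignment: query row i maps to key column
        i + (seqlen_k - seqlen_q);
      * causal=True hides keys to the right of the diagonal;
      * window=(left, right), both >= 0, keeps keys from `left` columns
        before to `right` columns after the diagonal.
    """
    offset = seqlen_k - seqlen_q
    first_row = m_block * block_m
    last_row = min((m_block + 1) * block_m, seqlen_q) - 1

    n_block_min = 0
    n_block_max = cdiv(seqlen_k, block_n)

    # Right edge: causal masking acts like a window with a right extent of 0.
    right = 0 if causal else (window[1] if window is not None else None)
    if right is not None:
        # In a Triton kernel this would be tl.minimum(...).
        n_block_max = min(n_block_max, cdiv(last_row + 1 + offset + right, block_n))
    # Left edge: only a local window bounds it.
    if window is not None:
        # In a Triton kernel this would be tl.maximum(...).
        n_block_min = max((first_row + offset - window[0]) // block_n, 0)
    # Collapse to an empty range rather than returning max < min.
    n_block_max = max(n_block_max, n_block_min)
    return n_block_min, n_block_max


def needed_blocks(m_block, seqlen_q, seqlen_k, block_m, block_n,
                  causal=False, window=None):
    """Brute-force reference: key blocks containing at least one visible key."""
    offset = seqlen_k - seqlen_q
    blocks = set()
    for i in range(m_block * block_m, min((m_block + 1) * block_m, seqlen_q)):
        diag = i + offset
        for j in range(seqlen_k):
            if causal and j > diag:
                continue
            if window is not None and not (diag - window[0] <= j <= diag + window[1]):
                continue
            blocks.add(j // block_n)
    return blocks


print(block_min_max(1, 16, 16, 4, 4, causal=True))   # (0, 2)
print(block_min_max(2, 16, 16, 4, 4, window=(2, 0)))  # (1, 3)
```

Sweeping `seqlen_q`, `seqlen_k`, and the block sizes while comparing `block_min_max` against `needed_blocks` is one way to exercise the calculation with varying input sizes: the range must cover every needed block, and under these assumptions it matches exactly whenever at least one key is visible.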
@LoserCheems merged commit 7c65745 into main on Jan 19, 2026