
Add utility functions for device management and input validation#225

Merged
LoserCheems merged 7 commits into main from optime-triton-kernels
Feb 8, 2026

Conversation

@LoserCheems
Collaborator

Summary

  • Introduces utility functions for managing devices and validating tensor inputs.

Root Cause

  • Not a bug fix: these are enhancements to improve clarity and consistency in device management and input validation.

Changes

  • Added functions for device detection, architecture retrieval, and input validation.
  • Refactored the autotuning configuration to use the new utility functions.
  • Initialized output tensors to zero so the forward base kernel handles them correctly.
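The diff itself is not shown on this page, so as a rough illustration of the input-validation utilities described above, here is a minimal sketch of shape-consistency checks for attention inputs. The function name, signature, and error messages are assumptions for illustration, not the repository's actual API:

```python
def validate_attention_inputs(q_shape, k_shape, v_shape):
    """Check that query/key/value shapes are mutually consistent.

    Shapes are (batch, seqlen, num_heads, head_dim) tuples. This is a
    hypothetical sketch of the kind of validation the PR describes, not
    the code actually merged.
    """
    if len(q_shape) != 4 or len(k_shape) != 4 or len(v_shape) != 4:
        raise ValueError("expected 4-D (batch, seqlen, heads, head_dim) shapes")
    if k_shape != v_shape:
        raise ValueError(f"key/value shapes differ: {k_shape} vs {v_shape}")
    batch_q, _, heads_q, dim_q = q_shape
    batch_k, _, heads_k, dim_k = k_shape
    if batch_q != batch_k:
        raise ValueError(f"batch size mismatch: {batch_q} vs {batch_k}")
    if dim_q != dim_k:
        raise ValueError(f"head_dim mismatch: {dim_q} vs {dim_k}")
    # Grouped-query attention: query heads must tile evenly over KV heads.
    if heads_q % heads_k != 0:
        raise ValueError("num query heads must be a multiple of num kv heads")
```

Failing fast on shape mismatches like this keeps the error at the Python boundary instead of surfacing as an opaque illegal memory access inside a Triton kernel.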

Reproduction

  • No specific bug to reproduce; the changes are functional enhancements rather than a fix.

Tests

  • Validated functionality through existing tests and ensured no regressions.

Compatibility

  • No backward compatibility issues; new functions added without altering existing interfaces.

Checklist

Copilot AI review requested due to automatic review settings February 1, 2026 12:09
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces utility functions for device management, input validation, and autotuning configuration to support a Triton-based multi-platform attention implementation (related to issue #222). The changes split the monolithic _flash_attn_forward function into separate base and varlen variants while extracting common utilities.

Changes:

  • Added a new utils.py module with device detection, architecture retrieval, autotuning configuration generation, and input validation functions
  • Refactored flash_fwd.py to use the new utility functions and split the forward pass into _flash_attn_base_forward and _flash_attn_varlen_base_forward functions
  • Enabled CUDA graph support in the autotune decorator
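As context for the autotuning-configuration utility mentioned above, a common pattern is to enumerate the cross product of candidate block sizes and warp counts. The sketch below is an assumption about what such a helper might look like (the function name and option defaults are invented; the real module presumably wraps these dicts in `triton.Config` objects):

```python
from itertools import product


def generate_autotune_configs(block_m_options=(64, 128),
                              block_n_options=(32, 64),
                              num_warps_options=(4, 8)):
    """Enumerate candidate kernel configurations for autotuning.

    Hypothetical sketch: returns one plain dict per combination of
    block sizes and warp count, over which an autotuner would search.
    """
    configs = []
    for block_m, block_n, num_warps in product(block_m_options,
                                               block_n_options,
                                               num_warps_options):
        configs.append({"BLOCK_M": block_m,
                        "BLOCK_N": block_n,
                        "num_warps": num_warps})
    return configs
```

Centralizing this enumeration in one utility is what lets the refactored `flash_fwd.py` share a single autotuning policy across the base and varlen forward variants.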

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

Files reviewed:

  • flash_sparse_attn/ops/triton/utils.py — New utility module providing device detection, architecture identification, autotuning configuration, grid generation, and input validation functions.
  • flash_sparse_attn/ops/triton/flash_fwd.py — Refactored to use the new utilities, split the forward pass into separate base and varlen functions, and enabled CUDA graph support.
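The grid-generation utility mentioned for utils.py is not shown here; in Triton, launch grids are typically expressed as a callable over the autotuner's meta-parameters, so the grid can adapt to whichever block size wins. A minimal sketch under that assumption (names invented for illustration):

```python
def ceil_div(a, b):
    """Integer ceiling division, used to count blocks along a dimension."""
    return (a + b - 1) // b


def make_grid(seqlen_q, batch, num_heads):
    """Build a grid callable for a flash-attention forward launch.

    Hypothetical sketch: one program per BLOCK_M-sized tile of the query
    sequence, times one program per (batch, head) pair. Triton invokes
    the returned callable with the chosen autotune meta-parameters.
    """
    def grid(meta):
        return (ceil_div(seqlen_q, meta["BLOCK_M"]), batch * num_heads)
    return grid
```

Because the grid is a function of `meta`, the same launch site works for every configuration the autotuner tries, which is what makes sharing it between the base and varlen forward paths practical.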


@LoserCheems LoserCheems merged commit 9db5fec into main Feb 8, 2026
