Update CuTe namespace and enhance dependencies #262

Merged
LoserCheems merged 5 commits into main from optim_triton_version on Mar 24, 2026

@LoserCheems
Collaborator

Summary

  • This PR moves the vendored CuTe sources to a project-local namespace, raises the minimum Python version, and adds optional dependencies.

Root Cause

  • The vendored CuTe sources still imported from the old flash_attn.cute namespace, which caused import errors and compatibility issues with newer Python versions.

Changes

  • Moved the CuTe namespace from flash_attn.cute to flash_sparse_attn.ops.cute.
  • Raised the minimum Python version to 3.10.
  • Improved error handling in the sync_cute_subtree script and added temporary-worktree support.

Reproduction

  • In an environment running Python 3.10 or later, import the CuTe modules from flash_sparse_attn.ops.cute.
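
A minimal sketch of that check, assuming only the Python standard library. The helper names `python_ok` and `module_available` are illustrative and not part of this repository; the namespace string and the 3.10 minimum are taken from the Changes list.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)
CUTE_NAMESPACE = "flash_sparse_attn.ops.cute"


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def module_available(name):
    """Return True if `name` can be located without importing the module itself.

    find_spec imports parent packages of a dotted name, so a missing parent
    raises ModuleNotFoundError; that case is treated as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("CuTe namespace importable:", module_available(CUTE_NAMESPACE))
```

Because `find_spec` imports the parent packages, this check can still run package initialization code for `flash_sparse_attn` and `flash_sparse_attn.ops`.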

Tests

  • Validated changes by running existing tests and confirming that imports function correctly under the new namespace.

Compatibility

  • Existing code that imports from the old flash_attn.cute namespace may need to be updated to flash_sparse_attn.ops.cute.
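
For downstream code that has to run against both layouts during a migration, one possible approach is a small fallback import. This is a sketch under that assumption, not something this PR ships; `import_first_available` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Iterable

# New namespace first, then the old one named in the compatibility note.
CUTE_NAMESPACES = ("flash_sparse_attn.ops.cute", "flash_attn.cute")


def import_first_available(candidates: Iterable[str] = CUTE_NAMESPACES) -> ModuleType:
    """Return the first module in `candidates` that can be imported (hypothetical helper)."""
    tried = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            # Remember what failed so the final error message is useful.
            tried.append(name)
    raise ModuleNotFoundError(f"none of these modules could be imported: {tried}")
```

Usage would look like `cute = import_first_available()`. One caveat: a `ModuleNotFoundError` raised by a missing dependency inside the new package also triggers the fallback, so a direct import of the new namespace is the simpler choice once the migration is complete.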

Checklist

  • Linked issue provided
  • Adds or updates tests
  • Updates docs if needed
  • No perf regressions

Copilot AI review requested due to automatic review settings March 24, 2026 05:54
@LoserCheems merged commit 0593fe2 into main on Mar 24, 2026
2 checks passed
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the vendored CuTe/FlashAttention-4 integration to live under the flash_sparse_attn.ops.cute namespace, updates packaging metadata (Python >= 3.10 and new optional deps), and improves the subtree sync scripts by adding namespace-rewrite and a temporary-worktree flow for dirty repos.

Changes:

  • Refactor CuTe Python sources to import via flash_sparse_attn.ops.cute instead of flash_attn.cute.
  • Add a rewrite_cute_namespace.py script and integrate it into the subtree sync scripts (bash + PowerShell), including a temporary worktree mode.
  • Update pyproject.toml to require Python >= 3.10 and add a cute optional-dependency extra.
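
To illustrate the kind of import rewrite described above, here is a minimal sketch. It is not the contents of rewrite_cute_namespace.py, which are not shown in this conversation; `rewrite_namespace` and the regular expression are assumptions made for illustration.

```python
import re

OLD_NS = "flash_attn.cute"
NEW_NS = "flash_sparse_attn.ops.cute"

# Match "from <OLD_NS>" or "import <OLD_NS>" at the start of a line, and only
# when OLD_NS ends at a module boundary (so "flash_attn.cute_extra" is left alone).
_IMPORT_RE = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)" + re.escape(OLD_NS) + r"(?=[\s.,]|$)",
    re.MULTILINE,
)


def rewrite_namespace(source: str) -> str:
    """Rewrite import statements in `source` from OLD_NS to NEW_NS."""
    return _IMPORT_RE.sub(lambda m: m.group(1) + NEW_NS, source)


if __name__ == "__main__":
    print(rewrite_namespace("from flash_attn.cute.mask import AttentionMask"))
```

The rewrite is idempotent, so running it twice over the same file is harmless. A line-based pattern like this one does not touch module names that appear inside strings, such as lazy imports done through `importlib`, so those would need separate handling.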

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| scripts/sync_cute_subtree.sh | Adds temporary-worktree sync path, CuTe import rewrite step, and improved reporting. |
| scripts/sync_cute_subtree.ps1 | PowerShell equivalent of the enhanced sync workflow with rewrite + temporary worktree support. |
| scripts/rewrite_cute_namespace.py | New helper to rewrite vendored CuTe Python imports to the local namespace. |
| pyproject.toml | Bumps minimum Python version and adds cute + enhanced dev optional dependencies. |
| flash_sparse_attn/ops/cute/__init__.py | Updates distribution version lookup and patches compile helper import path. |
| flash_sparse_attn/ops/cute/tile_scheduler.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/softmax.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/paged_kv.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/pack_gqa.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/mask.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/interface.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_fwd.py | Rewrites internal imports and updates SM90 lazy import to new namespace. |
| flash_sparse_attn/ops/cute/flash_fwd_sm90.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_fwd_sm100.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_fwd_sm120.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_fwd_combine.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd_sm90.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd_sm100.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd_sm120.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd_preprocess.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/flash_bwd_postprocess.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/compute_block_sparsity.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/block_sparsity.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/block_sparse_utils.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/block_info.py | Rewrites internal imports to the new CuTe namespace. |
| flash_sparse_attn/ops/cute/blackwell_helpers.py | Rewrites internal imports to the new CuTe namespace. |

echo "Done."
echo "Upstream source: $UPSTREAM_REPO"
echo "Upstream cache used for subtree split: $UPSTREAM_REPO_FOR_SPLIT"
echo "Synced commit range: $SYNC_START_HEAD -> $SYNC_END_HEAD"

Copilot AI Mar 24, 2026

The reported synced commit range can be inaccurate when syncing via a temporary worktree. In particular, if the rewrite commit is skipped (because there are existing local changes under $PREFIX) then SYNC_END_HEAD is still taken from the temporary worktree’s HEAD, even though that commit wasn’t cherry-picked back. Consider recomputing SYNC_START_HEAD/SYNC_END_HEAD from the current worktree after cherry-picking (or have invoke_temporary_worktree_sync update these vars based on what was actually applied).
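
One way to read this suggestion is to capture HEAD in the current worktree immediately before and after the commits are applied, and report those two values. The sketch below is written in Python rather than bash purely for illustration; `git_output` and `applied_range` are hypothetical names, and `apply_sync` stands in for whatever step cherry-picks the commits back.

```python
import subprocess
from typing import Callable, Sequence


def git_output(repo: str, args: Sequence[str]) -> str:
    """Run `git -C <repo> <args>` and return its trimmed standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def applied_range(
    repo: str,
    apply_sync: Callable[[], None],
    git: Callable[[str, Sequence[str]], str] = git_output,
) -> tuple[str, str]:
    """Return (start, end) HEAD of `repo` measured around `apply_sync`.

    Both values come from the current worktree, so a commit that exists only
    in a temporary worktree can never appear in the reported range.
    """
    start = git(repo, ["rev-parse", "HEAD"])
    apply_sync()
    end = git(repo, ["rev-parse", "HEAD"])
    return start, end
```

The `git` parameter is injectable only so the function can be exercised without a real repository.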

if [[ -n "$(git -C "$work_repo_root" status --porcelain -- "$PREFIX")" ]]; then
    ensure_git_identity "$work_repo_root"
    invoke_git -C "$work_repo_root" add -- "$PREFIX"
    invoke_git -C "$work_repo_root" commit -m "Rewrite vendored CuTe namespace to flash_sparse_attn.ops.cute"

Copilot AI Mar 24, 2026

REWRITE_COMMIT_MESSAGE is defined and used to identify the rewrite commit later, but the commit here hard-codes the message string. If the message ever changes, the filter logic can silently break. Use the shared REWRITE_COMMIT_MESSAGE value when creating the commit to keep these paths consistent.

Suggested change
- invoke_git -C "$work_repo_root" commit -m "Rewrite vendored CuTe namespace to flash_sparse_attn.ops.cute"
+ invoke_git -C "$work_repo_root" commit -m "$REWRITE_COMMIT_MESSAGE"

--cache-dir <path> Local cache path used when upstream-repo is a URL.
--skip-fetch Skip git fetch origin in the upstream cache.
--keep-temp-branch Keep the temporary split branch for debugging.
--no-temporary-worktree Fail instead of using a temporary worktree when the current tree is dirty.

Copilot AI Mar 24, 2026

The new --no-temporary-worktree option line in the usage output is mis-indented compared to the other options, which makes the help text harder to read. Align the indentation with the rest of the option list.

Suggested change
- --no-temporary-worktree Fail instead of using a temporary worktree when the current tree is dirty.
+ --no-temporary-worktree Fail instead of using a temporary worktree when the current tree is dirty.

Comment on lines 359 to +362
Write-Host "Done."
Write-Host "Upstream source: $UpstreamRepo"
Write-Host "Upstream cache used for subtree split: $upstreamRepoForSplit"
Write-Host "Synced commit range: $($syncResult.StartHead) -> $($syncResult.EndHead)"

Copilot AI Mar 24, 2026

The printed synced commit range can be misleading when using the temporary-worktree path. Invoke-CoreSync returns StartHead/EndHead from the temporary worktree, but the current worktree may not end up at EndHead if you skip cherry-picking the rewrite commit (and instead apply the rewrite without committing). Consider recomputing EndHead from the current repo after cherry-picking (or have Invoke-TemporaryWorktreeSync return the actual applied range).

Suggested change
- Write-Host "Done."
- Write-Host "Upstream source: $UpstreamRepo"
- Write-Host "Upstream cache used for subtree split: $upstreamRepoForSplit"
- Write-Host "Synced commit range: $($syncResult.StartHead) -> $($syncResult.EndHead)"
+ # Recompute the effective end commit from the current repository after syncing,
+ # since the temporary worktree's EndHead may not match the final HEAD.
+ $printedEndHead = Get-GitOutput -Repo $repoRoot -Arguments @("rev-parse", "HEAD")
+ Write-Host "Done."
+ Write-Host "Upstream source: $UpstreamRepo"
+ Write-Host "Upstream cache used for subtree split: $upstreamRepoForSplit"
+ Write-Host "Synced commit range: $($syncResult.StartHead) -> $printedEndHead"

if ($prefixStatus) {
    Ensure-GitIdentity -Repo $WorkRepoRoot
    Invoke-Git -Repo $WorkRepoRoot -Arguments @("add", "--", $Prefix)
    Invoke-Git -Repo $WorkRepoRoot -Arguments @("commit", "-m", "Rewrite vendored CuTe namespace to flash_sparse_attn.ops.cute")

Copilot AI Mar 24, 2026

$RewriteCommitMessage is used to detect the rewrite commit later, but the commit message here is hard-coded. If either string changes, the skip logic in Invoke-TemporaryWorktreeSync can stop matching. Use $RewriteCommitMessage when creating the commit to keep the behavior consistent.

Suggested change
- Invoke-Git -Repo $WorkRepoRoot -Arguments @("commit", "-m", "Rewrite vendored CuTe namespace to flash_sparse_attn.ops.cute")
+ Invoke-Git -Repo $WorkRepoRoot -Arguments @("commit", "-m", $RewriteCommitMessage)

2 participants