
Conversation


@YangKai0616 commented Oct 13, 2025

This PR provides a template for a flash-attn Python API. Built on a sycl-tla kernel, it currently reproduces part of flash-attn's functionality, such as fwd and varlen_fwd.

I see that the current kernel implementation does not expose an external interface, and the test files are quite limited in scope. If possible, perhaps we could jointly maintain a common API, similar to what Dao-AILab/flash-attention does. Subsequent unit tests could then be based on test_flash_attn.py to stay consistent with the official CUDA interface.

While reproducing this functionality I hit some edge-case issues in the tests, so I modified xe_flash_attn_prefill_epilogue.hpp and xe_flash_attn_prefill.hpp.

If you are interested, we can discuss this further. If there are any issues with the current code, please let me know. Thanks!

Current method to build the API:

```shell
cd /workspace/sycl-tla/examples/06_bmg_flash_attention/flash-attn
CUTLASS_SYCL_SRC_DIR=/workspace/sycl-tla pip install --no-build-isolation .
```

After that, you can run tests using the same import statements as in test_flash_attn.py.
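For readers unfamiliar with what the fwd entry point computes: it is the standard attention forward pass, softmax(QK^T / sqrt(d)) V, evaluated per head. Below is a minimal pure-Python reference sketch of that math (single sequence, single head, numerically stable softmax). It is illustrative only and is not the sycl-tla kernel or the flash-attn API; names here are hypothetical.

```python
import math

def attention_fwd(q, k, v):
    """Reference attention forward pass: softmax(q @ k^T / sqrt(d)) @ v.

    q, k, v are lists of length-d vectors (one sequence, one head).
    Purely illustrative; real flash-attn kernels tile this computation
    to avoid materializing the full score matrix.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Scaled dot-product scores against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax: subtract the row max before exponentiating.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(w * vj[t] for w, vj in zip(weights, v)) for t in range(d)])
    return out
```

A unit test comparing such a reference against the kernel output (as test_flash_attn.py does with a PyTorch reference) is the usual way to validate the fwd and varlen_fwd paths.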

@yao-matrix commented Oct 15, 2025

@rolandschulz, could you help review? Thanks very much. For context: we are integrating the flash-attention-2 kernel into Hugging Face, so we need to align the API with CUDA's to make the integration easy.

@Antonyvance added labels on Oct 17, 2025: wontfix (This will not be worked on), redesign required (Implementation requires a redesign), information required (The PR requires more information to review it properly)
@Antonyvance commented
We need to figure out how to place the Python package, the torch extension, and the associated Triton kernel. Let's hold this since it requires a redesign.

