@aleozlx aleozlx commented Dec 5, 2025

For #2845

Added spin_lock_atom_cas_acquire_wait function to handle spin lock acquisition with atomic compare-and-swap.
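
For readers unfamiliar with the pattern, here is a minimal host-side sketch of acquiring a spin lock with an atomic compare-and-swap and acquire ordering. It is not the code added in this PR, which is not shown in this conversation: the names `spin_lock_cas_acquire_wait`, `spin_unlock_release`, and `run_counter_demo` are hypothetical, and `std::atomic` on the CPU is used as a stand-in for the device-side atomics that the CuTe DSL function presumably wraps.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not the PR's API): spin until the lock word can be
// swapped from `expected_val` to `desired_val`. Acquire ordering on success
// makes writes published by the previous holder visible to this thread.
inline void spin_lock_cas_acquire_wait(std::atomic<int>& lock,
                                       int expected_val, int desired_val) {
    int expected = expected_val;
    while (!lock.compare_exchange_weak(expected, desired_val,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        // A failed CAS overwrites `expected` with the observed value,
        // so reset it before retrying.
        expected = expected_val;
    }
}

// Hypothetical counterpart: release the lock with a release store.
inline void spin_unlock_release(std::atomic<int>& lock, int unlocked_val) {
    lock.store(unlocked_val, std::memory_order_release);
}

// Demo: several threads increment a plain int guarded by the spin lock.
inline int run_counter_demo(int num_threads, int iters) {
    std::atomic<int> lock{0};
    int counter = 0;  // protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                spin_lock_cas_acquire_wait(lock, 0, 1);
                ++counter;
                spin_unlock_release(lock, 0);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

With the lock in place, `run_counter_demo(4, 1000)` should return 4000, since no increment is lost. Whether the acquire wait is actually required in the all-reduce kernel is a separate question, which the reviewers raise further down.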

@aleozlx
Author

aleozlx commented Dec 5, 2025

This is functional; see flashinfer-ai/flashinfer#2171.

Raising it as a proposed solution for what we needed when upgrading to nvidia-cutlass-dsl 4.3.1 #2845

Kind regards from FlashInfer & cuDNN :)

@XiaoSong9905
Member

XiaoSong9905 commented Dec 12, 2025

The acquire wait is not needed. Message Xiao Song on Slack and we can schedule a meeting to explain this.

@XiaoSong9905
Member

The two-shot all reduce.py failure is related to something else; let's discuss this in the meeting.

@shubaoyu2
Contributor

shubaoyu2 commented Dec 12, 2025

You can use the new two-shot GEMM+AR kernel in the CuTeDSL examples. The one in FlashInfer should be an older version.

Adding something to the CuTeDSL wheel package will take some time, so I would recommend using the new kernel.

@aleozlx
Author

aleozlx commented Dec 16, 2025

Sounds good, will discuss with you over Slack. I will learn about the new kernel example and bring action items back to FI.

@github-actions

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
