Commit ba2b4aa

Bugfix: Fix data hazard in persistent reduce (#1826)
<!-- .github/pull_request_template.md -->

## 📌 Description

Same as in #1661

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc. -->
1 parent 075775e commit ba2b4aa

1 file changed: +1 -1 lines changed

include/flashinfer/attention/persistent.cuh

Lines changed: 1 addition & 1 deletion
@@ -519,7 +519,7 @@ struct BlockBatchReductionPersistent {
 #pragma unroll 1
     for (uint32_t i = worker_id; i < num_packed_qo_len * num_kv_heads; i += num_workers) {
       PROFILER_EVENT_START(profiler_closure, PersistentProfileEventType::kReduction);
-
+      __syncwarp(); // avoid data hazard due to reordering st.cast_store
       // remap workload
       uint32_t packed_qo_idx = i / num_kv_heads;
       uint32_t kv_head_idx = i % num_kv_heads;
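
The commit comment only names the symptom (a data hazard from reordered stores), so the following is a minimal sketch of the general pattern rather than FlashInfer's actual reduction code. It assumes the hazard is of the write-after-read kind: a warp reuses a staging buffer across iterations of a persistent loop, and since lanes are not guaranteed to run in lockstep, a fast lane could overwrite the buffer while a slower lane is still reading the previous iteration's values. The kernel `rotate_sum_kernel`, the buffer `stage`, and the helper `run_demo` are hypothetical names introduced for illustration.

```cuda
#include <cassert>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr uint32_t kWarpSize = 32;

// Hypothetical kernel (not FlashInfer code): a single warp reuses one shared
// staging buffer across the iterations of a persistent-style loop.
__global__ void rotate_sum_kernel(const float* in, float* out, uint32_t num_items) {
  assert(blockDim.x == kWarpSize);  // the sketch assumes exactly one warp per block
  __shared__ float stage[kWarpSize];
  const uint32_t lane = threadIdx.x;
  float acc = 0.f;
#pragma unroll 1
  for (uint32_t i = 0; i < num_items; ++i) {
    // Barrier at the top of the loop: every lane must have finished reading
    // stage[] in the previous iteration before any lane overwrites it.
    __syncwarp();
    stage[lane] = in[i * kWarpSize + lane];
    // Second barrier: make this iteration's stores visible to the other lanes.
    __syncwarp();
    // Each lane consumes a value written by a different lane.
    acc += stage[(lane + 1) % kWarpSize];
  }
  out[lane] = acc;
}

// Runs the kernel on small integer-valued inputs and compares against a host
// reference. Returns false on any CUDA error or mismatch.
bool run_demo() {
  const uint32_t num_items = 64;
  const size_t n = static_cast<size_t>(num_items) * kWarpSize;
  std::vector<float> h_in(n), h_out(kWarpSize, 0.f);
  for (size_t k = 0; k < n; ++k) h_in[k] = static_cast<float>(k % 7);

  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, kWarpSize * sizeof(float)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  rotate_sum_kernel<<<1, kWarpSize>>>(d_in, d_out, num_items);
  // cudaMemcpy synchronizes with the kernel and surfaces launch/runtime errors.
  cudaError_t err = cudaMemcpy(h_out.data(), d_out, kWarpSize * sizeof(float),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (uint32_t lane = 0; lane < kWarpSize; ++lane) {
    float expected = 0.f;
    for (uint32_t i = 0; i < num_items; ++i) {
      expected += h_in[i * kWarpSize + (lane + 1) % kWarpSize];
    }
    if (h_out[lane] != expected) return false;  // sums are small integers, exact in float
  }
  return true;
}
```

To try it, call `run_demo()` from a `main` and compile with `nvcc`. Removing the first `__syncwarp()` may still produce correct results on a given GPU, because the race depends on scheduling and on how the compiler orders the stores; the barrier is what makes the reuse of `stage` well defined rather than merely likely to work.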
