@JRPan JRPan commented Jan 21, 2026

Summary

  • Fix critical CUDA synchronization bug in lud_diagonal kernel where __syncthreads() was called inside conditional blocks, so not every thread in the block reached the barrier, which is undefined behavior
  • Change GPU_Microbenchmark build to use mv instead of cp to avoid leaving duplicate binaries in source directories
  • Add clean_GPU_Microbenchmark to the main clean target
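
For reviewers less familiar with the barrier rule, here is a minimal sketch of the pattern behind the first item. It is not the actual lud_diagonal code: the kernel name staged_update, the tile width N, and the helper staged_update_matches_reference are hypothetical, and the "problematic shape" in the comment is only an assumed illustration of a barrier inside a divergent branch. The point is that the work stays conditional while the __syncthreads() call is reached by every thread in the block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 16;  // hypothetical tile width; one thread per element

// Problematic shape (illustration only, do not use): the barrier sits inside
// a divergent branch, so threads with tid <= i never reach it.
//
//   if (tid > i) {
//       s[tid] += s[i];
//       __syncthreads();
//   }

// Fixed shape: only the work is conditional; every thread hits the barrier.
__global__ void staged_update(int *data) {
    __shared__ int s[N];
    int tid = threadIdx.x;

    s[tid] = data[tid];
    __syncthreads();  // all loads into shared memory are visible

    for (int i = 0; i < N - 1; ++i) {
        if (tid > i) {
            s[tid] += s[i];  // s[i] was finalized in earlier steps
        }
        __syncthreads();  // unconditional: step i is complete for all threads
    }

    data[tid] = s[tid];
}

// Host helper: runs the kernel on one block of N threads and compares the
// result against the same staged update computed sequentially on the CPU.
bool staged_update_matches_reference() {
    int host[N], ref[N];
    for (int k = 0; k < N; ++k) host[k] = ref[k] = 1;

    for (int i = 0; i < N - 1; ++i)
        for (int t = i + 1; t < N; ++t)
            ref[t] += ref[i];

    int *dev = nullptr;
    if (cudaMalloc((void **)&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    staged_update<<<1, N>>>(dev);
    cudaError_t err =
        cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int k = 0; k < N; ++k)
        if (host[k] != ref[k]) return false;
    return true;
}
```

To try it, call staged_update_matches_reference() from a small main and build with nvcc. Keeping the barrier at loop level costs nothing extra here, since idle threads simply wait for the active ones to finish the step.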

Test plan

  • Verified LUD benchmark passes with 64x64 and 256x256 matrices after the fix
  • Build GPU_Microbenchmark and verify binaries are moved correctly

🤖 Generated with Claude Code

- Fix critical CUDA synchronization bug in lud_diagonal kernel where
  __syncthreads() was called inside conditional blocks. This caused
  undefined behavior because not all threads reached the barrier. Moved
  the __syncthreads() calls outside the if-blocks so all threads
  participate.

- Change GPU_Microbenchmark build to use mv instead of cp to avoid
  leaving duplicate binaries in source directories.

- Add clean_GPU_Microbenchmark to the main clean target.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@JRPan JRPan merged commit 6a9bc05 into accel-sim:dev Jan 22, 2026
1 check passed