
@beomki-yeo
Contributor

Another PR assisted by AI. The change is nothing drastic: it cleans up the code and reduces the number of atomic operations.

Change summary:

  • Accumulate locally per lane; emit at most one atomicAdd, and only if the accumulated value is non-zero.
  • Merge the offset search and the left-count into a single pass; cache the lane, group, and stride values (among others) in locals.
  • Make the valid-slot binary search leader-only to cut redundant work.
  • Fix the doubled assignment in `int delta = delta = ...`, and hoist reference loads into `const` locals.
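The first bullet describes a common way to cut atomic contention. The kernel code itself is not shown on this page, so what follows is only a minimal C++ sketch of the pattern under stated assumptions: `std::thread` workers stand in for GPU lanes, `std::atomic::fetch_add` stands in for `atomicAdd`, and `is_selected`, `lane_body`, `count_selected` and `g_atomic_ops` are hypothetical names. Each lane counts matches in a private local and touches the shared counter at most once, and not at all when its local count is zero.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical predicate: stands in for whatever condition the real kernel counts.
bool is_selected(int value) { return value % 3 == 0; }

// Illustration only: counts how many atomic read-modify-write operations were issued.
std::atomic<unsigned> g_atomic_ops{0};

// Hypothetical per-lane body. Each lane walks the input with a fixed stride,
// accumulates matches in a private local, and updates the shared total at most once.
void lane_body(const std::vector<int>& data, std::size_t lane, std::size_t stride,
               std::atomic<unsigned>& total) {
    unsigned local = 0;  // private accumulator: no contention inside the loop
    for (std::size_t i = lane; i < data.size(); i += stride) {
        if (is_selected(data[i])) {
            ++local;
        }
    }
    if (local != 0) {  // skip the atomic entirely when this lane found nothing
        total.fetch_add(local, std::memory_order_relaxed);
        g_atomic_ops.fetch_add(1, std::memory_order_relaxed);
    }
}

// Launches n_lanes workers (std::thread standing in for GPU threads) and returns the total.
unsigned count_selected(const std::vector<int>& data, std::size_t n_lanes) {
    assert(n_lanes > 0 && "a stride of zero would never terminate");
    std::atomic<unsigned> total{0};
    std::vector<std::thread> lanes;
    lanes.reserve(n_lanes);
    for (std::size_t lane = 0; lane < n_lanes; ++lane) {
        lanes.emplace_back(lane_body, std::cref(data), lane, n_lanes, std::ref(total));
    }
    for (auto& t : lanes) {
        t.join();  // joining makes every lane's fetch_add visible before the load below
    }
    return total.load();
}
```

With the inputs 0 to 29 and 4 lanes, `count_selected` returns 10 while issuing only 4 atomic updates, instead of one per match. On a real GPU the same idea is often pushed further with a warp-level reduction before the single atomic, but that is not something this PR description claims.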

@beomki-yeo added the improvement ("Improve an existing feature") and AI assistance ("Assisted by AI") labels on Aug 30, 2025.

@beomki-yeo marked this pull request as draft on September 22, 2025, 23:33.