Commit 9c1c6ff

fix: restore set_recent_kernel(2) in rate_limiter() to match original behavior
Restore the unconditional set_recent_kernel(2) call that was removed in the rate_limiter optimization. The write has negligible cost (roughly a 100-200 ns cache-line store) compared to the other savings in this function, and removing it changes observable shared-memory state, which could affect external tooling or future features.

The call is placed after the cached sm_limit/util_switch fast exit, matching its original position relative to the get_recent_kernel() guard. All other optimizations (cached limits, removed duplicate sm_limit call, reduced sleep) are preserved.

Signed-off-by: nishitnshah <nishshah@linkedin.com>
1 parent 0a28f79 commit 9c1c6ff

1 file changed: +1 -0 lines changed

src/multiprocess/multiprocess_utilization_watcher.c

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ void rate_limiter(int grids, int blocks) {
     while (get_recent_kernel() < 0) {
         usleep(1000);
     }
+    set_recent_kernel(2);
 
     LOG_DEBUG("grid: %d, blocks: %d", grids, blocks);
     LOG_DEBUG("launch kernel %ld, curr core: %ld", kernel_size, g_cur_cuda_cores);
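
The ordering the commit message describes (fast exit first, then the get_recent_kernel() guard, then the unconditional write) can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `fake_recent_kernel`, `cached_sm_limit`, `rate_limiter_sketch`, and the exact fast-exit condition are hypothetical stand-ins, and the real function keeps this state in shared memory and goes on to do the actual throttling.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared-memory field; in the real project
 * this state is visible to other processes, which is why the write matters. */
static volatile int32_t fake_recent_kernel = 0;

/* Hypothetical cached limit; the fast-exit condition below is an assumption. */
static int cached_sm_limit = 50;

static int get_recent_kernel(void) { return fake_recent_kernel; }
static void set_recent_kernel(int value) { fake_recent_kernel = value; }

/* Returns 0 when the fast exit is taken, 1 when the limiter path runs. */
static int rate_limiter_sketch(int grids, int blocks) {
    (void)grids;
    (void)blocks;

    /* Fast exit: no limit configured, so shared state is left untouched. */
    if (cached_sm_limit == 0 || cached_sm_limit >= 100) {
        return 0;
    }

    /* Guard: wait while the recent-kernel marker is negative. */
    while (get_recent_kernel() < 0) {
        usleep(1000);
    }

    /* Restored write: performed on every call that gets past the guard. */
    set_recent_kernel(2);
    return 1;
}
```

In this sketch, a call that passes the fast exit always leaves the marker at 2, while a call that takes the fast exit leaves it unchanged, which is the observable behavior the commit aims to restore.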
