
Commit d05666f

nishitnshah authored and Maverick123123 committed
fix: restore set_recent_kernel(2) in rate_limiter() to match original behavior
Restore the unconditional set_recent_kernel(2) call that was removed in the rate_limiter optimization. The write has negligible cost (~100-200 ns cache line store) compared to the other savings in this function, and removing it changes observable shared memory state, which could affect external tooling or future features.

The call is placed after the cached sm_limit/util_switch fast-exit, matching the original position relative to the get_recent_kernel() guard. All other optimizations (cached limits, removed duplicate sm_limit call, reduced sleep) are preserved.

Signed-off-by: nishitnshah <nishshah@linkedin.com>
Signed-off-by: Maverick123123 <yuming.wu@dynamia.ai>
1 parent fd3ace7 commit d05666f


1 file changed: +1 -0 lines changed


src/multiprocess/multiprocess_utilization_watcher.c

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ void rate_limiter(int grids, int blocks) {
   while (get_recent_kernel() < 0) {
     usleep(1000);
   }
+  set_recent_kernel(2);
 
   LOG_DEBUG("grid: %d, blocks: %d", grids, blocks);
   LOG_DEBUG("launch kernel %ld, curr core: %ld", kernel_size, g_cur_cuda_cores);
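
The ordering described in the commit message (cached fast-exit first, then the get_recent_kernel() guard, then the restored write) can be sketched as a small standalone program. This is only an illustration under stated assumptions, not the project's code: the globals, the stub bodies, `wait_one_ms()`, and `rate_limiter_sketch()` are hypothetical stand-ins, and the real function reads and writes a shared memory region and sleeps with usleep(1000) instead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for state the real code keeps in shared memory
 * or in cached variables. None of these names come from the project,
 * except get_recent_kernel/set_recent_kernel, which are stubbed here. */
static int g_recent_kernel = 0;      /* stand-in for the shared "recent kernel" slot */
static int g_cached_sm_limit = 50;   /* assumed cached SM limit, in percent */
static int g_cached_util_switch = 1; /* assumed cached utilization switch */

static int get_recent_kernel(void) { return g_recent_kernel; }
static void set_recent_kernel(int value) { g_recent_kernel = value; }

/* Stand-in for usleep(1000). To keep the sketch single-process, it also
 * simulates another process clearing the negative flag while we wait. */
static void wait_one_ms(void) { g_recent_kernel = 0; }

/* Returns true when the call proceeds past the fast-exit. */
static bool rate_limiter_sketch(void) {
    /* Fast-exit: no limiting is needed, so shared state is left untouched. */
    if (g_cached_sm_limit >= 100 || g_cached_sm_limit == 0 ||
        g_cached_util_switch == 0) {
        return false;
    }
    /* Guard: wait while the slot holds a negative value. */
    while (get_recent_kernel() < 0) {
        wait_one_ms();
    }
    /* Restored write: always executed once the guard has passed. */
    set_recent_kernel(2);
    return true;
}
```

In this sketch the write is skipped only on the fast-exit path. On every other path the slot ends up holding 2, which is the observable state the commit restores.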
