Commit d84837f

dashingwu and Xiaoguang Wu authored
[Runtime] fix a scheduling issue (#970)
The original code assumes that the last 4 bits of the CPU cycle count are uniformly distributed, but that is not true: at least on Intel Ice Lake (Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz), the CPU cycle count is always an odd number. As a result, expensive ops are frequently scheduled onto a single thread, which greatly increases the RT (in a custom scenario, from ~30ms to ~45ms).

Signed-off-by: Xiaoguang Wu <[email protected]>
Co-authored-by: Xiaoguang Wu <[email protected]>
1 parent 5eabe5f commit d84837f
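
The commit message's argument can be checked with a small standalone sketch. It is not TensorFlow code: `CountUpdates`, `UpdatesWithOddCycles`, and `UpdatesWithCounter` are hypothetical helpers, and the odd-only sequence 1, 3, 5, ... is an assumed stand-in for the cycle counter behaviour described above. It counts how often the `value % 16 == 0` test passes for each source of values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts how many of `n` samples pass the
// `value % skip == 0` test that gates a cost-estimate update.
// The sampled sequence is first, first + step, first + 2 * step, ...
inline int CountUpdates(std::uint64_t first, std::uint64_t step, int n,
                        int skip = 16) {
  assert(skip > 0 && n >= 0);
  int updates = 0;
  std::uint64_t value = first;
  for (int i = 0; i < n; ++i, value += step) {
    if (value % static_cast<std::uint64_t>(skip) == 0) ++updates;
  }
  return updates;
}

// Assumed stand-in for a cycle counter that only yields odd values.
inline int UpdatesWithOddCycles(int n) { return CountUpdates(1, 2, n); }

// The replacement used by the commit: a counter running 0, 1, 2, ...
inline int UpdatesWithCounter(int n) { return CountUpdates(0, 1, n); }
```

An odd number is never divisible by 16, so `UpdatesWithOddCycles(n)` returns 0 for any `n`, while `UpdatesWithCounter(n)` passes the test once every 16 invocations.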

1 file changed, +5 -4 lines changed

tensorflow/core/common_runtime/executor.cc

Lines changed: 5 additions & 4 deletions
@@ -730,15 +730,16 @@ Status ExecutorState<PropagatorStateType>::ProcessSync(
 
   } else if (kernel_stats_->HasExpensiveMarker(item)) {
     KernelTimer timer;
+    static uint64 update_counter = 0;
     device->Compute(op_kernel, &ctx);
-    // For expensive kernels, always update the cost estimate. For inexpensive
-    // kernels, update the cost estimate with ~1/16 probability. This assumes
-    // that the last 4 bits of the CPU cycle count is uniformly distributed.
+
     constexpr int kKernelExecutionTrackingInvocationSkipCount = 16;
     if (is_expensive ||
-        timer.start_cycles % kKernelExecutionTrackingInvocationSkipCount == 0) {
+        update_counter % kKernelExecutionTrackingInvocationSkipCount == 0) {
       kernel_stats_->UpdateCostEstimate(item, timer.ElapsedCycles());
     }
+
+    update_counter++;
   } else {
     device->Compute(op_kernel, &ctx);
   }
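
The diff uses a plain `static uint64` counter. As a possible variant, and only as a sketch under assumptions not made by the commit, the same "every 16th invocation" sampling can be written with `std::atomic` so that concurrent callers increment the counter without a data race. `ShouldUpdateCostEstimate` and its parameters are hypothetical names, not TensorFlow APIs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the commit: returns true on every
// `skip`-th call made against the same counter, across all threads.
inline bool ShouldUpdateCostEstimate(std::atomic<std::uint64_t>& counter,
                                     std::uint64_t skip = 16) {
  assert(skip > 0);
  // fetch_add returns the value held before the increment, so the first
  // call sees 0 and passes the test, as in the diff above.
  // Relaxed ordering suffices here: only the count itself matters.
  return counter.fetch_add(1, std::memory_order_relaxed) % skip == 0;
}
```

A caller would keep one `std::atomic<std::uint64_t>` (for example a function-local static initialized to 0) and update the cost estimate when `is_expensive` is true or this helper returns true.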