Skip to content

Commit 05ea841

Browse files
Ism Hongchenhengqi
authored andcommitted
runqslower: create sufficiently large array table to avoid missing tasks
We have encountered an issue while running runqslower for long run test, as you can check the log below. The workload of target platform is running auto test script and launching lots of applications periodically on Android TV. There are several abnormal records of run queue latency, ex. 1311763801. We have investgated this issue and found the root cuase of this issue is the hash map to store en-queue start time of each task overflowed, and it seems cpudist also encountered such issue before and fix by following pull request. iovisor#2568 Apply the similar fix to runqslower, getting the max_pid from /proc/sys/kernel/max_pid and pass to BPF C in kernel to create enough entries of BPF map to store en-queue start time of each task. In the mean time, since we create the BPF map with max_pid entries, use BPF_ARRAY instead of BPF_HASH to prevent collision. ``` (bcc)root@localhost:/# runqslower 100000 -P Tracing run queue latency higher than 100000 us TIME COMM TID LAT(us) PREV COMM PREV TID 23:42:45 glide-source-th 28986 161605 kworker/u9:3 12260 23:42:50 HTTPREQUEST_MAN 26249 107187 OMX_VideoInput 7152 23:42:50 migration/2 24 146799 OmxSideband 7147 23:42:50 rtk_post_worker 128 174145 migration/2 24 23:42:50 Binder:233_4 633 170457 runqslower 7086 23:42:58 GIBBON_SURFACE_ 26233 103377 [email protected] 457 23:43:14 Binder:581_1B 2202 113153 Binder:581_7 833 23:43:24 migration/0 15 107716 writer 7277 .... 05:08:14 dumpLogs.sh 22242 1311763801 NBP_POOL-[1] 24233 05:08:14 dumpLogs.sh 22243 1311752373 appsrc7:src 24281 05:08:14 dumpLogs.sh 22248 1311760825 dumpLogs.sh 22193 ``` Signed-off-by: Ism Hong <[email protected]>
1 parent 2057146 commit 05ea841

File tree

1 file changed

+8
-6
lines changed

1 file changed

+8
-6
lines changed

tools/runqslower.py

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@
7474
#include <linux/nsproxy.h>
7575
#include <linux/pid_namespace.h>
7676
77-
BPF_HASH(start, u32);
77+
BPF_ARRAY(start, u64, MAX_PID);
7878
7979
struct data_t {
8080
u32 pid;
@@ -132,7 +132,7 @@
132132
133133
// fetch timestamp and calculate delta
134134
tsp = start.lookup(&pid);
135-
if (tsp == 0) {
135+
if ((tsp == 0) || (*tsp == 0)) {
136136
return 0; // missed enqueue
137137
}
138138
delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
@@ -204,7 +204,7 @@
204204
205205
// fetch timestamp and calculate delta
206206
tsp = start.lookup(&pid);
207-
if (tsp == 0) {
207+
if ((tsp == 0) || (*tsp == 0)) {
208208
return 0; // missed enqueue
209209
}
210210
delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
@@ -262,12 +262,14 @@
262262
def print_event(cpu, data, size):
263263
event = b["events"].event(data)
264264
if args.previous:
265-
print("%-8s %-16s %-6s %14s %-16s %-6s" % (strftime("%H:%M:%S"), event.task, event.pid, event.delta_us, event.prev_task, event.prev_pid))
265+
print("%-8s %-16s %-6s %14s %-16s %-6s" % (strftime("%H:%M:%S"), event.task.decode('utf-8', 'replace'), event.pid, event.delta_us, event.prev_task.decode('utf-8', 'replace'), event.prev_pid))
266266
else:
267-
print("%-8s %-16s %-6s %14s" % (strftime("%H:%M:%S"), event.task, event.pid, event.delta_us))
267+
print("%-8s %-16s %-6s %14s" % (strftime("%H:%M:%S"), event.task.decode('utf-8', 'replace'), event.pid, event.delta_us))
268+
269+
max_pid = int(open("/proc/sys/kernel/pid_max").read())
268270

269271
# load BPF program
270-
b = BPF(text=bpf_text)
272+
b = BPF(text=bpf_text, cflags=["-DMAX_PID=%d" % max_pid])
271273
if not is_support_raw_tp:
272274
b.attach_kprobe(event="ttwu_do_wakeup", fn_name="trace_ttwu_do_wakeup")
273275
b.attach_kprobe(event="wake_up_new_task", fn_name="trace_wake_up_new_task")

0 commit comments

Comments
 (0)