Commit 890d550

Chengming Zhou authored and Peter Zijlstra committed
sched/psi: report zeroes for CPU full at the system level
Martin found the /proc/pressure/cpu output confusing, and found no hint
about the CPU "full" line in the psi documentation.

  % cat /proc/pressure/cpu
  some avg10=0.92 avg60=0.91 avg300=0.73 total=933490489
  full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277

The PSI_CPU_FULL state was introduced by commit e7fcd76 ("psi: Add
PSI_CPU_FULL state"), mainly for the cgroup level, but it is also counted
at the system level as a side effect.

Naturally, the FULL state doesn't exist for the CPU resource at the
system level. These "full" numbers can come from CPU idle schedule
latency. For example, t1 is the time when a task wakes up on an idle CPU,
and t2 is the time when the CPU picks and switches to it. The delta
(t2 - t1) is accounted to the CPU_FULL state.

Another case in which all processes can be stalled is when all cgroups
have been throttled at the same time, which is unlikely to happen.

Anyway, the CPU_FULL metric is meaningless and confusing at the system
level. So this patch reports zeroes for CPU full at the system level, and
updates the psi documentation accordingly.

Fixes: e7fcd76 ("psi: Add PSI_CPU_FULL state")
Reported-by: Martin Steigerwald <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 0a00a35 commit 890d550

2 files changed, 13 insertions(+), 11 deletions(-)

Documentation/accounting/psi.rst

Lines changed: 4 additions & 5 deletions
@@ -37,11 +37,7 @@ Pressure interface
 Pressure information for each resource is exported through the
 respective file in /proc/pressure/ -- cpu, memory, and io.
 
-The format for CPU is as such::
-
-	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
-
-and for memory and IO::
+The format is as such::
 
 	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
 	full avg10=0.00 avg60=0.00 avg300=0.00 total=0
@@ -58,6 +54,9 @@ situation from a state where some tasks are stalled but the CPU is
 still doing productive work. As such, time spent in this subset of the
 stall state is tracked separately and exported in the "full" averages.
 
+CPU full is undefined at the system level, but has been reported
+since 5.13, so it is set to zero for backward compatibility.
+
 The ratios (in %) are tracked as recent trends over ten, sixty, and
 three hundred second windows, which gives insight into short term events
 as well as medium and long term trends. The total absolute stall time
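
For readers consuming this interface from userspace, the documented line format can be parsed with a single sscanf() call. The following is a minimal sketch, not part of this commit: struct psi_line and psi_parse_line() are hypothetical names introduced here for illustration, and the "total" field is treated as microseconds, as the psi documentation describes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of /proc/pressure/<resource>. */
struct psi_line {
	char kind[8];                /* "some" or "full" */
	double avg10, avg60, avg300; /* stall ratios in percent */
	unsigned long long total;    /* cumulative stall time, microseconds */
};

/*
 * Parse one line such as
 *   "full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277"
 * Returns 0 on success, -1 if the line does not match the format.
 */
static int psi_parse_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total);

	if (n != 5)
		return -1;
	/* Only the two documented line kinds are accepted. */
	if (strcmp(out->kind, "some") != 0 && strcmp(out->kind, "full") != 0)
		return -1;
	return 0;
}
```

With this change applied, a caller reading /proc/pressure/cpu at the system level should expect the "full" line to parse successfully but carry all-zero values; per-cgroup cpu.pressure files are not affected by the zeroing.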

kernel/sched/psi.c

Lines changed: 9 additions & 6 deletions
@@ -1060,14 +1060,17 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	mutex_unlock(&group->avgs_lock);
 
 	for (full = 0; full < 2; full++) {
-		unsigned long avg[3];
-		u64 total;
+		unsigned long avg[3] = { 0, };
+		u64 total = 0;
 		int w;
 
-		for (w = 0; w < 3; w++)
-			avg[w] = group->avg[res * 2 + full][w];
-		total = div_u64(group->total[PSI_AVGS][res * 2 + full],
-				NSEC_PER_USEC);
+		/* CPU FULL is undefined at the system level */
+		if (!(group == &psi_system && res == PSI_CPU && full)) {
+			for (w = 0; w < 3; w++)
+				avg[w] = group->avg[res * 2 + full][w];
+			total = div_u64(group->total[PSI_AVGS][res * 2 + full],
+					NSEC_PER_USEC);
+		}
 
 		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
 			   full ? "full" : "some",
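
The gating condition in this hunk can be exercised outside the kernel with a small userspace model. This is only a sketch under stated assumptions: the enum is a simplified stand-in for the kernel's enum psi_res, the "is this the system group" test is reduced to a boolean (the kernel compares against &psi_system), and struct psi_report, psi_hide_value() and psi_report_for() are hypothetical names that do not exist in kernel/sched/psi.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's enum psi_res (assumption). */
enum psi_res { PSI_IO, PSI_MEM, PSI_CPU };

/* Hypothetical result type: what one output line would carry. */
struct psi_report {
	unsigned long avg[3]; /* avg10, avg60, avg300 in fixed-point form */
	uint64_t total_us;    /* cumulative stall time in microseconds */
};

/* True only for the one combination the patch zeroes out. */
static bool psi_hide_value(bool is_system_group, enum psi_res res, int full)
{
	return is_system_group && res == PSI_CPU && full;
}

/*
 * Mirror of the loop body in psi_show(): start from zeroes and copy the
 * raw values only when the combination is meaningful.
 */
static struct psi_report psi_report_for(bool is_system_group,
					enum psi_res res, int full,
					const unsigned long raw_avg[3],
					uint64_t raw_total_ns)
{
	struct psi_report r = { { 0, 0, 0 }, 0 };
	int w;

	if (!psi_hide_value(is_system_group, res, full)) {
		for (w = 0; w < 3; w++)
			r.avg[w] = raw_avg[w];
		r.total_us = raw_total_ns / 1000; /* ns to us */
	}
	return r;
}
```

Initializing the result to zero and copying conditionally keeps the output path unchanged, which is why the patch can leave the seq_printf() call as it was: the "full" line is still printed for CPU at the system level, only with zero values.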
