Commit 642e53e

Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
2 parents: 9b82f05 + 313f16e

37 files changed: 1552 additions, 513 deletions

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 16 additions & 0 deletions
@@ -4428,6 +4428,22 @@
 			incurs a small amount of overhead in the scheduler
 			but is useful for debugging and performance tuning.
 
+	sched_thermal_decay_shift=
+			[KNL, SMP] Set a decay shift for scheduler thermal
+			pressure signal. Thermal pressure signal follows the
+			default decay period of other scheduler pelt
+			signals(usually 32 ms but configurable). Setting
+			sched_thermal_decay_shift will left shift the decay
+			period for the thermal pressure signal by the shift
+			value.
+			i.e. with the default pelt decay period of 32 ms
+			sched_thermal_decay_shift   thermal pressure decay pr
+				1			64 ms
+				2			128 ms
+			and so on.
+			Format: integer between 0 and 10
+			Default is 0.
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
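
The relationship documented for sched_thermal_decay_shift is a plain left shift of the base PELT period. A minimal sketch of that arithmetic follows, assuming the default 32 ms period quoted in the text; `thermal_decay_period_ms` is a hypothetical helper written for illustration and is not a kernel function.

```c
#include <assert.h>

/* Default PELT decay period in milliseconds, as quoted in the parameter text. */
#define PELT_DECAY_PERIOD_MS 32u

/*
 * Hypothetical helper (not part of the kernel): effective decay period of
 * the thermal pressure signal for a given sched_thermal_decay_shift value.
 */
static inline unsigned int thermal_decay_period_ms(unsigned int shift)
{
	/* The documented format is an integer between 0 and 10. */
	assert(shift <= 10);

	/* Left-shifting by N doubles the period N times. */
	return PELT_DECAY_PERIOD_MS << shift;
}
```

With these assumptions, a shift of 0 keeps the 32 ms default, 1 gives 64 ms and 2 gives 128 ms, matching the table in the documentation. On a real system the value would be passed on the kernel command line, for example `sched_thermal_decay_shift=2`.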

Documentation/robust-futex-ABI.txt

Lines changed: 6 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -61,8 +61,8 @@ setup that list.
 address of the associated 'lock entry', plus or minus, of what will
 be called the 'lock word', from that 'lock entry'. The 'lock word'
 is always a 32 bit word, unlike the other words above. The 'lock
-word' holds 3 flag bits in the upper 3 bits, and the thread id (TID)
-of the thread holding the lock in the bottom 29 bits. See further
+word' holds 2 flag bits in the upper 2 bits, and the thread id (TID)
+of the thread holding the lock in the bottom 30 bits. See further
 below for a description of the flag bits.
 
 The third word, called 'list_op_pending', contains transient copy of
@@ -128,7 +128,7 @@ that thread's robust_futex linked lock list a given time.
 A given futex lock structure in a user shared memory region may be held
 at different times by any of the threads with access to that region. The
 thread currently holding such a lock, if any, is marked with the threads
-TID in the lower 29 bits of the 'lock word'.
+TID in the lower 30 bits of the 'lock word'.
 
 When adding or removing a lock from its list of held locks, in order for
 the kernel to correctly handle lock cleanup regardless of when the task
@@ -141,7 +141,7 @@ On insertion:
 1) set the 'list_op_pending' word to the address of the 'lock entry'
 to be inserted,
 2) acquire the futex lock,
-3) add the lock entry, with its thread id (TID) in the bottom 29 bits
+3) add the lock entry, with its thread id (TID) in the bottom 30 bits
 of the 'lock word', to the linked list starting at 'head', and
 4) clear the 'list_op_pending' word.
 
@@ -155,7 +155,7 @@ On removal:
 
 On exit, the kernel will consider the address stored in
 'list_op_pending' and the address of each 'lock word' found by walking
-the list starting at 'head'. For each such address, if the bottom 29
+the list starting at 'head'. For each such address, if the bottom 30
 bits of the 'lock word' at offset 'offset' from that address equals the
 exiting threads TID, then the kernel will do two things:
 
@@ -180,7 +180,5 @@ any point:
 future kernel configuration changes) elements.
 
 When the kernel sees a list entry whose 'lock word' doesn't have the
-current threads TID in the lower 29 bits, it does nothing with that
+current threads TID in the lower 30 bits, it does nothing with that
 entry, and goes on to the next entry.
-
-Bit 29 (0x20000000) of the 'lock word' is reserved for future use.
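
The updated layout (2 flag bits on top, TID in the bottom 30 bits) can be illustrated with a pair of masks. This is only a sketch of the bit arithmetic the text describes: the macro and function names below are made up for illustration, and the kernel's own definitions live in the futex UAPI header rather than here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative masks derived from the documented layout of the 32-bit
 * 'lock word'. Names are hypothetical and local to this sketch.
 */
#define LOCK_WORD_FLAG_MASK 0xC0000000u /* upper 2 bits: flags  */
#define LOCK_WORD_TID_MASK  0x3FFFFFFFu /* lower 30 bits: TID   */

/* Extract the owner TID from a lock word. */
static inline uint32_t lock_word_tid(uint32_t lock_word)
{
	return lock_word & LOCK_WORD_TID_MASK;
}

/* Extract the flag bits from a lock word. */
static inline uint32_t lock_word_flags(uint32_t lock_word)
{
	return lock_word & LOCK_WORD_FLAG_MASK;
}

/* True if the lower 30 bits of the lock word equal the given TID. */
static inline bool lock_word_held_by(uint32_t lock_word, uint32_t tid)
{
	return lock_word_tid(lock_word) == (tid & LOCK_WORD_TID_MASK);
}
```

Note that bit 29 (0x20000000), previously documented as reserved, now falls inside the TID field: under this sketch a lock word of 0x20000007 no longer compares equal to TID 7.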

MAINTAINERS

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13552,6 +13552,12 @@ F: net/psample
 F: include/net/psample.h
 F: include/uapi/linux/psample.h
 
+PRESSURE STALL INFORMATION (PSI)
+M: Johannes Weiner <[email protected]>
+S: Maintained
+F: kernel/sched/psi.c
+F: include/linux/psi*
+
 PSTORE FILESYSTEM
 M: Kees Cook <[email protected]>
 M: Anton Vorontsov <[email protected]>

arch/arm/include/asm/topology.h

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,9 @@
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #else
 
 static inline void init_cpu_topology(void) { }

arch/arm64/configs/defconfig

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ CONFIG_ARCH_ZX=y
 CONFIG_ARCH_ZYNQMP=y
 CONFIG_ARM64_VA_BITS_48=y
 CONFIG_SCHED_MC=y
+CONFIG_SCHED_SMT=y
 CONFIG_NUMA=y
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y

arch/arm64/include/asm/topology.h

Lines changed: 3 additions & 0 deletions
@@ -25,6 +25,9 @@ int pcibus_to_node(struct pci_bus *bus);
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_ARM_TOPOLOGY_H */

arch/x86/include/asm/topology.h

Lines changed: 25 additions & 0 deletions
@@ -193,4 +193,29 @@ static inline void sched_clear_itmt_support(void)
 }
 #endif /* CONFIG_SCHED_MC_PRIO */
 
+#ifdef CONFIG_SMP
+#include <asm/cpufeature.h>
+
+DECLARE_STATIC_KEY_FALSE(arch_scale_freq_key);
+
+#define arch_scale_freq_invariant() static_branch_likely(&arch_scale_freq_key)
+
+DECLARE_PER_CPU(unsigned long, arch_freq_scale);
+
+static inline long arch_scale_freq_capacity(int cpu)
+{
+	return per_cpu(arch_freq_scale, cpu);
+}
+#define arch_scale_freq_capacity arch_scale_freq_capacity
+
+extern void arch_scale_freq_tick(void);
+#define arch_scale_freq_tick arch_scale_freq_tick
+
+extern void arch_set_max_freq_ratio(bool turbo_disabled);
+#else
+static inline void arch_set_max_freq_ratio(bool turbo_disabled)
+{
+}
+#endif
+
 #endif /* _ASM_X86_TOPOLOGY_H */
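
The merge message describes sampling the recent frequency average from the APERF/MPERF MSRs at tick boundaries and renormalizing to the maximum (turbo) frequency. The following user-space sketch shows one way that arithmetic could look; it is not the code from this commit. It assumes the APERF/MPERF delta ratio approximates the average frequency relative to the base frequency, that the maximum-to-base ratio is supplied as a fixed-point value scaled by 1024, and that a result of 1024 means "running at maximum frequency". All names are hypothetical.

```c
#include <stdint.h>

#define CAPACITY_SHIFT 10
#define CAPACITY_SCALE (1u << CAPACITY_SHIFT) /* 1024 == maximum frequency */

/*
 * Hypothetical helper: turn APERF/MPERF deltas observed over one tick into
 * a frequency scale factor in the range [0, 1024].
 *
 * delta_aperf / delta_mperf approximates (average freq / base freq).
 * max_freq_ratio is (max turbo freq / base freq) scaled by 1024, so dividing
 * by it renormalizes the result to the maximum frequency.
 *
 * Overflow handling is omitted for brevity; the deltas are assumed small.
 */
static inline uint64_t freq_scale_from_counters(uint64_t delta_aperf,
						uint64_t delta_mperf,
						uint64_t max_freq_ratio)
{
	uint64_t scale;

	/* Without a usable sample, assume the CPU ran at full capacity. */
	if (delta_mperf == 0 || max_freq_ratio == 0)
		return CAPACITY_SCALE;

	scale = (delta_aperf << (2 * CAPACITY_SHIFT)) /
		(delta_mperf * max_freq_ratio);

	/* Clamp: the observed average should not exceed the assumed maximum. */
	return scale > CAPACITY_SCALE ? CAPACITY_SCALE : scale;
}
```

For example, with a maximum-to-base ratio of 1.5 (1536 in fixed point), a CPU that averaged exactly its base frequency over the tick yields a scale of 682, roughly two thirds of full capacity, while one that averaged 1.5 times base yields the full 1024.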
