
Commit 88817ac

Merge tag 'pm-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These revert a recent change in the schedutil cpufreq governor that
  had not been expected to make any functional difference, but turned
  out to introduce a performance regression, fix an initialization
  issue in the amd-pstate driver and make it actually replace the
  venerable ACPI cpufreq driver on the supported systems by default.

  Specifics:

   - Revert a recent schedutil cpufreq governor change that introduced
     a performance regression on Pixel 6 (Sam Wu)

   - Fix amd-pstate driver initialization after running the kernel via
     kexec (Wyes Karny)

   - Turn amd-pstate into a built-in driver which allows it to take
     precedence over acpi-cpufreq by default on supported systems and
     amend it with a mechanism to disable this behavior (Perry Yuan)

   - Update amd-pstate documentation in accordance with the other
     changes made to it (Perry Yuan)"

* tag 'pm-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Documentation: add amd-pstate kernel command line options
  Documentation: amd-pstate: add driver working mode introduction
  cpufreq: amd-pstate: add amd-pstate driver parameter for mode selection
  cpufreq: amd-pstate: change amd-pstate driver to be built-in type
  cpufreq: amd-pstate: cpufreq: amd-pstate: reset MSR_AMD_PERF_CTL register at init
  Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"
2 parents e3ebac8 + 1056d31 commit 88817ac

File tree

5 files changed: +74, -48 lines


Documentation/admin-guide/kernel-parameters.txt

Lines changed: 11 additions & 0 deletions
@@ -6959,3 +6959,14 @@
 			memory, and other data can't be written using
 			xmon commands.
 			off	xmon is disabled.
+
+	amd_pstate=	[X86]
+			disable
+			  Do not enable amd_pstate as the default
+			  scaling driver for the supported processors
+			passive
+			  Use amd_pstate as a scaling driver, driver requests a
+			  desired performance on this abstract scale and the power
+			  management firmware translates the requests into actual
+			  hardware states (core frequency, data fabric and memory
+			  clocks etc.)
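As a usage sketch of the new option (an illustration, not part of the patch; the file location and regeneration command vary by distribution), a GRUB-based system could enable passive mode like this:

```shell
# /etc/default/grub -- append the new option to the kernel command line
GRUB_CMDLINE_LINUX="amd_pstate=passive"

# then regenerate the GRUB config and reboot, e.g.:
#   sudo update-grub                                  # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg       # Fedora/RHEL
# after reboot, the active option is visible in /proc/cmdline
```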

Documentation/admin-guide/pm/amd-pstate.rst

Lines changed: 13 additions & 17 deletions
@@ -283,23 +283,19 @@ efficiency frequency management method on AMD processors.
 Kernel Module Options for ``amd-pstate``
 =========================================
 
-.. _shared_mem:
-
-``shared_mem``
-  Use a module param (shared_mem) to enable related processors manually with
-  **amd_pstate.shared_mem=1**.
-  Due to the performance issue on the processors with `Shared Memory Support
-  <perf_cap_>`_, we disable it presently and will re-enable this by default
-  once we address performance issue with this solution.
-
-To check whether the current processor is using `Full MSR Support <perf_cap_>`_
-or `Shared Memory Support <perf_cap_>`_ : ::
-
-  ray@hr-test1:~$ lscpu | grep cppc
-  Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
-
-If the CPU flags have ``cppc``, then this processor supports `Full MSR Support
-<perf_cap_>`_. Otherwise, it supports `Shared Memory Support <perf_cap_>`_.
+Passive Mode
+------------
+
+``amd_pstate=passive``
+
+It will be enabled if the ``amd_pstate=passive`` is passed to the kernel in the command line.
+In this mode, ``amd_pstate`` driver software specifies a desired QoS target in the CPPC
+performance scale as a relative number. This can be expressed as percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance level,
+desired performance expresses the average performance level of the processor subject
+to the Performance Reduction Tolerance register. Above the nominal performance level,
+processor must provide at least nominal performance requested and go higher if current
+operating conditions allow.
 
 
 ``cpupower`` tool support for ``amd-pstate``

drivers/cpufreq/Kconfig.x86

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ config X86_PCC_CPUFREQ
 	  If in doubt, say N.
 
 config X86_AMD_PSTATE
-	tristate "AMD Processor P-State driver"
+	bool "AMD Processor P-State driver"
 	depends on X86 && ACPI
 	select ACPI_PROCESSOR
 	select ACPI_CPPC_LIB if X86_64

drivers/cpufreq/amd-pstate.c

Lines changed: 34 additions & 15 deletions
@@ -59,12 +59,8 @@
  * we disable it by default to go acpi-cpufreq on these processors and add a
  * module parameter to be able to enable it manually for debugging.
  */
-static bool shared_mem = false;
-module_param(shared_mem, bool, 0444);
-MODULE_PARM_DESC(shared_mem,
-		 "enable amd-pstate on processors with shared memory solution (false = disabled (default), true = enabled)");
-
 static struct cpufreq_driver amd_pstate_driver;
+static int cppc_load __initdata;
 
 static inline int pstate_enable(bool enable)
 {
@@ -424,12 +420,22 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
 	amd_pstate_driver.boost_enabled = true;
 }
 
+static void amd_perf_ctl_reset(unsigned int cpu)
+{
+	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
 	struct device *dev;
 	struct amd_cpudata *cpudata;
 
+	/*
+	 * Resetting PERF_CTL_MSR will put the CPU in P0 frequency,
+	 * which is ideal for initialization process.
+	 */
+	amd_perf_ctl_reset(policy->cpu);
 	dev = get_cpu_device(policy->cpu);
 	if (!dev)
 		return -ENODEV;
@@ -616,6 +622,15 @@ static int __init amd_pstate_init(void)
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
 		return -ENODEV;
+	/*
+	 * by default the pstate driver is disabled to load
+	 * enable the amd_pstate passive mode driver explicitly
+	 * with amd_pstate=passive in kernel command line
+	 */
+	if (!cppc_load) {
+		pr_debug("driver load is disabled, boot with amd_pstate=passive to enable this\n");
+		return -ENODEV;
+	}
 
 	if (!acpi_cpc_valid()) {
 		pr_warn_once("the _CPC object is not present in SBIOS or ACPI disabled\n");
@@ -630,13 +645,11 @@ static int __init amd_pstate_init(void)
 	if (boot_cpu_has(X86_FEATURE_CPPC)) {
 		pr_debug("AMD CPPC MSR based functionality is supported\n");
 		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
-	} else if (shared_mem) {
+	} else {
+		pr_debug("AMD CPPC shared memory based functionality is supported\n");
 		static_call_update(amd_pstate_enable, cppc_enable);
 		static_call_update(amd_pstate_init_perf, cppc_init_perf);
 		static_call_update(amd_pstate_update_perf, cppc_update_perf);
-	} else {
-		pr_info("This processor supports shared memory solution, you can enable it with amd_pstate.shared_mem=1\n");
-		return -ENODEV;
 	}
 
 	/* enable amd pstate feature */
@@ -653,16 +666,22 @@ static int __init amd_pstate_init(void)
 
 	return ret;
 }
+device_initcall(amd_pstate_init);
 
-static void __exit amd_pstate_exit(void)
+static int __init amd_pstate_param(char *str)
 {
-	cpufreq_unregister_driver(&amd_pstate_driver);
+	if (!str)
+		return -EINVAL;
 
-	amd_pstate_enable(false);
-}
+	if (!strcmp(str, "disable")) {
+		cppc_load = 0;
+		pr_info("driver is explicitly disabled\n");
+	} else if (!strcmp(str, "passive"))
+		cppc_load = 1;
 
-module_init(amd_pstate_init);
-module_exit(amd_pstate_exit);
+	return 0;
+}
+early_param("amd_pstate", amd_pstate_param);
 
 MODULE_AUTHOR("Huang Rui <[email protected]>");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");

kernel/sched/cpufreq_schedutil.c

Lines changed: 15 additions & 15 deletions
@@ -25,9 +25,6 @@ struct sugov_policy {
 	unsigned int		next_freq;
 	unsigned int		cached_raw_freq;
 
-	/* max CPU capacity, which is equal for all CPUs in freq. domain */
-	unsigned long		max;
-
 	/* The next fields are only needed if fast switch cannot be used: */
 	struct			irq_work irq_work;
 	struct			kthread_work work;
@@ -51,6 +48,7 @@ struct sugov_cpu {
 
 	unsigned long		util;
 	unsigned long		bw_dl;
+	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
@@ -160,6 +158,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
 
+	sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
 	sg_cpu->bw_dl = cpu_bw_dl(rq);
 	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu),
 					  FREQUENCY_UTIL, NULL);
@@ -254,7 +253,6 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
  */
 static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 {
-	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long boost;
 
 	/* No boost currently required */
@@ -282,8 +280,7 @@ static void sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time)
 	 * sg_cpu->util is already in capacity scale; convert iowait_boost
 	 * into the same scale so we can compare.
 	 */
-	boost = sg_cpu->iowait_boost * sg_policy->max;
-	boost >>= SCHED_CAPACITY_SHIFT;
+	boost = (sg_cpu->iowait_boost * sg_cpu->max) >> SCHED_CAPACITY_SHIFT;
 	boost = uclamp_rq_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
 	if (sg_cpu->util < boost)
 		sg_cpu->util = boost;
@@ -340,7 +337,7 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
 	if (!sugov_update_single_common(sg_cpu, time, flags))
 		return;
 
-	next_f = get_next_freq(sg_policy, sg_cpu->util, sg_policy->max);
+	next_f = get_next_freq(sg_policy, sg_cpu->util, sg_cpu->max);
 	/*
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
@@ -376,7 +373,6 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 				     unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
-	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long prev_util = sg_cpu->util;
 
 	/*
@@ -403,8 +399,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 		sg_cpu->util = prev_util;
 
 	cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_dl),
-				   map_util_perf(sg_cpu->util),
-				   sg_policy->max);
+				   map_util_perf(sg_cpu->util), sg_cpu->max);
 
 	sg_cpu->sg_policy->last_freq_update_time = time;
 }
@@ -413,19 +408,25 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
-	unsigned long util = 0;
+	unsigned long util = 0, max = 1;
 	unsigned int j;
 
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
+		unsigned long j_util, j_max;
 
 		sugov_get_util(j_sg_cpu);
 		sugov_iowait_apply(j_sg_cpu, time);
+		j_util = j_sg_cpu->util;
+		j_max = j_sg_cpu->max;
 
-		util = max(j_sg_cpu->util, util);
+		if (j_util * max > j_max * util) {
+			util = j_util;
+			max = j_max;
+		}
 	}
 
-	return get_next_freq(sg_policy, util, sg_policy->max);
+	return get_next_freq(sg_policy, util, max);
 }
 
 static void
@@ -751,15 +752,14 @@ static int sugov_start(struct cpufreq_policy *policy)
 {
 	struct sugov_policy *sg_policy = policy->governor_data;
 	void (*uu)(struct update_util_data *data, u64 time, unsigned int flags);
-	unsigned int cpu = cpumask_first(policy->cpus);
+	unsigned int cpu;
 
 	sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
 	sg_policy->last_freq_update_time	= 0;
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;
 	sg_policy->limits_changed		= false;
 	sg_policy->cached_raw_freq		= 0;
-	sg_policy->max				= arch_scale_cpu_capacity(cpu);
 
 	sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
 
