Commit b4ec805

Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "These update cpufreq (core and drivers), cpuidle (polling state
  implementation and the PSCI driver), the OPP (operating performance
  points) framework, devfreq (core and drivers), the power capping RAPL
  (Running Average Power Limit) driver, the Energy Model support, the
  generic power domains (genpd) framework, the ACPI device power
  management, the core system-wide suspend code and power management
  utilities.

  Specifics:

   - Use local_clock() instead of jiffies in the cpufreq statistics to
     improve accuracy (Viresh Kumar).

   - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
     drivers (Viresh Kumar).

   - Clean up the cpufreq core, the intel_pstate driver and the
     schedutil cpufreq governor (Rafael Wysocki).

   - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
     drivers (Yangtao Li, Qinglang Miao).

   - Fix cpufreq_online() to return error codes instead of success (0)
     in all cases when it fails (Wang ShaoBo).

   - Add mt8167 support to the mediatek cpufreq driver and blacklist
     mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).

   - Modify the tegra194 cpufreq driver to always return values from
     the frequency table as the current frequency and clean up that
     driver (Sumit Gupta, Jon Hunter).

   - Modify the arm_scmi cpufreq driver to allow it to discover the
     power scale present in the performance protocol and provide this
     information to the Energy Model (Lukasz Luba).

   - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
     Rohár).

   - Clean up the CPPC cpufreq driver (Ionela Voinescu).

   - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
     Bergmann).

   - Rework the polling interval selection for the polling state in
     cpuidle (Mel Gorman).

   - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle
     driver (Ulf Hansson).

   - Modify the OPP framework to support empty (node-less) OPP tables
     in DT for passing dependency information (Nicola Mazzucato).

   - Fix potential lockdep issue in the OPP core and clean up the OPP
     core (Viresh Kumar).

   - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
     update its users accordingly (Viresh Kumar).

   - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).

   - Add support for governor feature flags to devfreq, make devfreq
     sysfs file permissions depend on the governor and clean up the
     devfreq core (Chanwoo Choi).

   - Clean up the tegra20 devfreq driver and deprecate it to allow
     another driver based on EMC_STAT to be used instead of it (Dmitry
     Osipenko).

   - Add interconnect support to the tegra30 devfreq driver, allow it
     to take the interconnect and OPP information from DT and clean it
     up (Dmitry Osipenko).

   - Add interconnect support to the exynos-bus devfreq driver along
     with interconnect properties documentation (Sylwester Nawrocki).

   - Add support for AMD Fam17h and Fam19h processors to the RAPL power
     capping driver (Victor Ding, Kim Phillips).

   - Fix handling of overly long constraint names in the powercap
     framework (Lukasz Luba).

   - Fix the wakeup configuration handling for bridges in the ACPI
     device power management core (Rafael Wysocki).

   - Add support for using an abstract scale for power units in the
     Energy Model (EM) and document it (Lukasz Luba).

   - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
     Kondeti).

   - Modify the generic power domains (genpd) framework to support
     suspend-to-idle (Ulf Hansson).

   - Fix creation of debugfs nodes in genpd (Thierry Strudel).

   - Clean up genpd (Lina Iyer).

   - Clean up the core system-wide suspend code and make it print
     driver flags for devices with debug enabled (Alex Shi, Patrice
     Chotard, Chen Yu).

   - Modify the ACPI system reboot code to make it prepare for system
     power off to avoid confusing the platform firmware (Kai-Heng
     Feng).

   - Update the pm-graph (multiple changes, mostly usability-related)
     and cpupower (online and offline CPU information support) PM
     utilities (Todd Brandt, Brahadambal Srinivasan)"

* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
  cpufreq: Fix cpufreq_online() return value on errors
  cpufreq: Fix up several kerneldoc comments
  cpufreq: stats: Use local_clock() instead of jiffies
  cpufreq: schedutil: Simplify sugov_update_next_freq()
  cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
  PM: domains: create debugfs nodes when adding power domains
  opp: of: Allow empty opp-table with opp-shared
  dt-bindings: opp: Allow empty OPP tables
  media: venus: dev_pm_opp_put_*() accepts NULL argument
  drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
  drm/lima: dev_pm_opp_put_*() accepts NULL argument
  PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
  opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
  opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
  cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
  opp: Reduce the size of critical section in _opp_kref_release()
  PM / EM: Micro optimization in em_cpu_energy
  cpufreq: arm_scmi: Discover the power scale in performance protocol
  ...
2 parents b109bc7 + b3fac81 commit b4ec805

80 files changed, +1692 -1210 lines changed

Documentation/ABI/testing/sysfs-class-devfreq

Lines changed: 32 additions & 22 deletions
@@ -37,20 +37,6 @@ Description:
         The /sys/class/devfreq/.../target_freq shows the next governor
         predicted target frequency of the corresponding devfreq object.

-What: /sys/class/devfreq/.../polling_interval
-Date: September 2011
-Contact: MyungJoo Ham <[email protected]>
-Description:
-        The /sys/class/devfreq/.../polling_interval shows and sets
-        the requested polling interval of the corresponding devfreq
-        object. The values are represented in ms. If the value is
-        less than 1 jiffy, it is considered to be 0, which means
-        no polling. This value is meaningless if the governor is
-        not polling; thus. If the governor is not using
-        devfreq-provided central polling
-        (/sys/class/devfreq/.../central_polling is 0), this value
-        may be useless.
-
 What: /sys/class/devfreq/.../trans_stat
 Date: October 2012
 Contact: MyungJoo Ham <[email protected]>
@@ -66,14 +52,6 @@ Description:

         echo 0 > /sys/class/devfreq/.../trans_stat

-What: /sys/class/devfreq/.../userspace/set_freq
-Date: September 2011
-Contact: MyungJoo Ham <[email protected]>
-Description:
-        The /sys/class/devfreq/.../userspace/set_freq shows and
-        sets the requested frequency for the devfreq object if
-        userspace governor is in effect.
-
 What: /sys/class/devfreq/.../available_frequencies
 Date: October 2012
 Contact: Nishanth Menon <[email protected]>
@@ -110,6 +88,35 @@ Description:
         The max_freq overrides min_freq because max_freq may be
         used to throttle devices to avoid overheating.

+What: /sys/class/devfreq/.../polling_interval
+Date: September 2011
+Contact: MyungJoo Ham <[email protected]>
+Description:
+        The /sys/class/devfreq/.../polling_interval shows and sets
+        the requested polling interval of the corresponding devfreq
+        object. The values are represented in ms. If the value is
+        less than 1 jiffy, it is considered to be 0, which means
+        no polling. This value is meaningless if the governor is
+        not polling; thus. If the governor is not using
+        devfreq-provided central polling
+        (/sys/class/devfreq/.../central_polling is 0), this value
+        may be useless.
+
+        A list of governors that support the node:
+        - simple_ondemand
+        - tegra_actmon
+
+What: /sys/class/devfreq/.../userspace/set_freq
+Date: September 2011
+Contact: MyungJoo Ham <[email protected]>
+Description:
+        The /sys/class/devfreq/.../userspace/set_freq shows and
+        sets the requested frequency for the devfreq object if
+        userspace governor is in effect.
+
+        A list of governors that support the node:
+        - userspace
+
 What: /sys/class/devfreq/.../timer
 Date: July 2020
 Contact: Chanwoo Choi <[email protected]>
@@ -122,3 +129,6 @@ Description:

         echo deferrable > /sys/class/devfreq/.../timer
         echo delayed > /sys/class/devfreq/.../timer
+
+        A list of governors that support the node:
+        - simple_ondemand
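
The polling_interval entry above states that values are given in ms and that anything shorter than one jiffy counts as 0 (no polling). A minimal userspace sketch of that rule follows. It is not the devfreq implementation: the helper name `effective_polling_ms` and the caller-supplied tick rate `hz` are assumptions made purely for illustration.

```c
#include <assert.h>

/*
 * Illustrative model only, not kernel code.
 *
 * Returns the polling interval that would take effect for a requested
 * interval in milliseconds, given a tick rate of hz ticks per second.
 * An interval shorter than one tick (one jiffy) is treated as 0, which
 * means "no polling".
 */
unsigned int effective_polling_ms(unsigned int requested_ms, unsigned int hz)
{
    assert(hz > 0); /* a tick rate of zero is meaningless */

    /*
     * requested_ms / 1000 seconds is shorter than 1 / hz seconds
     * exactly when requested_ms * hz < 1000.
     */
    if ((unsigned long long)requested_ms * hz < 1000ULL)
        return 0;

    return requested_ms;
}
```

With a 250 Hz tick, one jiffy lasts 4 ms, so under this model a request of 3 ms disables polling while a request of 4 ms is kept.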

Documentation/devicetree/bindings/devfreq/exynos-bus.txt

Lines changed: 69 additions & 2 deletions
@@ -51,6 +51,19 @@ Optional properties only for parent bus device:
 - exynos,saturation-ratio: the percentage value which is used to calibrate
   the performance count against total cycle count.

+Optional properties for the interconnect functionality (QoS frequency
+constraints):
+- #interconnect-cells: should be 0.
+- interconnects: as documented in ../interconnect.txt, describes a path at the
+  higher level interconnects used by this interconnect provider.
+  If this interconnect provider is directly linked to a top level interconnect
+  provider the property contains only one phandle. The provider extends
+  the interconnect graph by linking its node to a node registered by provider
+  pointed to by first phandle in the 'interconnects' property.
+
+- samsung,data-clock-ratio: ratio of the data throughput in B/s to minimum data
+  clock frequency in Hz, default value is 8 when this property is missing.
+
 Detailed correlation between sub-blocks and power line according to Exynos SoC:
 - In case of Exynos3250, there are two power line as following:
     VDD_MIF |--- DMC
@@ -135,7 +148,7 @@ Detailed correlation between sub-blocks and power line according to Exynos SoC:
             |--- PERIC (Fixed clock rate)
             |--- FSYS (Fixed clock rate)

-Example1:
+Example 1:
     Show the AXI buses of Exynos3250 SoC. Exynos3250 divides the buses to
     power line (regulator). The MIF (Memory Interface) AXI bus is used to
     transfer data between DRAM and CPU and uses the VDD_MIF regulator.
@@ -184,7 +197,7 @@ Example1:
     |L5 |200000 |200000 |400000 |300000 | ||1000000 |
     ----------------------------------------------------------

-Example2 :
+Example 2:
     The bus of DMC (Dynamic Memory Controller) block in exynos3250.dtsi
     is listed below:

@@ -419,3 +432,57 @@ Example2 :
         devfreq = <&bus_leftbus>;
         status = "okay";
     };
+
+Example 3:
+    An interconnect path "bus_display -- bus_leftbus -- bus_dmc" on
+    Exynos4412 SoC with video mixer as an interconnect consumer device.
+
+    soc {
+        bus_dmc: bus_dmc {
+            compatible = "samsung,exynos-bus";
+            clocks = <&clock CLK_DIV_DMC>;
+            clock-names = "bus";
+            operating-points-v2 = <&bus_dmc_opp_table>;
+            samsung,data-clock-ratio = <4>;
+            #interconnect-cells = <0>;
+        };
+
+        bus_leftbus: bus_leftbus {
+            compatible = "samsung,exynos-bus";
+            clocks = <&clock CLK_DIV_GDL>;
+            clock-names = "bus";
+            operating-points-v2 = <&bus_leftbus_opp_table>;
+            #interconnect-cells = <0>;
+            interconnects = <&bus_dmc>;
+        };
+
+        bus_display: bus_display {
+            compatible = "samsung,exynos-bus";
+            clocks = <&clock CLK_ACLK160>;
+            clock-names = "bus";
+            operating-points-v2 = <&bus_display_opp_table>;
+            #interconnect-cells = <0>;
+            interconnects = <&bus_leftbus &bus_dmc>;
+        };
+
+        bus_dmc_opp_table: opp_table1 {
+            compatible = "operating-points-v2";
+            /* ... */
+        };
+
+        bus_leftbus_opp_table: opp_table3 {
+            compatible = "operating-points-v2";
+            /* ... */
+        };
+
+        bus_display_opp_table: opp_table4 {
+            compatible = "operating-points-v2";
+            /* ... */
+        };
+
+        &mixer {
+            compatible = "samsung,exynos4212-mixer";
+            interconnects = <&bus_display &bus_dmc>;
+            /* ... */
+        };
+    };
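
The samsung,data-clock-ratio text above defines the ratio as data throughput in B/s divided by the minimum data clock frequency in Hz, with 8 as the default when the property is missing. Read that way, the minimum clock is the throughput divided by the ratio. The following is a rough sketch of that arithmetic only, not the exynos-bus driver code: the helper `min_bus_clock_hz` is hypothetical, it takes a ratio of 0 to mean "property missing", and rounding up is a choice made here rather than something the binding specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Default stated in the binding text when the property is missing. */
#define DEFAULT_DATA_CLOCK_RATIO 8u

/*
 * Hypothetical helper, for illustration only.
 *
 * Computes the minimum bus clock in Hz needed to carry the requested
 * throughput in bytes per second. A ratio of 0 stands for "property not
 * present" and selects the default. The division rounds up so that the
 * resulting clock is never too slow for the request (an assumption).
 */
uint64_t min_bus_clock_hz(uint64_t throughput_bytes_per_sec, uint32_t ratio)
{
    if (ratio == 0)
        ratio = DEFAULT_DATA_CLOCK_RATIO;

    /* Guard the round-up addition against 64-bit overflow. */
    assert(throughput_bytes_per_sec <= UINT64_MAX - (ratio - 1));

    return (throughput_bytes_per_sec + ratio - 1) / ratio;
}
```

For the bus_dmc node in Example 3, which sets the ratio to 4, a request of 800 MB/s would map to a 200 MHz minimum clock under this model; with the default ratio of 8 it would map to 100 MHz.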

Documentation/devicetree/bindings/opp/opp.txt

Lines changed: 53 additions & 1 deletion
@@ -65,7 +65,9 @@ Required properties:

 - OPP nodes: One or more OPP nodes describing voltage-current-frequency
   combinations. Their name isn't significant but their phandle can be used to
-  reference an OPP.
+  reference an OPP. These are mandatory except for the case where the OPP table
+  is present only to indicate dependency between devices using the opp-shared
+  property.

 Optional properties:
 - opp-shared: Indicates that device nodes using this OPP Table Node's phandle
@@ -568,3 +570,53 @@ Example 6: opp-microvolt-<name>, opp-microamp-<name>:
         };
     };
 };
+
+Example 7: Single cluster Quad-core ARM cortex A53, OPP points from firmware,
+distinct clock controls but two sets of clock/voltage/current lines.
+
+/ {
+    cpus {
+        #address-cells = <2>;
+        #size-cells = <0>;
+
+        cpu@0 {
+            compatible = "arm,cortex-a53";
+            reg = <0x0 0x100>;
+            next-level-cache = <&A53_L2>;
+            clocks = <&dvfs_controller 0>;
+            operating-points-v2 = <&cpu_opp0_table>;
+        };
+        cpu@1 {
+            compatible = "arm,cortex-a53";
+            reg = <0x0 0x101>;
+            next-level-cache = <&A53_L2>;
+            clocks = <&dvfs_controller 1>;
+            operating-points-v2 = <&cpu_opp0_table>;
+        };
+        cpu@2 {
+            compatible = "arm,cortex-a53";
+            reg = <0x0 0x102>;
+            next-level-cache = <&A53_L2>;
+            clocks = <&dvfs_controller 2>;
+            operating-points-v2 = <&cpu_opp1_table>;
+        };
+        cpu@3 {
+            compatible = "arm,cortex-a53";
+            reg = <0x0 0x103>;
+            next-level-cache = <&A53_L2>;
+            clocks = <&dvfs_controller 3>;
+            operating-points-v2 = <&cpu_opp1_table>;
+        };
+
+    };
+
+    cpu_opp0_table: opp0_table {
+        compatible = "operating-points-v2";
+        opp-shared;
+    };
+
+    cpu_opp1_table: opp1_table {
+        compatible = "operating-points-v2";
+        opp-shared;
+    };
+};
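
Example 7 above uses empty OPP tables with opp-shared only to express which CPUs depend on each other: cpu@0 and cpu@1 reference one table, cpu@2 and cpu@3 the other. A small sketch of how a consumer might turn that information into a sharing mask is shown below. It is plain C and not the OPP core: the table ids, the `cpu_opp_table` array and the helper `sharing_cpus_mask` are all hypothetical stand-ins for the phandle lookups a real parser would do.

```c
#include <assert.h>
#include <stdint.h>

#define NR_EXAMPLE_CPUS 4

/*
 * Hypothetical flattening of Example 7: index is the CPU number, value is
 * an id for the OPP table that CPU points to (0 for cpu_opp0_table,
 * 1 for cpu_opp1_table).
 */
static const int cpu_opp_table[NR_EXAMPLE_CPUS] = { 0, 0, 1, 1 };

/*
 * Returns a bitmask with one bit set for every CPU that references the
 * same OPP table as the given CPU (the given CPU included). Both tables
 * in the example are marked opp-shared, so no per-table flag is modelled.
 */
uint32_t sharing_cpus_mask(int cpu)
{
    uint32_t mask = 0;

    assert(cpu >= 0 && cpu < NR_EXAMPLE_CPUS);

    for (int i = 0; i < NR_EXAMPLE_CPUS; i++) {
        if (cpu_opp_table[i] == cpu_opp_table[cpu])
            mask |= UINT32_C(1) << i;
    }

    return mask;
}
```

Under these assumptions CPUs 0 and 1 yield the mask 0x3 and CPUs 2 and 3 yield 0xc, even though neither table lists any OPP node.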

Documentation/driver-api/thermal/power_allocator.rst

Lines changed: 11 additions & 1 deletion
@@ -71,7 +71,9 @@ to the speed-grade of the silicon. `sustainable_power` is therefore
 simply an estimate, and may be tuned to affect the aggressiveness of
 the thermal ramp. For reference, the sustainable power of a 4" phone
 is typically 2000mW, while on a 10" tablet is around 4500mW (may vary
-depending on screen size).
+depending on screen size). It is possible to have the power value
+expressed in an abstract scale. The sustained power should be aligned
+to the scale used by the related cooling devices.

 If you are using device tree, do add it as a property of the
 thermal-zone. For example::
@@ -269,3 +271,11 @@ won't be very good. Note that this is not particular to this
 governor, step-wise will also misbehave if you call its throttle()
 faster than the normal thermal framework tick (due to interrupts for
 example) as it will overreact.
+
+Energy Model requirements
+=========================
+
+Another important thing is the consistent scale of the power values
+provided by the cooling devices. All of the cooling devices in a single
+thermal zone should have power values reported either in milli-Watts
+or scaled to the same 'abstract scale'.

Documentation/power/energy-model.rst

Lines changed: 25 additions & 5 deletions
@@ -20,6 +20,21 @@ possible source of information on its own, the EM framework intervenes as an
 abstraction layer which standardizes the format of power cost tables in the
 kernel, hence enabling to avoid redundant work.

+The power values might be expressed in milli-Watts or in an 'abstract scale'.
+Multiple subsystems might use the EM and it is up to the system integrator to
+check that the requirements for the power value scale types are met. An example
+can be found in the Energy-Aware Scheduler documentation
+Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
+powercap power values expressed in an 'abstract scale' might cause issues.
+These subsystems are more interested in estimation of power used in the past,
+thus the real milli-Watts might be needed. An example of these requirements can
+be found in the Intelligent Power Allocation in
+Documentation/driver-api/thermal/power_allocator.rst.
+Kernel subsystems might implement automatic detection to check whether EM
+registered devices have inconsistent scale (based on EM internal flag).
+Important thing to keep in mind is that when the power values are expressed in
+an 'abstract scale' deriving real energy in milli-Joules would not be possible.
+
 The figure below depicts an example of drivers (Arm-specific here, but the
 approach is applicable to any architecture) providing power costs to the EM
 framework, and interested clients reading the data from it::
@@ -73,14 +88,18 @@ Drivers are expected to register performance domains into the EM framework by
 calling the following API::

   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-                struct em_data_callback *cb, cpumask_t *cpus);
+                struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);

 Drivers must provide a callback function returning <frequency, power> tuples
 for each performance state. The callback function provided by the driver is free
 to fetch data from any relevant location (DT, firmware, ...), and by any mean
 deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
 performance domains using cpumask. For other devices than CPUs the last
 argument must be set to NULL.
+The last argument 'milliwatts' is important to set with correct value. Kernel
+subsystems which use EM might rely on this flag to check if all EM devices use
+the same scale. If there are different scales, these subsystems might decide
+to: return warning/error, stop working or panic.
 See Section 3. for an example of driver implementing this
 callback, and kernel/power/energy_model.c for further documentation on this
 API.
@@ -156,7 +175,8 @@ EM framework::
   37     nr_opp = foo_get_nr_opp(policy);
   38
   39     /* And register the new performance domain */
-  40     em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
-  41
-  42     return 0;
-  43 }
+  40     em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+  41                                 true);
+  42
+  43     return 0;
+  44 }
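
The documentation added above says that subsystems using the EM might rely on the 'milliwatts' flag to check that all EM devices use the same scale. A minimal sketch of such a consistency check is given below in plain userspace C. It is not an Energy Model API: `struct power_dev` and `power_scales_consistent` are hypothetical names, and the struct only mirrors the one piece of information the text mentions, namely whether a device reports real milli-Watts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one registered device and its power scale. */
struct power_dev {
    const char *name;
    bool milliwatts; /* true: real milli-Watts, false: abstract scale */
};

/*
 * Returns true when every device in the array uses the same scale as the
 * first one. An empty array is trivially consistent.
 */
bool power_scales_consistent(const struct power_dev *devs, size_t n)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (devs[i].milliwatts != devs[0].milliwatts)
            return false;
    }

    return true;
}
```

What a caller does on a mismatch is left open here, in line with the text, which only lists possible reactions (warning or error, stop working, or panic).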

Documentation/scheduler/sched-energy.rst

Lines changed: 5 additions & 0 deletions
@@ -350,6 +350,11 @@ independent EM framework in Documentation/power/energy-model.rst.
 Please also note that the scheduling domains need to be re-built after the
 EM has been registered in order to start EAS.

+EAS uses the EM to make a forecasting decision on energy usage and thus it is
+more focused on the difference when checking possible options for task
+placement. For EAS it doesn't matter whether the EM power values are expressed
+in milli-Watts or in an 'abstract scale'.
+

 6.3 - Energy Model complexity
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
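
The paragraph added above argues that EAS only compares placement options against each other, so the unit of the power values does not matter. The sketch below illustrates that point with a simple "pick the cheapest option" helper: multiplying every estimate by the same positive factor leaves the chosen index unchanged. The helper `lowest_energy_option` and the sample numbers are invented for illustration and are not taken from the scheduler.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the index of the option with the lowest estimated energy.
 * Ties resolve to the earliest index. Only the relative order of the
 * values is used, never their absolute magnitude.
 */
size_t lowest_energy_option(const unsigned long *energy, size_t n)
{
    size_t best = 0;

    assert(energy != NULL && n > 0);

    for (size_t i = 1; i < n; i++) {
        if (energy[i] < energy[best])
            best = i;
    }

    return best;
}
```

For example, estimates of {900, 650, 720} and the same estimates on a scale ten times smaller, {90, 65, 72}, both select option 1.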

MAINTAINERS

Lines changed: 0 additions & 1 deletion
@@ -11438,7 +11438,6 @@ L: [email protected]

 T: git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git
 S: Maintained
-F: drivers/devfreq/tegra20-devfreq.c
 F: drivers/devfreq/tegra30-devfreq.c

 MEMORY MANAGEMENT

arch/x86/include/asm/msr-index.h

Lines changed: 2 additions & 1 deletion
@@ -327,8 +327,9 @@
 #define MSR_PP1_ENERGY_STATUS 0x00000641
 #define MSR_PP1_POLICY 0x00000642

-#define MSR_AMD_PKG_ENERGY_STATUS 0xc001029b
 #define MSR_AMD_RAPL_POWER_UNIT 0xc0010299
+#define MSR_AMD_CORE_ENERGY_STATUS 0xc001029a
+#define MSR_AMD_PKG_ENERGY_STATUS 0xc001029b

 /* Config TDP MSRs */
 #define MSR_CONFIG_TDP_NOMINAL 0x00000648
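
The hunk above only adds the MSR addresses. As a hedged illustration of how the power unit and energy status registers relate, the sketch below converts a raw energy counter into microjoules. It assumes the field layout commonly documented for RAPL power unit registers, with the energy status unit (ESU) in bits 12:8 and one count equal to 1/2^ESU joules; that layout is not stated anywhere in this diff. The helper names and the sample register values are made up, and no MSR is actually read.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the energy status unit field (bits 12:8). */
#define ENERGY_UNIT_SHIFT 8
#define ENERGY_UNIT_MASK  0x1fu

/*
 * Converts a raw 32-bit energy status count to microjoules, given the
 * value of the power unit register. One count is taken to be 1/2^ESU J,
 * so the result is raw * 1000000 / 2^ESU, rounded down.
 */
uint64_t rapl_energy_uj(uint64_t power_unit_msr, uint32_t energy_status_raw)
{
    unsigned int esu =
        (unsigned int)(power_unit_msr >> ENERGY_UNIT_SHIFT) & ENERGY_UNIT_MASK;

    /* 32-bit count times 10^6 fits comfortably in 64 bits. */
    return ((uint64_t)energy_status_raw * 1000000u) >> esu;
}

/* Usage example with a made-up power unit value whose ESU field is 16. */
void rapl_usage_example(void)
{
    /* One count is 1/65536 J, so 65536 counts correspond to 1 J. */
    assert(rapl_energy_uj(0x1000, 65536) == 1000000);
}
```

A real consumer would also have to handle counter wraparound between two reads, which this sketch leaves out.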
