Commit 4b029a8

spandruvada authored and rafaeljw committed
thermal: int340x: processor_thermal: Add workload type hint interface
Prior to Meteor Lake processor generation, user space can pass workload
type request to the firmware. Then firmware can optimize power based on
the indicated workload type. User space also uses workload type requests
to implement its own heuristics.

The firmware in Meteor Lake processor generation is capable of predicting
workload type without software help.

To avoid duplicate processing, add a sysfs interface allowing user space
to obtain the workload hint from the firmware instead of trying to predict
the workload type by itself.

This workload hint is passed from the firmware via MMIO offset 0x5B18 of
the processor thermal PCI device. Before workload hints can be produced by
the firmware, it needs to be configured via a mailbox command. This
mailbox command turns ON the workload hint and allows a notification delay
to be programmed to control the rate of notifications. The notification
delay can be changed from user space via sysfs.

Attribute group 'workload_hint' in sysfs is used for implementing the
workload hints interface between user space and the kernel. It contains
the following attributes:

workload_hint_enable:
	Enables/disables workload type hints from the firmware.

notification_delay_ms:
	Notification delay in milliseconds.

workload_type_index:
	The current workload type index predicted by the firmware (see the
	documentation changes below for supported index values and their
	meaning).

Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw: Changelog edits, documentation edits, whitespace adjustments ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
1 parent 2f0b31c commit 4b029a8

6 files changed: +331 -1 lines changed

Documentation/driver-api/thermal/intel_dptf.rst

Lines changed: 54 additions & 0 deletions
@@ -315,3 +315,57 @@ DPTF Fan Control
 ----------------------------------------

 Refer to Documentation/admin-guide/acpi/fan_performance_states.rst
+
+Workload Type Hints
+----------------------------------------
+
+The firmware in Meteor Lake processor generation is capable of identifying
+workload type and passing hints regarding it to the OS. A special sysfs
+interface is provided to allow user space to obtain workload type hints from
+the firmware and control the rate at which they are provided.
+
+User space can poll attribute "workload_type_index" for the current hint or
+can receive a notification whenever the value of this attribute is updated.
+
+file:`/sys/bus/pci/devices/0000:00:04.0/workload_hint/`
+Segment 0, bus 0, device 4, function 0 is reserved for the processor thermal
+device on all Intel client processors. So, the above path doesn't change
+based on the processor generation.
+
+``workload_hint_enable`` (RW)
+	Enable firmware to send workload type hints to user space.
+
+``notification_delay_ms`` (RW)
+	Minimum delay in milliseconds before firmware will notify OS. This is
+	for the rate control of notifications. This delay is between changing
+	the workload type prediction in the firmware and notifying the OS about
+	the change. The default delay is 1024 ms. The delay of 0 is invalid.
+	The delay is rounded up to the nearest power of 2 to simplify firmware
+	programming of the delay value. The read of notification_delay_ms
+	attribute shows the effective value used.
+
+``workload_type_index`` (RO)
+	Predicted workload type index. User space can get notification of
+	change via existing sysfs attribute change notification mechanism.
+
+	The supported index values and their meaning for the Meteor Lake
+	processor generation are as follows:
+
+	0 - Idle: System performs no tasks, power and idle residency are
+		consistently low for long periods of time.
+
+	1 – Battery Life: Power is relatively low, but the processor may
+		still be actively performing a task, such as video playback for
+		a long period of time.
+
+	2 – Sustained: Power level that is relatively high for a long period
+		of time, with very few to no periods of idleness, which will
+		eventually exhaust RAPL Power Limit 1 and 2.
+
+	3 – Bursty: Consumes a relatively constant average amount of power, but
+		periods of relative idleness are interrupted by bursts of
+		activity. The bursts are relatively short and the periods of
+		relative idleness between them typically prevent RAPL Power
+		Limit 1 from being exhausted.
+
+	4 – Unknown: Can't classify.
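
Reviewer note: the documentation above says user space can either poll "workload_type_index" or wait for a change notification. A minimal user-space sketch of the notification path follows. It is not part of this commit. The helper names (workload_type_name, read_workload_index, wait_for_workload_change) are hypothetical, and the code assumes the usual sysfs convention that an attribute must be read once before poll() reports POLLPRI for a later sysfs_notify().

```c
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Index-to-name table copied from the documentation text in this commit. */
static const char *workload_type_name(int index)
{
	static const char *const names[] = {
		"Idle", "Battery Life", "Sustained", "Bursty", "Unknown",
	};

	if (index < 0 || index >= (int)(sizeof(names) / sizeof(names[0])))
		return "Unrecognized";
	return names[index];
}

/* Rewind and re-read the attribute; returns the index or -1 on error. */
static int read_workload_index(int fd)
{
	char buf[16];
	ssize_t n;

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;	/* for example, hints are not enabled */
	buf[n] = '\0';
	return atoi(buf);
}

/*
 * Hypothetical helper: block until the attribute changes or timeout_ms
 * elapses, then return the fresh index (-1 on error or timeout).
 * The initial read arms the sysfs change notification.
 */
static int wait_for_workload_change(const char *path, int timeout_ms)
{
	struct pollfd pfd = { .events = POLLPRI | POLLERR };
	int index = -1;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	if (read_workload_index(pfd.fd) >= 0 && poll(&pfd, 1, timeout_ms) > 0)
		index = read_workload_index(pfd.fd);

	close(pfd.fd);
	return index;
}
```

A caller would first write 1 to workload_hint_enable, then call wait_for_workload_change() on /sys/bus/pci/devices/0000:00:04.0/workload_hint/workload_type_index with a timeout of -1 to block indefinitely, and pass the result to workload_type_name() for display.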

drivers/thermal/intel/int340x_thermal/Makefile

Lines changed: 1 addition & 0 deletions
@@ -11,5 +11,6 @@ obj-$(CONFIG_PROC_THERMAL_MMIO_RAPL) += processor_thermal_rapl.o
 obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_rfim.o
 obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_mbox.o
 obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_wt_req.o
+obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_wt_hint.o
 obj-$(CONFIG_INT3406_THERMAL) += int3406_thermal.o
 obj-$(CONFIG_ACPI_THERMAL_REL) += acpi_thermal_rel.o

drivers/thermal/intel/int340x_thermal/processor_thermal_device.c

Lines changed: 9 additions & 0 deletions
@@ -352,6 +352,12 @@ int proc_thermal_mmio_add(struct pci_dev *pdev,
 			dev_err(&pdev->dev, "failed to add MBOX interface\n");
 			goto err_rem_rfim;
 		}
+	} else if (feature_mask & PROC_THERMAL_FEATURE_WT_HINT) {
+		ret = proc_thermal_wt_hint_add(pdev, proc_priv);
+		if (ret) {
+			dev_err(&pdev->dev, "failed to add WT Hint\n");
+			goto err_rem_rfim;
+		}
 	}

 	return 0;
@@ -376,10 +382,13 @@ void proc_thermal_mmio_remove(struct pci_dev *pdev, struct proc_thermal_device *

 	if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_REQ)
 		proc_thermal_wt_req_remove(pdev);
+	else if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_HINT)
+		proc_thermal_wt_hint_remove(pdev);
 }
 EXPORT_SYMBOL_GPL(proc_thermal_mmio_remove);

 MODULE_IMPORT_NS(INTEL_TCC);
+MODULE_IMPORT_NS(INT340X_THERMAL);
 MODULE_AUTHOR("Srinivas Pandruvada <[email protected]>");
 MODULE_DESCRIPTION("Processor Thermal Reporting Device Driver");
 MODULE_LICENSE("GPL v2");

drivers/thermal/intel/int340x_thermal/processor_thermal_device.h

Lines changed: 7 additions & 0 deletions
@@ -61,6 +61,7 @@ struct rapl_mmio_regs {
 #define PROC_THERMAL_FEATURE_DVFS 0x04
 #define PROC_THERMAL_FEATURE_WT_REQ 0x08
 #define PROC_THERMAL_FEATURE_DLVR 0x10
+#define PROC_THERMAL_FEATURE_WT_HINT 0x20

 #if IS_ENABLED(CONFIG_PROC_THERMAL_MMIO_RAPL)
 int proc_thermal_rapl_add(struct pci_dev *pdev, struct proc_thermal_device *proc_priv);
@@ -95,6 +96,12 @@ int processor_thermal_mbox_interrupt_config(struct pci_dev *pdev, bool enable, i
 					    int time_window);
 int proc_thermal_add(struct device *dev, struct proc_thermal_device *priv);
 void proc_thermal_remove(struct proc_thermal_device *proc_priv);
+
+int proc_thermal_wt_hint_add(struct pci_dev *pdev, struct proc_thermal_device *proc_priv);
+void proc_thermal_wt_hint_remove(struct pci_dev *pdev);
+void proc_thermal_wt_intr_callback(struct pci_dev *pdev, struct proc_thermal_device *proc_priv);
+bool proc_thermal_check_wt_intr(struct proc_thermal_device *proc_priv);
+
 int proc_thermal_suspend(struct device *dev);
 int proc_thermal_resume(struct device *dev);
 int proc_thermal_mmio_add(struct pci_dev *pdev,
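
Reviewer note: the two interrupt helpers declared in this header, proc_thermal_check_wt_intr() and proc_thermal_wt_intr_callback(), both work on the 64-bit status register at MMIO offset 0x5B18. The new processor_thermal_wt_hint.c in this commit defines the workload type field as bits 47:40 and the interrupt-active flag as bit 2. The stand-alone sketch below models that decoding on a plain integer so the bit layout can be checked outside the kernel. The macro and function names are hypothetical and no MMIO access is performed.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for GENMASK_ULL(47, 40) and BIT(2) from the driver. */
#define WT_FIELD_SHIFT	40
#define WT_FIELD_MASK	(UINT64_C(0xff) << WT_FIELD_SHIFT)
#define WT_INT_ACTIVE	(UINT64_C(1) << 2)

/* Extract the predicted workload type index (what FIELD_GET does). */
static unsigned int wt_index_from_status(uint64_t status)
{
	return (unsigned int)((status & WT_FIELD_MASK) >> WT_FIELD_SHIFT);
}

/* True when the status word reports a pending workload type interrupt. */
static bool wt_intr_active(uint64_t status)
{
	return (status & WT_INT_ACTIVE) != 0;
}

/* Value written back to acknowledge: same word with the active bit cleared. */
static uint64_t wt_intr_ack_value(uint64_t status)
{
	return status & ~WT_INT_ACTIVE;
}
```

For example, a status word with 3 in bits 47:40 and bit 2 set decodes to index 3 (Bursty) with an interrupt pending. The acknowledged value keeps the index and clears only bit 2.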

drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c

Lines changed: 2 additions & 1 deletion
@@ -365,7 +365,8 @@ static const struct pci_device_id proc_thermal_pci_ids[] = {
 	{ PCI_DEVICE_DATA(INTEL, ADL_THERMAL, PROC_THERMAL_FEATURE_RAPL |
 	  PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_WT_REQ) },
 	{ PCI_DEVICE_DATA(INTEL, MTLP_THERMAL, PROC_THERMAL_FEATURE_RAPL |
-	  PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_DLVR) },
+	  PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_DLVR |
+	  PROC_THERMAL_FEATURE_WT_HINT) },
 	{ PCI_DEVICE_DATA(INTEL, RPL_THERMAL, PROC_THERMAL_FEATURE_RAPL |
 	  PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_WT_REQ) },
 	{ },
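
Reviewer note: the PCI ID table above gives Meteor Lake (MTLP) the WT_HINT feature bit, while ADL and RPL keep WT_REQ, and proc_thermal_mmio_add() selects between the two with an if / else-if chain. The sketch below models that selection in user space to make the precedence explicit. The bit values for DVFS, WT_REQ, DLVR and WT_HINT are taken from the header hunk in this commit. The values for RAPL and FIVR do not appear in the diff and are assumptions, and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define FEATURE_RAPL	0x01	/* assumed, not shown in this diff */
#define FEATURE_FIVR	0x02	/* assumed, not shown in this diff */
#define FEATURE_DVFS	0x04
#define FEATURE_WT_REQ	0x08
#define FEATURE_DLVR	0x10
#define FEATURE_WT_HINT	0x20

enum wt_interface { WT_IF_NONE, WT_IF_REQ, WT_IF_HINT };

/*
 * Mirror of the if / else-if chain: the request interface wins when both
 * bits are set, so at most one workload type interface is registered.
 */
static enum wt_interface select_wt_interface(uint32_t feature_mask)
{
	if (feature_mask & FEATURE_WT_REQ)
		return WT_IF_REQ;
	else if (feature_mask & FEATURE_WT_HINT)
		return WT_IF_HINT;
	return WT_IF_NONE;
}
```

With the masks from the table, the MTLP entry selects the hint interface and the ADL and RPL entries select the request interface. No entry in this diff sets both bits, so the precedence only matters if a future device does.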

drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * processor thermal device interface for reading workload type hints
+ * from the user space. The hints are provided by the firmware.
+ *
+ * Operation:
+ * When user space enables workload type prediction:
+ * - Use mailbox to:
+ *	Configure notification delay
+ *	Enable processor thermal device interrupt
+ *
+ * - The predicted workload type can be read from MMIO:
+ *	Offset 0x5B18 shows if there was an interrupt
+ *	active for change in workload type and also
+ *	predicted workload type.
+ *
+ * Two interface functions are provided to call when there is a
+ * thermal device interrupt:
+ * - proc_thermal_check_wt_intr():
+ *	Check if the interrupt is for change in workload type. Called from
+ *	interrupt context.
+ *
+ * - proc_thermal_wt_intr_callback():
+ *	Callback for interrupt processing in thread context. This involves
+ *	sending notification to user space that there is a change in the
+ *	workload type.
+ *
+ * Copyright (c) 2023, Intel Corporation.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/pci.h>
+#include "processor_thermal_device.h"
+
+#define SOC_WT_RES_INT_STATUS_OFFSET 0x5B18
+#define SOC_WT GENMASK_ULL(47, 40)
+
+#define SOC_WT_PREDICTION_INT_ENABLE_BIT 23
+
+#define SOC_WT_PREDICTION_INT_ACTIVE BIT(2)
+
+/*
+ * Closest possible to 1 Second is 1024 ms with programmed time delay
+ * of 0x0A.
+ */
+static u8 notify_delay = 0x0A;
+static u16 notify_delay_ms = 1024;
+
+static DEFINE_MUTEX(wt_lock);
+static u8 wt_enable;
+
+/* Show current predicted workload type index */
+static ssize_t workload_type_index_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct proc_thermal_device *proc_priv;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u64 status = 0;
+	int wt;
+
+	mutex_lock(&wt_lock);
+	if (!wt_enable) {
+		mutex_unlock(&wt_lock);
+		return -ENODATA;
+	}
+
+	proc_priv = pci_get_drvdata(pdev);
+
+	status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+
+	mutex_unlock(&wt_lock);
+
+	wt = FIELD_GET(SOC_WT, status);
+
+	return sysfs_emit(buf, "%d\n", wt);
+}
+
+static DEVICE_ATTR_RO(workload_type_index);
+
+static ssize_t workload_hint_enable_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	return sysfs_emit(buf, "%d\n", wt_enable);
+}
+
+static ssize_t workload_hint_enable_store(struct device *dev,
+					  struct device_attribute *attr,
+					  const char *buf, size_t size)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u8 mode;
+	int ret;
+
+	if (kstrtou8(buf, 10, &mode) || mode > 1)
+		return -EINVAL;
+
+	mutex_lock(&wt_lock);
+
+	if (mode)
+		ret = processor_thermal_mbox_interrupt_config(pdev, true,
+							      SOC_WT_PREDICTION_INT_ENABLE_BIT,
+							      notify_delay);
+	else
+		ret = processor_thermal_mbox_interrupt_config(pdev, false,
+							      SOC_WT_PREDICTION_INT_ENABLE_BIT, 0);
+
+	if (ret)
+		goto ret_enable_store;
+
+	ret = size;
+	wt_enable = mode;
+
+ret_enable_store:
+	mutex_unlock(&wt_lock);
+
+	return ret;
+}
+
+static DEVICE_ATTR_RW(workload_hint_enable);
+
+static ssize_t notification_delay_ms_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	return sysfs_emit(buf, "%u\n", notify_delay_ms);
+}
+
+static ssize_t notification_delay_ms_store(struct device *dev,
+					   struct device_attribute *attr,
+					   const char *buf, size_t size)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u16 new_tw;
+	int ret;
+	u8 tm;
+
+	/*
+	 * Time window register value:
+	 * Formula: (1 + x/4) * power(2,y)
+	 * x = 2 msbs, that is [30:29] y = 5 [28:24]
+	 * in INTR_CONFIG register.
+	 * The result will be in milli seconds.
+	 * Here, just keep x = 0, and just change y.
+	 * First round up the user value to power of 2 and
+	 * then take log2, to get "y" value to program.
+	 */
+	ret = kstrtou16(buf, 10, &new_tw);
+	if (ret)
+		return ret;
+
+	if (!new_tw)
+		return -EINVAL;
+
+	new_tw = roundup_pow_of_two(new_tw);
+	tm = ilog2(new_tw);
+	if (tm > 31)
+		return -EINVAL;
+
+	mutex_lock(&wt_lock);
+
+	/* If the workload hint was already enabled, then update with the new delay */
+	if (wt_enable)
+		ret = processor_thermal_mbox_interrupt_config(pdev, true,
+							      SOC_WT_PREDICTION_INT_ENABLE_BIT,
+							      tm);
+
+	if (!ret) {
+		ret = size;
+		notify_delay = tm;
+		notify_delay_ms = new_tw;
+	}
+
+	mutex_unlock(&wt_lock);
+
+	return ret;
+}
+
+static DEVICE_ATTR_RW(notification_delay_ms);
+
+static struct attribute *workload_hint_attrs[] = {
+	&dev_attr_workload_type_index.attr,
+	&dev_attr_workload_hint_enable.attr,
+	&dev_attr_notification_delay_ms.attr,
+	NULL
+};
+
+static const struct attribute_group workload_hint_attribute_group = {
+	.attrs = workload_hint_attrs,
+	.name = "workload_hint"
+};
+
+/*
+ * Callback to check if the interrupt for prediction is active.
+ * Caution: Called from the interrupt context.
+ */
+bool proc_thermal_check_wt_intr(struct proc_thermal_device *proc_priv)
+{
+	u64 int_status;
+
+	int_status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+	if (int_status & SOC_WT_PREDICTION_INT_ACTIVE)
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_check_wt_intr, INT340X_THERMAL);
+
+/* Callback to notify user space */
+void proc_thermal_wt_intr_callback(struct pci_dev *pdev, struct proc_thermal_device *proc_priv)
+{
+	u64 status;
+
+	status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+	if (!(status & SOC_WT_PREDICTION_INT_ACTIVE))
+		return;
+
+	writeq(status & ~SOC_WT_PREDICTION_INT_ACTIVE,
+	       proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+	sysfs_notify(&pdev->dev.kobj, "workload_hint", "workload_type_index");
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_wt_intr_callback, INT340X_THERMAL);
+
+static bool workload_hint_created;
+
+int proc_thermal_wt_hint_add(struct pci_dev *pdev, struct proc_thermal_device *proc_priv)
+{
+	int ret;
+
+	ret = sysfs_create_group(&pdev->dev.kobj, &workload_hint_attribute_group);
+	if (ret)
+		return ret;
+
+	workload_hint_created = true;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_wt_hint_add, INT340X_THERMAL);
+
+void proc_thermal_wt_hint_remove(struct pci_dev *pdev)
+{
+	mutex_lock(&wt_lock);
+	if (wt_enable)
+		processor_thermal_mbox_interrupt_config(pdev, false,
+							SOC_WT_PREDICTION_INT_ENABLE_BIT,
+							0);
+	mutex_unlock(&wt_lock);
+
+	if (workload_hint_created)
+		sysfs_remove_group(&pdev->dev.kobj, &workload_hint_attribute_group);
+
+	workload_hint_created = false;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_wt_hint_remove, INT340X_THERMAL);
+
+MODULE_IMPORT_NS(INT340X_THERMAL);
+MODULE_LICENSE("GPL");
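
Reviewer note: notification_delay_ms_store() converts the user-supplied delay into the "y" exponent of the time window formula (1 + x/4) * 2^y with x fixed at 0, by rounding up to a power of two and taking log2. The user-space sketch below reproduces that arithmetic so the effective delay for a given input can be predicted. The function names are hypothetical. The upper bound of 32768 is an assumption chosen because the driver parses the value into a u16, so larger inputs cannot be rounded up within that type. This sketch rejects them explicitly and makes no claim about the driver's exact error path.

```c
#include <stdint.h>

/*
 * Return the exponent y such that 2^y is the smallest power of two that is
 * greater than or equal to delay_ms, or -1 for inputs this sketch rejects.
 */
static int delay_ms_to_exponent(unsigned int delay_ms)
{
	unsigned int y = 0;

	if (delay_ms == 0 || delay_ms > 32768)
		return -1;
	while ((1u << y) < delay_ms)
		y++;
	return (int)y;
}

/*
 * Time window in milliseconds for register fields x (2 bits) and y (5 bits):
 * (1 + x/4) * 2^y, computed as (4 + x) * 2^y / 4 to stay in integers.
 * For very small y with x != 0 the division truncates.
 */
static unsigned int time_window_ms(unsigned int x, unsigned int y)
{
	return ((4u + x) << y) / 4u;
}
```

Under these assumptions, a request of 1000 ms yields y = 10 and an effective delay of 1024 ms, which matches the driver's default of 0x0A and 1024 ms. A request of 1025 ms yields y = 11 and 2048 ms.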
