@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
329329HWP feature, which is why ``intel_pstate`` works with all of them.]
330330
331331
332+ Support for Hybrid Processors
333+ =============================
334+
335+ Some processors supported by ``intel_pstate`` contain two or more types of CPU
336+ cores differing by the maximum turbo P-state, performance vs power characteristics,
337+ cache sizes, and possibly other properties. They are commonly referred to as
338+ hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled
339+ and it assumes the HWP performance units to be the same for all CPUs in the
340+ system, so a given HWP performance level always represents approximately the
341+ same physical performance regardless of the core (CPU) type.
342+
343+ Hybrid Processors with SMT
344+ --------------------------
345+
346+ On systems where SMT (Simultaneous Multithreading), also referred to as
347+ HyperThreading (HT) in the context of Intel processors, is enabled on at least
348+ one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
349+ the priority of a given CPU reflects its highest HWP performance level, which
350+ causes the CPU scheduler to generally prefer more performant CPUs, so the less
351+ performant CPUs are used when the other ones are fully loaded. However, SMT
352+ siblings (that is, logical CPUs sharing one physical core) are treated in a
353+ special way such that if one of them is in use, the effective priority of the
354+ other ones is lowered below the priorities of the CPUs located in the other
355+ physical cores.
356+
357+ This approach maximizes performance in the majority of cases, but unfortunately
358+ it also leads to excessive energy usage in some important scenarios, like video
359+ playback, which is not generally desirable. While there is no other viable
360+ choice with SMT enabled because the effective capacity and utilization of SMT
361+ siblings are hard to determine, hybrid processors without SMT can be handled in
362+ more energy-efficient ways.
363+
364+ .. _CAS:
365+
366+ Capacity-Aware Scheduling Support
367+ ---------------------------------
368+
369+ The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
370+ ``intel_pstate`` by default on hybrid processors without SMT. CAS generally
371+ causes the scheduler to put tasks on a CPU so long as there is a sufficient
372+ amount of spare capacity on it, and if the utilization of a given task is too
373+ high for it, the task will need to go somewhere else.
374+
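The placement rule described above can be illustrated with a minimal Python sketch. It is not the scheduler's actual algorithm: the function name ``pick_cpu``, the first-fit policy, and the use of plain ``capacity - utilization`` as spare capacity are assumptions made purely for illustration.

```python
def pick_cpu(task_util, cpus):
    """Return the index of the first CPU that can accommodate the task.

    cpus is a list of (capacity, current_utilization) pairs expressed in
    the same units as task_util.  None is returned if no CPU has enough
    spare capacity.  First-fit is an illustrative assumption only.
    """
    for index, (capacity, utilization) in enumerate(cpus):
        spare = capacity - utilization
        if spare >= task_util:
            return index
    return None
```

For instance, with a small CPU of capacity 512 that is already using 400 units and a big CPU of capacity 1024 using 200 units, a task needing 300 units does not fit on the small CPU and has to go to the big one, while a task needing 100 units can stay on the small CPU.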
375+ Since CAS takes CPU capacities into account, it does not require CPU
376+ prioritization and it allows tasks to be distributed more symmetrically among
377+ the more performant and less performant CPUs. Once placed on a CPU with enough
378+ capacity to accommodate it, a task may just continue to run there regardless of
379+ whether or not the other CPUs are fully loaded, so on average CAS reduces the
380+ utilization of the more performant CPUs, which causes the energy usage to be more
381+ balanced because the more performant CPUs are generally less energy-efficient
382+ than the less performant ones.
383+
384+ In order to use CAS, the scheduler needs to know the capacity of each CPU in
385+ the system and it needs to be able to compute scale-invariant utilization of
386+ CPUs, so ``intel_pstate`` provides it with the requisite information.
387+
388+ First of all, the capacity of each CPU is represented by the ratio of its highest
389+ HWP performance level, multiplied by 1024, to the highest HWP performance level
390+ of the most performant CPU in the system, which works because the HWP performance
391+ units are the same for all CPUs. Second, the frequency-invariance computations,
392+ carried out by the scheduler to always express CPU utilization in the same units
393+ regardless of the frequency it is currently running at, are adjusted to take the
394+ CPU capacity into account. All of this happens when ``intel_pstate`` has
395+ registered itself with the ``CPUFreq`` core and it has figured out that it is
396+ running on a hybrid processor without SMT.
397+
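The capacity formula stated above (highest HWP performance level of the CPU, multiplied by 1024, divided by the highest HWP performance level of the most performant CPU) can be written out as a short Python sketch. The function name ``cpu_capacities``, the integer division, and the sample performance levels are assumptions for illustration, not values taken from any real processor or from the driver code.

```python
def cpu_capacities(highest_perf):
    """Map each CPU to its capacity on a 0..1024 scale.

    highest_perf maps a CPU number to its highest HWP performance level.
    Rounding down with integer division is an assumption of this sketch.
    """
    top = max(highest_perf.values())
    return {cpu: perf * 1024 // top for cpu, perf in highest_perf.items()}


# Hypothetical levels: CPUs 0 and 1 are "big", CPU 2 is "small".
example = cpu_capacities({0: 64, 1: 64, 2: 40})
```

With these made-up levels, the most performant CPUs get the full capacity of 1024 and the less performant one gets 40 * 1024 / 64 = 640.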
398+ Energy-Aware Scheduling Support
399+ -------------------------------
400+
401+ If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
402+ ``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
403+ `CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
404+ Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
405+ ``schedutil`` is used as the ``CPUFreq`` governor, which requires ``intel_pstate``
406+ to operate in the `passive mode <Passive Mode_>`_.
407+
408+ The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
409+ based on abstract cost values and it does not include any real power numbers)
410+ and it is kept relatively simple to avoid unnecessary computations in the scheduler.
411+ There is a performance domain in it for every CPU in the system and the cost
412+ values for these performance domains have been chosen so that running a task on
413+ a less performant (small) CPU appears to be always cheaper than running that
414+ task on a more performant (big) CPU. However, for two CPUs of the same type,
415+ the cost difference depends on their current utilization, and the CPU whose
416+ current utilization is higher generally appears to be a more expensive
417+ destination for a given task. This helps to balance the load among CPUs of the
418+ same type.
419+
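The two ordering properties described above (a small CPU always looks cheaper than a big one, and among CPUs of the same type the more utilized one looks more expensive) can be mimicked with a toy cost function in Python. The base costs, the 0..1024 utilization range, and the names ``placement_cost`` and ``cheapest_cpu`` are hypothetical and do not reflect the cost values actually used by ``intel_pstate``.

```python
# Hypothetical base costs, chosen only so that any "small" cost stays
# below any "big" cost for utilization values in the range 0..1024.
TYPE_BASE_COST = {"small": 0, "big": 2048}


def placement_cost(cpu_type, utilization):
    """Return an abstract cost of placing a task on a CPU (toy model)."""
    if not 0 <= utilization <= 1024:
        raise ValueError("utilization must be within 0..1024")
    return TYPE_BASE_COST[cpu_type] + utilization


def cheapest_cpu(cpus):
    """Return the name of the CPU with the lowest abstract cost.

    cpus maps a CPU name to a (cpu_type, utilization) pair.
    """
    return min(cpus, key=lambda name: placement_cost(*cpus[name]))
```

In this toy model an idle big CPU still costs more than a fully utilized small one, and between two small CPUs the less loaded one wins, which matches the qualitative behavior described in the text.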
420+ Since EAS works on top of CAS, high-utilization tasks are always migrated to
421+ CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
422+ tasks tend to be placed on the CPUs that look less expensive to the scheduler.
423+ Effectively, this causes the less performant and less loaded CPUs to be
424+ preferred as long as they have enough spare capacity to run the given task,
425+ which generally leads to reduced energy usage.
426+
427+ The Energy Model created by ``intel_pstate`` can be inspected by looking at
428+ the ``energy_model`` directory in ``debugfs`` (typically mounted on
429+ ``/sys/kernel/debug/``).
430+
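One possible way to dump that directory is sketched below in Python. The helper name ``read_energy_model`` is hypothetical, and the sketch makes no assumption about the layout below ``energy_model``: it simply reads every file it finds. The default path assumes ``debugfs`` is mounted on ``/sys/kernel/debug/``.

```python
import os


def read_energy_model(root="/sys/kernel/debug/energy_model"):
    """Return {relative_path: contents} for every readable file under root.

    Files that cannot be read are skipped.  An empty dict is returned if
    root does not exist or is not accessible.
    """
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    entries[os.path.relpath(path, root)] = handle.read().strip()
            except OSError:
                continue
    return entries


if __name__ == "__main__":
    for relative_path, value in sorted(read_energy_model().items()):
        print(f"{relative_path}: {value}")
```

Reading ``debugfs`` usually requires root privileges, so running this as an unprivileged user is likely to print nothing.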
431+
332432User Space Interface in ``sysfs``
333433=================================
334434
@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
697797 Limits`_ for details).
698798
699799``no_cas``
700- Do not enable capacity-aware scheduling (CAS) which is enabled by
701- default on hybrid systems.
800+ Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
801+ default on hybrid systems without SMT.
702802
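To check whether this option was passed on a given kernel command line, a small Python helper along the following lines could be used. The function name ``no_cas_requested`` is hypothetical, and the sketch assumes that each option is given as its own ``intel_pstate=`` argument, as the prefix rule above suggests.

```python
def no_cas_requested(cmdline):
    """Return True if intel_pstate=no_cas appears in a kernel command line."""
    prefix = "intel_pstate="
    for token in cmdline.split():
        if token.startswith(prefix) and token[len(prefix):] == "no_cas":
            return True
    return False
```

On a running system the command line can be read from ``/proc/cmdline`` and passed to the helper as a string.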
703803Diagnostics and Tuning
704804======================