A (potentially) easy-to-understand explanation of the different modes of the intel_pstate driver
Preliminary Knowledge - C-states and P-states
Processor P-states and C-states - Thomas-Krenn-Wiki-en
Checking if CPU Supports HWP
Run the command:
cat /proc/cpuinfo
If the flags field includes hwp, then HWP (hardware-managed P-states) is supported. Generally, Skylake and later processors support HWP, but this may differ on server CPUs.
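The same check can be scripted. The following is a minimal sketch, assuming the flag appears as the exact token hwp on a flags line of /proc/cpuinfo; the function name has_hwp is made up for illustration.

```python
from pathlib import Path


def has_hwp(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the given cpuinfo text lists 'hwp'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact token so that hwp_notify or hwp_epp alone do not count.
        if sep and key.strip() == "flags" and "hwp" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("HWP supported:", has_hwp(text))
```

Taking the text as a parameter rather than reading the file inside the function keeps the check easy to try against saved cpuinfo dumps from other machines.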
Kernel Boot Options Behavior [3]
- Default:
  - If the CPU supports HWP: active mode with HWP.
  - If the CPU does not support HWP: passive mode.
- intel_pstate=no_hwp:
  - Default: passive mode.
  - With intel_pstate=active in boot options: active mode with no HWP.
  - "Note that the intel_pstate=no_hwp setting causes the driver to start in the passive mode if it is not combined with intel_pstate=active."
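The rules in this section can be written as a small decision function. This is only a sketch of the behavior described here, not the kernel's logic: the name initial_mode is hypothetical, and it interprets only intel_pstate=no_hwp and intel_pstate=active, ignoring every other value.

```python
def initial_mode(cpu_has_hwp: bool, cmdline: str = "") -> str:
    """Map HWP support plus intel_pstate= boot options to the startup mode.

    Simplified: only the no_hwp and active options are interpreted.
    """
    options = set()
    for token in cmdline.split():
        if token.startswith("intel_pstate="):
            options.add(token[len("intel_pstate="):])

    # HWP is used only if the CPU has it and it was not switched off at boot.
    if cpu_has_hwp and "no_hwp" not in options:
        return "active mode with HWP"
    if "active" in options:
        return "active mode with no HWP"
    return "passive mode"
```

For example, initial_mode(True, "quiet intel_pstate=no_hwp") gives "passive mode", and adding intel_pstate=active to the same command line gives "active mode with no HWP".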
Actual Behavior of Different intel_pstate Operating Modes
Current operating mode: /sys/devices/system/cpu/intel_pstate/status
The scaling_driver used by the current operating mode: /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
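Both files can be read in one go. The sketch below assumes the two sysfs paths given above; the helper name read_pstate_info and its sysfs_cpu parameter are illustrative, and a missing file is reported as None rather than raising.

```python
from pathlib import Path


def read_pstate_info(sysfs_cpu: str = "/sys/devices/system/cpu") -> dict:
    """Read the intel_pstate status and cpu0's scaling_driver from sysfs."""
    root = Path(sysfs_cpu)
    files = {
        "status": root / "intel_pstate" / "status",
        "scaling_driver": root / "cpu0" / "cpufreq" / "scaling_driver",
    }
    info = {}
    for name, path in files.items():
        try:
            info[name] = path.read_text().strip()
        except OSError:
            info[name] = None  # file missing, e.g. intel_pstate is not in use
    return info


if __name__ == "__main__":
    print(read_pstate_info())
```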
Active Mode with HWP:
- The driver only needs to provide performance guidance hints to the CPU.
- The intel_pstate driver is loaded and scaling_driver reads intel_pstate.
- Governor: the OS can only choose between the performance and powersave governors that intel_pstate provides.
- Mechanism: the CPU adjusts P-states itself. The driver still registers a utilization update callback, but the callback does not select P-states because the CPU handles that. According to the kernel documentation [3], it is registered with the CPU scheduler and only refreshes the current-frequency information.
Active Mode with No HWP:
- intel_pstate works like acpi-cpufreq, doing only the performance level setting work.
- The intel_pstate driver is loaded and scaling_driver reads intel_pstate.
- Governor: only the performance and powersave governors are supported.
- Mechanism: uses hardware coordination feedback to adjust P-states. The driver registers a utilization update callback (with the CPU scheduler, per [3]) that contains the algorithms intel_pstate uses to adjust P-states.
- Algorithm: uses the APERF and MPERF MSRs to calculate CPU utilization and select performance levels. [6]
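To make the APERF/MPERF idea concrete, here is a simplified sketch and not the driver's actual code. It relies on MPERF counting at a fixed reference rate and APERF counting at the actual rate, both only while the core is running. Using the TSC delta as the elapsed-time reference is an assumption added here, and pick_pstate is a hypothetical proportional policy with made-up parameter names.

```python
def busy_fraction(delta_mperf: int, delta_tsc: int) -> float:
    """Fraction of the sampling interval the core spent running (not idle)."""
    return delta_mperf / delta_tsc if delta_tsc else 0.0


def average_perf_ratio(delta_aperf: int, delta_mperf: int) -> float:
    """Average running frequency relative to the reference frequency."""
    return delta_aperf / delta_mperf if delta_mperf else 0.0


def pick_pstate(busy: float, min_pstate: int, max_pstate: int) -> int:
    """Hypothetical policy: choose a P-state proportional to utilization."""
    target = round(max_pstate * busy)
    # Clamp to the range the hardware actually offers.
    return max(min_pstate, min(max_pstate, target))
```

With counter deltas of 500 (MPERF) and 1000 (TSC), the core was busy half the time, and with P-states ranging from 8 to 40 this policy would pick 20.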
Passive Mode:
- The intel_pstate driver is loaded and scaling_driver reads intel_cpufreq.
- Supports all governors.
- Mechanism: exposes all P-states (including Turbo Boost P-states) to the CPUFreq core without registering a callback, allowing Linux to use both Turbo Boost and its own P-state adjustment algorithms.
- Algorithm: uses IA32_PERF_CTL and IA32_PERF_STATUS. [6]
Disabled:
- intel_pstate is not loaded, and acpi-cpufreq is used as scaling_driver.
- Mechanism: similar to passive mode, but Turbo Boost cannot be used since the generic acpi driver cannot access all P-states.
"On the other hand, in the passive mode the driver behaves similarly to the generic acpi-cpufreq driver - it collaborates with the regular scaling governors. Although, it can use the full range of frequency steps." [1]
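The scaling_driver values described for each mode can be summarized as a lookup. This is a sketch based only on the descriptions in this section; the names MODE_BY_SCALING_DRIVER and describe_mode are made up, and scaling_driver alone cannot tell the two active variants apart.

```python
# scaling_driver string -> operating mode, as described in this section.
MODE_BY_SCALING_DRIVER = {
    "intel_pstate": "active (with or without HWP)",
    "intel_cpufreq": "passive",
    "acpi-cpufreq": "disabled (intel_pstate not loaded)",
}


def describe_mode(scaling_driver: str) -> str:
    """Translate a scaling_driver value into the mode it indicates."""
    return MODE_BY_SCALING_DRIVER.get(scaling_driver.strip(), "unknown")
```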
Best Practices
- Currently, for CPUs that do not support HWP, passive mode with the performance governor is recommended if power consumption is not a concern. This is also what Intel advocates [7].
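One way to apply this at runtime is to write to sysfs, which requires root. The sketch below assumes the driver accepts passive written to the intel_pstate status file and that each CPU exposes cpufreq/scaling_governor; the function name apply_passive_performance and its sysfs_cpu parameter are illustrative.

```python
from pathlib import Path


def apply_passive_performance(sysfs_cpu: str = "/sys/devices/system/cpu") -> list:
    """Switch intel_pstate to passive mode and set every CPU to performance.

    Returns the governor files that were written. Needs root on a real system.
    """
    root = Path(sysfs_cpu)
    (root / "intel_pstate" / "status").write_text("passive")
    changed = []
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for governor in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governor.write_text("performance")
        changed.append(str(governor))
    return changed
```

These writes do not persist across reboots; for a permanent setup the mode would be chosen through the boot options described earlier.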
References
[1] https://wiki.gentoo.org/wiki/Power_management/Processor
[2] https://wiki.archlinux.org/title/CPU_frequency_scaling
[3] https://www.kernel.org/doc/html/v6.9/admin-guide/pm/intel_pstate.html
[4] Using Processor Performance P-States with Linux on Intel-based ThinkSystem Servers
[5] https://web.archive.org/web/20220926200749/https://makiras.org/archives/344?amp=1
[6] Intel Power Management & MSR_SAFE (04-2020-01-29-prace-ee-DC-RS.pdf)
[7] https://www.phoronix.com/news/P-State-Passive-Def-For-No-HWP