Breakertt Blog

A story of making dreams come true together!

Different modes in intel_pstate

Breakertt, 2024-08-06, Linux

  1. Preliminary Knowledge - C-states and P-states
  2. Checking if CPU Supports HWP
  3. Kernel Boot Options Behavior [3]
  4. Actual Behavior of Different intel_pstate Operating Modes
  5. Best Practices
  6. References

A (hopefully) easy-to-understand explanation of the different operating modes of the intel_pstate driver

Preliminary Knowledge - C-states and P-states

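For background, see "Processor P-states and C-states" on the Thomas-Krenn Wiki. In short, P-states are the frequency/voltage operating points of a running core, and C-states are the idle states a core enters when it has nothing to do.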

Checking if CPU Supports HWP

Run the command:

cat /proc/cpuinfo

If the flags line includes hwp (usually alongside related flags such as hwp_notify or hwp_epp), the CPU supports HWP (Hardware-managed P-states). Generally, Skylake and later processors support HWP, but this may differ on server parts.
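The same check can be scripted. Below is a minimal Python sketch that looks for the exact hwp token on the flags line; the function name has_hwp is an illustrative assumption, not part of any existing tool.

```python
def has_hwp(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in the given cpuinfo text lists 'hwp'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Flags are whitespace-separated; match the exact token so that a
            # related flag such as 'hwp_epp' on its own does not count.
            if "hwp" in value.split():
                return True
    return False

# Usage on a Linux machine (hypothetical call site):
#   has_hwp(open("/proc/cpuinfo").read())
```

Matching whole tokens rather than substrings avoids a false positive if only a related flag is present.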

Kernel Boot Options Behavior [3]

  • Default:
    • If the CPU supports HWP: active mode with HWP.
    • If the CPU does not support HWP: passive mode.
  • intel_pstate=no_hwp:

    Note that the intel_pstate=no_hwp setting causes the driver to start in the passive mode if it is not combined with intel_pstate=active.

    • On its own: passive mode.
    • Combined with intel_pstate=active in the boot options: active mode without HWP.
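The rules above can be written down as a small decision function. This is only a sketch of the cases listed here, not the kernel's actual logic: the function name startup_mode and the returned labels are assumptions, and the intel_pstate=passive and intel_pstate=disable branches come from the kernel documentation [3] rather than from the list above.

```python
def startup_mode(cpu_has_hwp: bool, cmdline: str) -> str:
    """Predict the intel_pstate startup mode from the kernel command line."""
    # Collect every value passed as intel_pstate=<value>; the option may
    # appear more than once (e.g. intel_pstate=no_hwp intel_pstate=active).
    opts = {
        tok.split("=", 1)[1]
        for tok in cmdline.split()
        if tok.startswith("intel_pstate=")
    }
    if "disable" in opts:
        return "disabled"
    if "passive" in opts:
        return "passive"
    # HWP is used only if the CPU supports it and no_hwp was not given.
    if cpu_has_hwp and "no_hwp" not in opts:
        return "active (HWP)"
    # Without HWP, passive is the default unless active is requested.
    if "active" in opts:
        return "active (no HWP)"
    return "passive"
```

For example, startup_mode(True, "intel_pstate=no_hwp") yields "passive", while adding intel_pstate=active to the same command line yields "active (no HWP)".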

Actual Behavior of Different intel_pstate Operating Modes

Current operating mode: /sys/devices/system/cpu/intel_pstate/status

The scaling_driver used by the current operating mode: /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
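Both files can be read in one go. The following is a minimal sketch; the helper names read_sysfs and pstate_status and the cpu_root parameter are assumptions made for illustration. A missing file is reported as None, which is what one would expect to see for the status file when intel_pstate is not in use.

```python
from pathlib import Path
from typing import Optional


def read_sysfs(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def pstate_status(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Read the intel_pstate status and the scaling driver of cpu0."""
    root = Path(cpu_root)
    return {
        "status": read_sysfs(root / "intel_pstate" / "status"),
        "scaling_driver": read_sysfs(root / "cpu0" / "cpufreq" / "scaling_driver"),
    }
```

Calling pstate_status() with no arguments reads the real sysfs tree; passing another root makes the function easy to try against a copy of the files.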

  • Active Mode with HWP:

    • The driver only needs to provide performance hints to the CPU.
    • scaling_driver reports intel_pstate.
    • Governor: the OS can only choose between the performance and powersave governors provided by the driver.
    • Mechanism: the CPU adjusts P-states by itself. The driver still registers a utilization update callback with the CPU scheduler, but the callback does not select P-states because the CPU already does that.
  • Active Mode with No HWP:

    • intel_pstate works like acpi-cpufreq, doing only performance level setting work.
    • scaling_driver reports intel_pstate.
    • Governor: only the performance and powersave governors are supported.
    • Mechanism: uses hardware coordination feedback to adjust P-states. The driver registers a utilization update callback with the CPU scheduler, and that callback runs intel_pstate's own P-state selection algorithm.
      • Algorithm: uses the APERF and MPERF MSRs to calculate CPU utilization and select the performance level. [6]
  • Passive Mode:

    • scaling_driver reports intel_cpufreq.
    • Governor: supports all generic CPUFreq governors.
    • Mechanism: exposes all P-states (including Turbo Boost P-states) to the CPUFreq core without registering its own callback, so Linux can use Turbo Boost together with the generic governors' P-state selection algorithms.
      • Algorithm: P-state requests are written to IA32_PERF_CTL, and the current state is read from IA32_PERF_STATUS. [6]
  • Disabled:

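    • intel_pstate is not loaded, and acpi-cpufreq is used as the scaling_driver.
    • Mechanism: similar to passive mode, but the generic ACPI driver only sees the P-states listed in the ACPI tables, so it cannot use the full range of P-states. In particular, the Turbo Boost range is not available as individual P-states.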

      On the other hand, in the passive mode the driver behaves similarly to the generic acpi-cpufreq driver - it collaborates with the regular scaling governors. Although, it can use the full range of frequency steps. [1]
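The list above boils down to a mapping from the reported scaling_driver to the operating mode. A minimal sketch follows; the function name classify_mode and the hwp_in_use argument are assumptions, and the caller is expected to work out separately whether HWP is actually in use (for example from the hwp CPU flag and the boot options).

```python
def classify_mode(scaling_driver: str, hwp_in_use: bool) -> str:
    """Map the reported scaling_driver to the mode names used in this post."""
    if scaling_driver == "intel_pstate":
        # Both active variants report the same driver name, so HWP usage
        # has to be supplied by the caller.
        return "active (HWP)" if hwp_in_use else "active (no HWP)"
    if scaling_driver == "intel_cpufreq":
        return "passive"
    if scaling_driver == "acpi-cpufreq":
        return "disabled"
    return "unknown"
```

The function deliberately returns "unknown" for any other driver name instead of guessing.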

Best Practices

  • Currently, for CPUs that do not support HWP, passive mode with the performance governor is the recommended setup if power consumption is not a concern. This is also the direction Intel is pushing [7].
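As a rough illustration, this setup can be applied at runtime through sysfs. The sketch below assumes the standard status file and the per-policy scaling_governor files under cpufreq/; the function name apply_passive_performance and the cpu_root parameter are hypothetical, and writing to the real files requires root.

```python
from pathlib import Path


def apply_passive_performance(cpu_root: str = "/sys/devices/system/cpu") -> int:
    """Switch intel_pstate to passive mode and select the performance governor.

    Returns the number of CPUFreq policies that were updated.
    """
    root = Path(cpu_root)
    # Writing "passive" to the status file switches the driver mode.
    (root / "intel_pstate" / "status").write_text("passive")
    updated = 0
    # Each policyN directory has its own scaling_governor file.
    for governor in sorted(root.glob("cpufreq/policy*/scaling_governor")):
        governor.write_text("performance")
        updated += 1
    return updated
```

This change does not survive a reboot; a persistent setup would use the boot options described earlier.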

References

[1] https://wiki.gentoo.org/wiki/Power_management/Processor

[2] https://wiki.archlinux.org/title/CPU_frequency_scaling

[3] https://www.kernel.org/doc/html/v6.9/admin-guide/pm/intel_pstate.html

[4] Using Processor Performance P-States with Linux on Intel-based ThinkSystem Servers (lp1946.pdf)

[5] https://web.archive.org/web/20220926200749/https://makiras.org/archives/344?amp=1

[6] Intel Power Management & MSR_SAFE (04-2020-01-29-prace-ee-DC-RS.pdf)

[7] https://www.phoronix.com/news/P-State-Passive-Def-For-No-HWP
