alpineQ commented Nov 11, 2025

Closes #2995

Changes

This PR adds support for Linux Pressure Stall Information (PSI) metrics to the system semantic conventions.

PSI is a Linux kernel feature (available since kernel 4.20) that identifies and quantifies resource contention by measuring the time impact that CPU, memory, and I/O resource crunches have on workloads.

New Metrics

  • system.linux.psi.pressure (Gauge): Measures resource pressure as a percentage of time that tasks were stalled over a time window (10s, 60s, or 300s)
  • system.linux.psi.total_time (Counter): Tracks the total cumulative stall time in microseconds since system boot
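For illustration, here is a minimal sketch of how the two metric values could be derived from one line of a kernel PSI file such as `/proc/pressure/memory`. The kernel reports each line as `some|full avg10=… avg60=… avg300=… total=…`; the helper name `parse_psi_line` and the shape of the returned dictionary are hypothetical and are not part of this PR.

```python
def parse_psi_line(line: str) -> dict:
    """Parse one line of a /proc/pressure/<resource> file.

    Hypothetical helper for illustration only; not part of the conventions.
    """
    # First token is the stall type ("some" or "full"); the rest are key=value pairs.
    stall_type, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return {
        "stall_type": stall_type,
        # avg10/avg60/avg300 are percentages of time stalled over each window;
        # these would feed the system.linux.psi.pressure gauge.
        "pressure": {
            "10s": float(values["avg10"]),
            "60s": float(values["avg60"]),
            "300s": float(values["avg300"]),
        },
        # total is the cumulative stall time in microseconds;
        # this would feed the system.linux.psi.total_time counter.
        "total_time_us": int(values["total"]),
    }


sample = "some avg10=1.25 avg60=0.50 avg300=0.10 total=123456"
parsed = parse_psi_line(sample)
```

The sample line is made up; real values come from the running kernel.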

New Attributes

  • system.psi.resource: The resource type (cpu, memory, io)
  • system.psi.stall_type: The stall severity (some for partial stalls, full for complete stalls where all non-idle tasks are blocked)
  • system.psi.window: The time window for pressure calculation (10s, 60s, 300s)
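As a rough sketch of how these attributes might combine with the metrics above, the following turns the text of one PSI file into data points. The `DataPoint` type and `psi_data_points` function are hypothetical names used only for illustration. The sketch also assumes that `system.psi.window` applies only to the gauge, since the cumulative total has no window; the PR text does not state this explicitly.

```python
from typing import Iterator, NamedTuple

# Kernel field name -> system.psi.window value
WINDOWS = {"avg10": "10s", "avg60": "60s", "avg300": "300s"}


class DataPoint(NamedTuple):
    """Hypothetical container for one metric measurement."""
    metric: str
    value: float
    attributes: dict


def psi_data_points(resource: str, text: str) -> Iterator[DataPoint]:
    """Yield data points for every line of a /proc/pressure/<resource> file."""
    for line in text.splitlines():
        if not line.strip():
            continue
        stall_type, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        base = {
            "system.psi.resource": resource,
            "system.psi.stall_type": stall_type,
        }
        # One gauge point per time window.
        for key, window in WINDOWS.items():
            yield DataPoint(
                "system.linux.psi.pressure",
                float(values[key]),
                {**base, "system.psi.window": window},
            )
        # One counter point per stall type (assumed to carry no window attribute).
        yield DataPoint("system.linux.psi.total_time", int(values["total"]), base)


sample = (
    "some avg10=2.00 avg60=1.00 avg300=0.50 total=1000\n"
    "full avg10=0.50 avg60=0.25 avg300=0.10 total=400"
)
points = list(psi_data_points("memory", sample))
```

With both `some` and `full` lines present, each resource yields six gauge points and two counter points.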

Use Cases

PSI metrics enable:

  • Sizing workloads to hardware or provisioning hardware according to workload demand
  • Detecting productivity losses caused by resource scarcity
  • Dynamic system management (load shedding, job migration, strategic pausing)
  • Maximizing hardware utilization without sacrificing workload health

References

Relevant issues and PRs

There are issues on this matter in:

And 2 PRs that I am proposing to address these issues:

Important

Pull request acceptance is subject to the triage process described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

Reopened #2996

@lmolkova lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Nov 20, 2025
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 26, 2025

alpineQ commented Nov 27, 2025

@thompson-tomo @braydonk @trask
Issue #2996 was reopened here. If any additional changes are needed, I'm open to suggestions.

@thompson-tomo
Contributor

@alpineQ can you rebase onto or merge in master? The doc templates have been updated.

@github-actions github-actions bot removed the Stale label Nov 28, 2025

Labels

area:system enhancement New feature or request

Projects

Status: Awaiting codeowners approval

Development

Successfully merging this pull request may close these issues.

Add Pressure Stall Information (PSI) metrics

2 participants