

@alpineQ commented Oct 28, 2025

Closes #2995

Changes

This PR adds support for Linux Pressure Stall Information (PSI) metrics to the system semantic conventions.

PSI is a Linux kernel feature (available since kernel 4.20) that identifies and quantifies resource contention by measuring the time impact that CPU, memory, and I/O resource crunches have on workloads.
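For reference, the kernel exposes PSI as plain-text files under /proc/pressure, one per resource (cpu, memory, io). Each file has a "some" line and, where supported, a "full" line of the form "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". The sketch below is a minimal, hypothetical reader: the function name read_psi_files and its skip-on-error behavior are assumptions, not part of this PR. It collects whichever files are readable and skips the rest, because the files are absent on pre-4.20 kernels and on kernels built without PSI support.

```python
from pathlib import Path

# Resource files the kernel exposes under /proc/pressure when PSI is enabled.
PSI_RESOURCES = ("cpu", "memory", "io")


def read_psi_files(base: Path = Path("/proc/pressure")) -> dict[str, str]:
    """Return the raw text of each readable PSI file, keyed by resource name.

    Hypothetical helper: files that are missing or unreadable (e.g. older
    kernels, PSI compiled out or disabled at boot) are skipped instead of
    raising, so a caller on such a host simply gets fewer (or no) entries.
    """
    contents: dict[str, str] = {}
    for resource in PSI_RESOURCES:
        try:
            contents[resource] = (base / resource).read_text()
        except OSError:
            # FileNotFoundError, PermissionError, etc. all derive from OSError.
            continue
    return contents


if __name__ == "__main__":
    for resource, text in read_psi_files().items():
        print(resource, text.splitlines())
```

Skipping unreadable files keeps the reader usable on hosts without PSI; an instrumentation built this way would just emit no PSI data points there.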

New Metrics

  • system.linux.psi.pressure (Gauge): Measures resource pressure as a percentage of time that tasks were stalled over a time window (10s, 60s, or 300s)
  • system.linux.psi.total_time (Counter): Tracks the total cumulative stall time in microseconds since system boot
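The two metrics are related by their units. The cumulative total (microseconds) grows by the stall time accrued during an interval, so dividing that growth by the interval's length gives a stall percentage for the interval. The kernel's avg10/avg60/avg300 values are exponentially decaying averages rather than plain ratios, so this sketch (the helper name average_stall_percent is hypothetical) only approximates what the pressure gauge reports. It illustrates the units and is not a replacement for the kernel's figures.

```python
def average_stall_percent(total_start_us: int, total_end_us: int, interval_s: float) -> float:
    """Percentage of an interval spent stalled, from two PSI 'total' readings.

    total_start_us / total_end_us: cumulative stall time in microseconds, as
    reported by the 'total=' field at the start and end of the interval.
    interval_s: wall-clock length of the interval in seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if total_end_us < total_start_us:
        # The counter only grows since boot, so a decrease means bad input.
        raise ValueError("total_end_us must be >= total_start_us")
    stalled_us = total_end_us - total_start_us
    return stalled_us * 100.0 / (interval_s * 1_000_000)


# 0.25 s of stall accrued over a 10 s interval -> 2.5 % of the time stalled.
print(average_stall_percent(1_000_000, 1_250_000, 10))
```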

New Attributes

  • system.psi.resource: The resource type (cpu, memory, io)
  • system.psi.stall_type: The stall severity (some for partial stalls, full for complete stalls where all non-idle tasks are blocked)
  • system.psi.window: The time window for pressure calculation (10s, 60s, 300s)
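A minimal sketch of how one PSI file could map onto the metrics and attributes above, assuming the names exactly as written in this description (the review thread later asks for the attributes to be renamed). Each avgN field becomes a system.linux.psi.pressure data point with the matching window attribute, and total becomes a system.linux.psi.total_time data point. The helper parse_psi and its tuple output shape are hypothetical, not an existing collector or SDK API.

```python
# Kernel field name -> value of the system.psi.window attribute (as proposed in this PR).
WINDOWS = {"avg10": "10s", "avg60": "60s", "avg300": "300s"}


def parse_psi(resource: str, text: str) -> list[tuple[str, float, dict[str, str]]]:
    """Turn the contents of /proc/pressure/<resource> into (metric, value, attributes) tuples.

    Hypothetical helper; the output shape is an illustration, not an SDK type.
    """
    points: list[tuple[str, float, dict[str, str]]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First token is the stall type ("some" or "full"); the rest are key=value pairs.
        stall_type = fields[0]
        values = dict(field.split("=", 1) for field in fields[1:])
        attrs = {"system.psi.resource": resource, "system.psi.stall_type": stall_type}
        for key, window in WINDOWS.items():
            if key in values:
                # Gauge: percentage of time stalled over the window.
                points.append(("system.linux.psi.pressure", float(values[key]),
                               {**attrs, "system.psi.window": window}))
        if "total" in values:
            # Counter: cumulative stall time in microseconds since boot.
            points.append(("system.linux.psi.total_time", int(values["total"]), attrs))
    return points


sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=789\n"
)
for point in parse_psi("memory", sample):
    print(point)
```

Each input line yields three pressure points (one per window) plus one total_time point, so a file with both "some" and "full" lines produces eight data points.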

Use Cases

PSI metrics enable:

  • Sizing workloads to hardware or provisioning hardware according to workload demand
  • Detecting productivity losses caused by resource scarcity
  • Dynamic system management (load shedding, job migration, strategic pausing)
  • Maximizing hardware utilization without sacrificing workload health

References

Relevant issues and PRs

There are issues on this matter in:

I am also proposing two PRs to address these issues:

Important

Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

@alpineQ requested review from a team as code owners October 28, 2025 20:51
@github-actions bot added the enhancement and area:system labels Oct 28, 2025
@lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Oct 28, 2025
@braydonk (Contributor) left a comment

Sorry, I didn't submit the review last night.

@alpineQ (Author) commented Nov 6, 2025

Is there anything else I can or should do to get this PR merged?

@thompson-tomo (Contributor) left a comment

It should also be confirmed if the metrics should be renamed to system.psi.linux.* to follow the changes made in #2984

@alpineQ (Author) commented Nov 10, 2025

> It should also be confirmed if the metrics should be renamed to system.psi.linux.* to follow the changes made in #2984

Any updates on whether this should be changed?

@braydonk (Contributor)

> It should also be confirmed if the metrics should be renamed to system.psi.linux.* to follow the changes made in #2984

This won't be necessary for system.psi.*. OS name in metric names is for subsystems where some metrics are shared across platforms (system.memory.* has some metrics that are shared and some that are Linux-exclusive). PSI is a Linux Kernel subsystem, so there will be no ambiguity.

@thompson-tomo (Contributor)

> This won't be necessary for system.psi.*. OS name in metric names is for subsystems where some metrics are shared across platforms

@braydonk, this is not currently being followed: the metrics are named system.linux.psi.*. If the OS name is not needed, shouldn't these metrics be system.psi.*, just like the attributes?

renovate bot and others added 2 commits November 11, 2025 09:56
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@alpineQ requested review from a team as code owners November 11, 2025 06:56
@linux-foundation-easycla

CLA Missing ID, CLA Not Signed

One or more co-authors of this pull request were not found. You must specify co-authors in the commit message trailer via:

Co-authored-by: name <email>

Supported Co-authored-by: formats include:

  1. Anything <[email protected]> - it will locate your GitHub user by id part.
  2. Anything <[email protected]> - it will locate your GitHub user by login part.
  3. Anything <public-email> - it will locate your GitHub user by public-email part. Note that this email must be made public on GitHub.
  4. Anything <other-email> - it will locate your GitHub user by other-email part but only if that email was used before for any other CLA as a main commit author.
  5. login <any-valid-email> - it will locate your GitHub user by login part, note that login part must be at least 3 characters long.

Please update your commit message(s) with git commit --amend, then git push [--force], and then request a re-run of the CLA check by commenting on this pull request:

/easycla

@github-actions (bot)

This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:

  • otel

Such changes may be rejected or put on hold until a new SIG/project is established.

Please refer to the Semantic Convention Areas document to see the currently active SIGs and to learn how to kick-start a new one.

@github-actions bot closed this Nov 11, 2025
@alpineQ (Author) commented Nov 11, 2025

Wrong rebase order 🙈
I fixed it in my fork. Please reopen.

@joaopgrassi (Member)

There are still issues with the rebase, as I see changes in the otel namespace.

@alpineQ (Author) commented Nov 11, 2025

The "Files changed" tab in this PR no longer updates because the PR was closed. The actual changes can be seen here: main...alpineQ:semantic-conventions:main

I can open a new PR if that would help.

@braydonk (Contributor)

> OS name in metric names is for subsystems where some metrics are shared across platforms (system.memory.* has some metrics that are shared and some that are Linux-exclusive). PSI is a Linux Kernel subsystem, so there will be no ambiguity.

I was mixed up when I said this. The linux segment should definitely not come after psi, but I forgot that, according to our guidance, the OS name should still be there. system.linux is the right order, though; the point is that the OS name goes after the "area of concern" where different platforms have divergent offerings. In this case system is the divergence point, whereas all metrics within psi are Linux-only by definition.

> If os name is not needed then shouldn't these metrics be system.psi.* just like the attributes.

We should do the inverse of this; the attributes should be changed to system.linux. You can find our written guidance on this here.
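The ordering rule described here can be stated as a tiny function: the OS segment goes immediately after the namespace where platforms diverge. This is an illustrative sketch only; semconv_name and its parameters are hypothetical and not part of any OpenTelemetry tooling. For PSI the divergence point is system, while for memory it is system.memory, matching the system.memory.linux.* example in the guidance.

```python
def semconv_name(divergence: list[str], os_name: str, rest: list[str]) -> str:
    """Build a dotted name with the OS segment right after the divergence point.

    divergence: namespace segments up to where platforms diverge, e.g. ["system"].
    os_name:    OS segment, e.g. "linux".
    rest:       remaining segments below the OS segment.
    """
    segments = [*divergence, os_name, *rest]
    if any(not segment for segment in segments):
        raise ValueError("name segments must be non-empty")
    return ".".join(segments)


# PSI exists only on Linux, so the whole subsystem diverges at "system".
print(semconv_name(["system"], "linux", ["psi", "pressure"]))  # system.linux.psi.pressure
```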

@thompson-tomo (Contributor)

Ah, OK, then the attributes should be renamed.

I am aware of that guidance; it is what prompted my suggestion to rename, as I saw psi as the area of concern, just like memory. It might be worthwhile to update that doc to define what the area of concern is and to incorporate terminology about divergence.

@braydonk (Contributor)

I think that's what this last line is intended to say, though perhaps it could be worded better:

However, to clarify — when we refer to avoiding OS names at the “root namespace” level, we also mean avoiding them at the area level. The OS name should appear after the area of concern (such as `system.memory.linux.*`), not before it. This ensures that users can first navigate by functional area (e.g. memory, CPU, network) and then, if necessary, drill down into OS-specific variants within that area.

@thompson-tomo (Contributor)

I have raised #3067 to capture the improvement.

And yes, the last line is trying to do that, but a reader doesn't know what the area should be.

@alpineQ, I would recommend opening a new PR, as it is unlikely this PR can be reopened due to the removed/changed commits. Note that the attributes should be renamed.


Projects: Archived in project

Development (issues this pull request may close): Add Pressure Stall Information (PSI) metrics

7 participants