
@lingbin
Contributor

@lingbin lingbin commented Nov 3, 2025

Fixes #26516

All 6 OS-related metrics were defined as AVG type but reported as
delta values, causing incorrect averaging and potential data loss
in Prometheus monitoring.

Changed metrics to report cumulative values since process start:

  • presto_cpp.os_user_cpu_time_micros
  • presto_cpp.os_system_cpu_time_micros
  • presto_cpp.os_num_soft_page_faults
  • presto_cpp.os_num_hard_page_faults
  • presto_cpp.os_num_voluntary_context_switches
  • presto_cpp.os_num_forced_context_switches
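For reference, here is a minimal sketch, assuming a POSIX system such as Linux, of where these six values come from. getrusage(RUSAGE_SELF, ...) fills in counters that are already accumulated since process start, so each one can be handed to the metric recorder unchanged. The helper names readOsCounters() and toMicros(), and the exact field-to-metric mapping, are illustrative assumptions rather than the actual PeriodicTaskManager code.

```cpp
#include <sys/resource.h>  // getrusage(), struct rusage (POSIX)

#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Converts a timeval (seconds + microseconds) to total microseconds.
int64_t toMicros(const timeval& tv) {
  return static_cast<int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}

// Hypothetical helper: reads the six OS counters for the current process.
// Every value filled in by getrusage(RUSAGE_SELF, ...) is cumulative since
// process start, so no previous reading has to be kept or subtracted.
std::vector<std::pair<std::string, int64_t>> readOsCounters() {
  rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) != 0) {
    return {};  // On failure, report nothing instead of bogus data.
  }
  return {
      {"presto_cpp.os_user_cpu_time_micros", toMicros(usage.ru_utime)},
      {"presto_cpp.os_system_cpu_time_micros", toMicros(usage.ru_stime)},
      {"presto_cpp.os_num_soft_page_faults", usage.ru_minflt},  // minor faults
      {"presto_cpp.os_num_hard_page_faults", usage.ru_majflt},  // major faults
      {"presto_cpp.os_num_voluntary_context_switches", usage.ru_nvcsw},
      {"presto_cpp.os_num_forced_context_switches", usage.ru_nivcsw},
  };
}
```

Each pair would then be recorded as the metric's current value; the previous reading is never needed.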

This ensures:

  1. Alignment with other AVG metrics in the system (task counts,
    cache sizes, etc.)
  2. Proper rate calculations in monitoring systems and no data loss
    regardless of scraping intervals
== NO RELEASE NOTE ==

@lingbin lingbin requested review from a team as code owners November 3, 2025 16:24
@sourcery-ai
Contributor

sourcery-ai bot commented Nov 3, 2025


Reviewer's Guide

This PR converts six OS-related metrics from reporting delta values to reporting cumulative values by eliminating subtraction of previous readings and removing obsolete state variables used for delta calculations.

Class diagram for updated PeriodicTaskManager OS metrics logic

classDiagram
class PeriodicTaskManager {
  -lastHttpClientNumConnectionsCreated_: int64_t
  +updateOperatingSystemStats()
  +addOperatingSystemStatsUpdateTask()
}

%% Removed attributes for OS metric deltas
%% lastUserCpuTimeUs_, lastSystemCpuTimeUs_, lastSoftPageFaults_, lastHardPageFaults_, lastVoluntaryContextSwitches_, lastForcedContextSwitches_ are no longer present

Flow diagram for OS metrics reporting change (delta to cumulative)

flowchart TD
    A["Collect OS metric (e.g., user CPU time)"] --> B["Report cumulative value since process start"]
    B --> C["RECORD_METRIC_VALUE(metric, cumulative_value)"]
    %% Previously: A --> D["Subtract previous value (delta)"] --> C
    %% Now: direct cumulative reporting

File-Level Changes

Switch OS metrics reporting from delta to cumulative values
  • Removed subtraction of last recorded values when calling RECORD_METRIC_VALUE
  • Updated RECORD_METRIC_VALUE calls to directly use current usage values for all six metrics
  File: presto_cpp/main/PeriodicTaskManager.cpp

Remove unused state variables for tracking previous metric values
  • Deleted lastUserCpuTimeUs_, lastSystemCpuTimeUs_, lastSoftPageFaults_, lastHardPageFaults_, lastVoluntaryContextSwitches_, and lastForcedContextSwitches_ members
  File: presto_cpp/main/PeriodicTaskManager.h

Assessment against linked issues

  • #26516: Change all 6 OS-related AVG type metrics to report cumulative values instead of delta values.
  • #26516: Ensure consistency of OS AVG type metrics with other AVG metrics in the system (i.e., all report cumulative values).
  • #26516: Prevent data loss in Prometheus monitoring by reporting cumulative values for OS AVG type metrics.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!



@lingbin
Contributor Author

lingbin commented Nov 4, 2025

@majetideepak Could you help review this PR? Thanks.

@lingbin
Contributor Author

lingbin commented Nov 5, 2025

@majetideepak @karteekmurthys @aditi-pandit Kindly ping. Could you please review this PR? This issue affects the accuracy of Prometheus monitoring metrics.

@jaystarshot
Member

I think we should also change the metric type to SUM? In my understanding AVG type should just report the current value.

@lingbin
Contributor Author

lingbin commented Nov 5, 2025

> I think we should also change the metric type to SUM? In my understanding AVG type should just report the current value.

@jaystarshot Thanks for your reply. We might first need to clarify the meaning of "current value" here: should it be a "current delta value" or "current cumulative value"?

When we talk about a "delta value" for a metric, it implies a time span, such as 5 seconds or 30 seconds. It seems only the "cumulative value" corresponds to a point in time, that is, the "current value" at a specific moment.

For OS counters, the getrusage() function returns, for each metric, the "current cumulative value" (not the "delta") accumulated since process startup.

Regarding changing it to the SUM type (which reports delta values): I believe that would also resolve the data-loss issue described in #26516, because my earlier PR #23622 fixed SUM-type metric reporting so that reported values are accumulated in PrometheusReporter. However, since getrusage() returns the "current cumulative value", switching to SUM (which requires reporting deltas) would mean saving the old value and periodically computing the difference. That seems redundant compared with directly recording the result of getrusage(). What do you think? Looking forward to your feedback.
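The trade-off can be illustrated with a toy model. It assumes, based on this thread rather than on the reporter's source, that an AVG gauge keeps only the newest reported value while a SUM gauge adds each report to a running total. Under that assumption, reporting raw getrusage() readings as AVG and reporting deltas as SUM end at the same gauge value; only the SUM path needs the saved previous reading. ToyGauge, reportCumulative(), and reportDeltas() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (an assumption, not the real reporter) of how a reporter folds
// each reported value into the gauge it exposes.
enum class MetricType { kAvg, kSum };

class ToyGauge {
 public:
  explicit ToyGauge(MetricType type) : type_(type) {}

  void record(int64_t value) {
    if (type_ == MetricType::kAvg) {
      value_ = value;   // AVG: the newest report overwrites the old value
    } else {
      value_ += value;  // SUM: each report is added to the running total
    }
  }

  int64_t value() const { return value_; }

 private:
  MetricType type_;
  int64_t value_{0};
};

// AVG path: pass each cumulative reading through unchanged.
void reportCumulative(ToyGauge& gauge, const std::vector<int64_t>& readings) {
  for (int64_t reading : readings) {
    gauge.record(reading);
  }
}

// SUM path: report deltas, which requires remembering the previous reading
// (the same kind of state as the removed lastUserCpuTimeUs_ members).
void reportDeltas(ToyGauge& gauge, const std::vector<int64_t>& readings) {
  int64_t last = 0;
  for (int64_t reading : readings) {
    gauge.record(reading - last);
    last = reading;
  }
}
```

Feeding the readings {100, 250, 400} through either path leaves the gauge at 400.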

@jaystarshot
Member

@lingbin The root cause of all this is that the Velox metric type AVERAGE is unclear, and there is no documentation on what it should represent.
Like you, I am also confused about whether it should be the average since the last report or the average since start. In Presto's Prometheus reporting, the average type is represented as the last value received (check here).

In our production, we use the SUM metric type and apply (persecond) and other differential functions to get an accurate view directly in Grafana.

Regarding this change: if getrusage() returns the "current cumulative value" (not the "delta"), then for our Prometheus reporting you can keep this change, but be aware that it only reports the last value received.

That is, if the puller pulls every 5 seconds, it will only pick up the last value received, which I think is acceptable.

@aditi-pandit
Contributor

@xiaoxmeng @amitkdutta Could you please comment? It's possible these metrics are monitored at Meta, since Meng added them originally.

@lingbin
Contributor Author

lingbin commented Nov 6, 2025

@jaystarshot Thank you for your further explanation.

> The root cause of all this is that the velox metric type AVERAGE is unclear and there is no documentation on what that should represent. I am also like you confused on whether it should be average from the last reported or just average since start.

I'm also looking forward to a specific and consistent explanation of how to use each metric type.

> In our production we use metric type SUM and use (persecond) and other differential to get accurate view in Grafana directly.

Do you mean that in your production environment code, these AVG metrics (the six OS metrics mentioned here) have already been modified to SUM type?

@lingbin
Contributor Author

lingbin commented Nov 6, 2025

After careful consideration, I've realized that for metrics whose values semantically increase monotonically (like the six OS-related metrics here), the value ultimately stored in Prometheus MUST NOT be a delta. Under Prometheus's pull model, the scrape interval differs from the push interval of PeriodicStatsReporter, so reporting deltas loses data or produces wrong values. (For examples, see #26516 and #23622 (comment).)
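The data loss can be reproduced with a small simulation. It hypothetically assumes one push per tick into an AVG gauge (the last pushed value wins) and a scraper that reads the gauge only after every scrapeEvery pushes; simulateScrapes() and all intervals are illustrative, not taken from PeriodicStatsReporter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simulation: cumulative[t] is the counter value at push tick t.
// Each push overwrites an AVG gauge (last value wins); the scraper reads the
// gauge only after every `scrapeEvery` pushes. Returns the scraped samples.
std::vector<int64_t> simulateScrapes(const std::vector<int64_t>& cumulative,
                                     std::size_t scrapeEvery,
                                     bool reportDelta) {
  std::vector<int64_t> samples;
  if (scrapeEvery == 0) {
    return samples;  // No scrapes configured.
  }
  int64_t gauge = 0;
  int64_t last = 0;
  for (std::size_t tick = 0; tick < cumulative.size(); ++tick) {
    // Old behavior pushes the delta; new behavior pushes the cumulative value.
    gauge = reportDelta ? cumulative[tick] - last : cumulative[tick];
    last = cumulative[tick];
    if ((tick + 1) % scrapeEvery == 0) {
      samples.push_back(gauge);  // Pushes since the last scrape were overwritten.
    }
  }
  return samples;
}
```

With six ticks of +10 each and a scrape every third tick, delta mode yields the samples {10, 10}, so a dashboard summing them sees only 20 of the 60 units that actually accrued. Cumulative mode yields {30, 60}, from which the full increase is still recoverable.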

  • Firstly, from an implementation perspective, there are only two ways to report each metric:

    1. Report the "cumulative value" directly (also called the "current cumulative value" or the "most recently received value"). This corresponds to the current Velox AVG type.
    2. Report the delta value and accumulate it inside the Prometheus reporter (PrometheusStatsReporter). This corresponds to the current Velox SUM type.

    For the OS metrics here, I think either method solves the current problem.

    FYI: Velox's AVG and SUM types both map to Prometheus's Gauge type (https://prometheus.io/docs/concepts/metric_types/#gauge). For the AVG type, each report overwrites the old value with the new one; for the SUM type, each report is added to the old value.

  • Secondly, after the Prometheus server obtains the "cumulative value", it can be displayed (e.g., in Grafana) in one of two ways, depending on the semantics of the metric:

    1. Displaying the difference or rate: use Prometheus's rate() (per-second average rate of increase over a time range) or increase() (total increase over a time range). This suits the six OS-related metrics here, whose values semantically increase monotonically.
    2. Displaying the "real-time value": this suits metrics such as driver counts, for example "presto_cpp.num_on_thread_drivers".
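To make the first display method concrete, here is a simplified sketch of what increase() and rate() compute from scraped cumulative samples. It is not Prometheus's implementation: the real functions also extrapolate to the boundaries of the range window. It does keep the counter-reset handling, treating any drop in value (for example, a worker restart) as a reset to zero. Sample, simpleIncrease(), and simpleRate() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One scraped sample of a cumulative metric.
struct Sample {
  double timestampSec;
  int64_t value;
};

// Simplified increase(): sums the growth between consecutive samples. A drop
// in value is treated as a counter reset, so the new value is counted as
// growth from zero. No window extrapolation is performed.
int64_t simpleIncrease(const std::vector<Sample>& samples) {
  int64_t total = 0;
  for (std::size_t i = 1; i < samples.size(); ++i) {
    const int64_t prev = samples[i - 1].value;
    const int64_t cur = samples[i].value;
    total += (cur >= prev) ? cur - prev : cur;
  }
  return total;
}

// Simplified rate(): average per-second increase across the samples.
double simpleRate(const std::vector<Sample>& samples) {
  if (samples.size() < 2) {
    return 0.0;
  }
  const double elapsed =
      samples.back().timestampSec - samples.front().timestampSec;
  return elapsed > 0 ? static_cast<double>(simpleIncrease(samples)) / elapsed
                     : 0.0;
}
```

For example, applying a per-second rate to presto_cpp.os_user_cpu_time_micros and dividing by 1e6 gives roughly the number of CPU cores the process spends in user mode.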

Perhaps we should document both methods so that developers can choose between them based on their needs? If the "cumulative value" is easier to obtain, use the AVG type; if the "difference" is easier to obtain, use the SUM type. The only caveat is that calculating the difference is generally a bit tedious, because it requires saving the old values.

Looking forward to everyone's guidance and suggestions on better practices, especially for usage already in production environments. Thanks.

@jaystarshot
Member

No, not this one, but we have changed some of the ones we do use. For CPU, we currently just use our host metric system.
Ack. This change looks good to me, so I will wait a day or two for any additional comments from reviewers before approving.

Contributor

@aditi-pandit aditi-pandit left a comment


Let's get @amitkdutta's or @xiaoxmeng's approval before submission. I have pinged Amit.

From IBM we are okay with this change. But I would prefer Meta confirm as well.

Contributor

@amitkdutta amitkdutta left a comment


Looks good. Thanks @lingbin

Contributor

@aditi-pandit aditi-pandit left a comment


Thanks.

@lingbin lingbin force-pushed the native-fix-os-counters branch from 22ec2df to cdc9c30 Compare November 10, 2025 13:01
@lingbin
Contributor Author

lingbin commented Nov 10, 2025

Already rebased to re-trigger CI.

@lingbin lingbin force-pushed the native-fix-os-counters branch from cdc9c30 to d05707b Compare November 11, 2025 02:43

Development

Successfully merging this pull request may close these issues.

[native] OS Metrics(AVG Type Counters) should not report delta values

4 participants