
[native] OS Metrics (AVG Type Counters) should not report delta values #26516

@lingbin

Description


In the Presto native execution codebase, all 6 OS-related metrics were defined as AVG type counters but were incorrectly reporting delta values instead of cumulative values:

  • presto_cpp.os_user_cpu_time_micros
  • presto_cpp.os_system_cpu_time_micros
  • presto_cpp.os_num_soft_page_faults
  • presto_cpp.os_num_hard_page_faults
  • presto_cpp.os_num_voluntary_context_switches
  • presto_cpp.os_num_forced_context_switches
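As a rough illustration, suppose the six counters come from `getrusage(RUSAGE_SELF)`. That is an assumption, since the issue does not say where they are sourced. Under that assumption, each metric maps onto one `struct rusage` field. Python's Unix-only `resource` module exposes the same fields, so a cumulative snapshot can be sketched as follows (`os_metrics_snapshot` is a hypothetical helper):

```python
import resource


def os_metrics_snapshot():
    """Return this process's cumulative OS usage since start.

    Hypothetical helper: assumes the presto_cpp.os_* counters mirror
    getrusage(RUSAGE_SELF). Every value only grows over the process lifetime.
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        # ru_utime / ru_stime are float seconds; convert to whole microseconds.
        "presto_cpp.os_user_cpu_time_micros": int(usage.ru_utime * 1_000_000),
        "presto_cpp.os_system_cpu_time_micros": int(usage.ru_stime * 1_000_000),
        "presto_cpp.os_num_soft_page_faults": usage.ru_minflt,  # minor faults
        "presto_cpp.os_num_hard_page_faults": usage.ru_majflt,  # major faults
        "presto_cpp.os_num_voluntary_context_switches": usage.ru_nvcsw,
        "presto_cpp.os_num_forced_context_switches": usage.ru_nivcsw,
    }
```

Two successive snapshots never decrease, which is what makes these values cumulative rather than per-interval.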

These OS metrics represent cumulative system resource usage since process start, but were reported as deltas between consecutive measurements. This is inconsistent with the other AVG metrics and causes incorrect behavior (see below).
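The difference between the two reporting modes fits in a few lines. A delta reporter has to remember the previous reading and subtract it. A cumulative reporter simply forwards the running total. The class names below are hypothetical and only illustrate the two behaviors:

```python
class DeltaReporter:
    """Reports the change since the previous reading (the current OS-metric behavior)."""

    def __init__(self):
        self._last = None

    def report(self, total):
        # The first reading has no predecessor, so it is reported as 0.
        delta = 0 if self._last is None else total - self._last
        self._last = total
        return delta


class CumulativeReporter:
    """Reports the running total since process start (the proposed behavior)."""

    def report(self, total):
        return total
```

Fed the page-fault totals 100, 150, and 220, the delta reporter emits 0, 50, 70, while the cumulative reporter emits 100, 150, 220.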

Analysis of the codebase shows that all other AVG-type metrics report absolute values (current state or running totals) rather than deltas:

  • Task counters such as kCounterNumTasks and kCounterNumTasksRunning report the current state
  • Only a few specific metrics, all of SUM type, report deltas

The OS metrics broke this pattern: they were the only AVG-type metrics that reported deltas.

Prometheus Data Loss Issue

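When AVG-type metrics (gauges in Prometheus) report delta values, Prometheus scraping can lose a significant amount of data. Each scrape sees only the value current at scrape time, so any delta that is set and then overwritten between two scrapes is never observed: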

Example:

Consider page fault tracking with 2-second metric updates:

  • Time 0s: 100 total page faults
  • Time 2s: 150 total page faults (+50 new)
  • Time 4s: 220 total page faults (+70 new)

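With delta reporting, and Prometheus scraping every 5 seconds:

  • T=0s: Reports 0 (no previous measurement; scraped)
  • T=2s: Reports 50 (not scraped)
  • T=4s: Reports 70 (not scraped)
  • T=5s: Still holds 70, the delta set at T=4s (scraped). The 50 faults reported at T=2s are never seen, and the 120 faults that occurred between the two scrapes cannot be reconstructed!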
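The scenario can be simulated directly. This sketch assumes the gauge keeps the value set by the latest 2-second update until the next update, and that a scrape simply reads that value. `delta_gauge_values` and `scrape` are hypothetical helpers:

```python
# Cumulative page-fault totals at each 2-second metric update (from the example).
TOTALS = {0: 100, 2: 150, 4: 220}


def delta_gauge_values(totals):
    """Value the gauge holds after each update when it reports deltas."""
    values, last = {}, None
    for t in sorted(totals):
        values[t] = 0 if last is None else totals[t] - last
        last = totals[t]
    return values


def scrape(gauge_values, scrape_times):
    """Each scrape reads the most recently set gauge value at that moment."""
    result = {}
    for s in scrape_times:
        latest = max(t for t in gauge_values if t <= s)
        result[s] = gauge_values[latest]
    return result


scraped = scrape(delta_gauge_values(TOTALS), [0, 5])
observed_increase = sum(scraped.values())
true_increase = TOTALS[4] - TOTALS[0]
```

The scrapes observe 0 and 70. Even summing every scraped delta yields 70, short of the true increase of 120, because the update at T=2s was overwritten before any scrape read it.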

With cumulative reporting:

  • T=0s: Reports 100 (scraped)
  • T=2s: Reports 150 (not scraped)
  • T=4s: Reports 220 (not scraped)
  • T=5s: Reports 220 (scraped)
  • Prometheus calculates rate as (220-100)/5 = 24 faults/second
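The same simulation with cumulative reporting shows that nothing is lost. The two scraped totals bound everything that happened in between. The rate here is a plain two-sample difference. PromQL's own `rate()` additionally extrapolates to the edges of its range window, so its result can differ slightly. The helper names are hypothetical:

```python
# Cumulative page-fault totals at each 2-second metric update (from the example).
TOTALS = {0: 100, 2: 150, 4: 220}


def scrape(gauge_values, scrape_times):
    """Each scrape reads the most recently set gauge value at that moment."""
    result = {}
    for s in scrape_times:
        latest = max(t for t in gauge_values if t <= s)
        result[s] = gauge_values[latest]
    return result


def per_second_rate(scraped):
    """Two-sample rate: (last value - first value) / elapsed seconds."""
    (t0, v0), (t1, v1) = min(scraped.items()), max(scraped.items())
    return (v1 - v0) / (t1 - t0)


# With cumulative reporting the gauge simply holds the running total.
scraped = scrape(TOTALS, [0, 5])
rate = per_second_rate(scraped)  # (220 - 100) / 5
```

This reproduces the 24 faults/second figure from the example.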

Possible Solution

Change all 6 OS metrics to report cumulative values instead of delta values.
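Cumulative values do not take anything away from consumers that want per-interval numbers, because deltas can always be recovered from successive totals. The reverse is not possible once a sample is missed. The hypothetical helper below recovers per-interval increases and treats any drop in value as a process restart. This is similar in spirit to how Prometheus handles counter resets:

```python
def increases(samples):
    """Per-interval increases recovered from a list of cumulative samples.

    A value lower than its predecessor is treated as a restart (the counter
    began again from zero), so the new value itself is the increase.
    """
    out = []
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - prev if cur >= prev else cur)
    return out
```

For the totals in the example, `increases([100, 150, 220])` gives back the per-update deltas 50 and 70.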

Metadata


Assignees: No one assigned
Type: No type
Status: 🆕 Unprioritized
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
