Description
In the Presto native execution codebase, all 6 OS-related metrics are defined as AVG type counters but incorrectly report delta values instead of cumulative values:
- presto_cpp.os_user_cpu_time_micros
- presto_cpp.os_system_cpu_time_micros
- presto_cpp.os_num_soft_page_faults
- presto_cpp.os_num_hard_page_faults
- presto_cpp.os_num_voluntary_context_switches
- presto_cpp.os_num_forced_context_switches
These OS metrics represent cumulative system resource usage since process start, but they are reported as deltas between consecutive measurements. This is inconsistent with the other AVG metrics and causes incorrect behavior (see below).
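For illustration, here is a minimal sketch of the delta-reporting pattern described above. The function and member names (updateOsMetrics, lastHardPageFaults_, reportMetric) are hypothetical stand-ins, assuming the metrics are derived from the standard getrusage(2) call:

```cpp
#include <sys/resource.h>

#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the worker's stats reporter.
void reportMetric(const char* name, int64_t value) {
  std::printf("%s = %lld\n", name, static_cast<long long>(value));
}

// Hypothetical member holding the previous reading.
int64_t lastHardPageFaults_{0};

// Sketch of the buggy pattern: an AVG metric fed with deltas.
void updateOsMetrics() {
  struct rusage usage {};
  getrusage(RUSAGE_SELF, &usage);

  // ru_majflt is already cumulative since process start, but only
  // the difference from the previous poll gets reported.
  const int64_t hardPageFaults = usage.ru_majflt;
  reportMetric(
      "presto_cpp.os_num_hard_page_faults",
      hardPageFaults - lastHardPageFaults_); // delta, not cumulative
  lastHardPageFaults_ = hardPageFaults;
}
```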
Analysis of the codebase shows that all other AVG-type metrics report cumulative values:
- Task counters like kCounterNumTasks and kCounterNumTasksRunning report current state
- Only a few specific metrics (SUM type) report deltas
The OS metrics are inconsistent with this pattern: they are the only AVG type metrics reporting deltas.
Prometheus Data Loss Issue
When AVG type metrics (exported as GAUGE type in Prometheus) report delta values, significant data can be lost during Prometheus scraping:
Example:
Consider page fault tracking with 2-second metric updates:
- Time 0s: 100 total page faults
- Time 2s: 150 total page faults (+50 new)
- Time 4s: 220 total page faults (+70 new)
With delta reporting:
- Prometheus scrapes every 5 seconds
- T=0s: Reports 0 (scraped; delta from the previous update)
- T=2s: Reports 50 (not scraped)
- T=4s: Reports 70 (not scraped)
- T=5s: Reports 0 (scraped; delta since T=4s, assuming no new faults). The 50 and 70 deltas were never scraped, so 120 page faults are lost!
With cumulative reporting:
- T=0s: Reports 100 (scraped)
- T=2s: Reports 150 (not scraped)
- T=4s: Reports 220 (not scraped)
- T=5s: Reports 220 (scraped)
- Prometheus calculates the rate as (220 - 100) / 5 = 24 faults/second (see the simulation sketch below)
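To make the arithmetic concrete, this self-contained sketch replays the timeline above for both reporting modes; the totals are taken from the example, and the 2-second update / 5-second scrape cadence is part of the illustration, not measured behavior:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  // Cumulative page-fault totals at the 2-second updates
  // (values from the example above).
  const std::vector<int64_t> totals{100, 150, 220};

  std::printf("update   delta   cumulative\n");
  int64_t last = 0;
  for (size_t i = 0; i < totals.size(); ++i) {
    std::printf(
        "T=%zus     %4lld   %10lld\n",
        i * 2,
        static_cast<long long>(totals[i] - last),
        static_cast<long long>(totals[i]));
    last = totals[i];
  }

  // A scraper sampling only at T=0s and T=5s sees two gauge values.
  // With deltas it cannot recover the 50 + 70 = 120 faults reported
  // in between; with cumulative values it can derive the rate:
  std::printf("rate = %.1f faults/second\n", (220.0 - 100.0) / 5.0);
  return 0;
}
```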
Possible Solution
Change all 6 OS metrics to report cumulative values instead of delta values, as sketched below.
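A minimal sketch of that change, reusing the hypothetical names from the first snippet: report each getrusage reading directly and drop the last-value subtraction. (The actual reporting macro in the codebase may differ; RECORD_METRIC_VALUE from Velox's StatsReporter.h is a plausible candidate, but treat that as an assumption.)

```cpp
#include <sys/resource.h>

#include <cstdint>
#include <cstdio>

// Same hypothetical reporting hook as in the earlier sketch.
void reportMetric(const char* name, int64_t value) {
  std::printf("%s = %lld\n", name, static_cast<long long>(value));
}

void updateOsMetrics() {
  struct rusage usage {};
  getrusage(RUSAGE_SELF, &usage);

  // Report the cumulative totals directly: no lastValue_ members and
  // no subtraction. Prometheus derives rates on its side.
  reportMetric(
      "presto_cpp.os_user_cpu_time_micros",
      static_cast<int64_t>(usage.ru_utime.tv_sec) * 1'000'000 +
          usage.ru_utime.tv_usec);
  reportMetric(
      "presto_cpp.os_system_cpu_time_micros",
      static_cast<int64_t>(usage.ru_stime.tv_sec) * 1'000'000 +
          usage.ru_stime.tv_usec);
  reportMetric("presto_cpp.os_num_soft_page_faults", usage.ru_minflt);
  reportMetric("presto_cpp.os_num_hard_page_faults", usage.ru_majflt);
  reportMetric(
      "presto_cpp.os_num_voluntary_context_switches", usage.ru_nvcsw);
  reportMetric(
      "presto_cpp.os_num_forced_context_switches", usage.ru_nivcsw);
}
```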