Description
Summary: A possible race condition causes sql_exporter to report the previous state during the current scrape cycle.
Found in: sql_exporter, version 0.4.5
Actual state:
We have a scrape job that runs a database query once per scrape cycle. The SQL query adds a current_timestamp field to each result row. As a result, every scraped data point should be unique: scrape cycles run at 4-minute intervals, so the current_timestamp values of consecutive samples should be roughly 4 minutes apart.
However, when we inspect the data points persisted in Prometheus, we find data points that span 8 minutes. With the current setup, where current_timestamp makes each data point unique, this behavior is incorrect.
We should highlight that Prometheus can merge two sequential data points into one combined data point, whose total interval equals the sum of the intervals of the merged data points, only if the sequential data points are completely equal (all their fields are equal).
In our case this merging should not be possible, since each data point has a unique current_timestamp field.
This incorrect behavior is shown in the attached screenshots: the data point in question spans 8 minutes instead of appearing as two data points of 4 minutes each. The screenshot shows only one broken data point; all other data points span 4 minutes.
Sample broken data point: (screenshot)
Start time of the broken data point: (screenshot)
End time of the broken data point: (screenshot)
Our root-cause hypothesis is that the 4-minute scrape interval is sometimes not enough to complete the SQL query for the current scrape cycle, and as a result sql_exporter reports the previous state (the previous, already persisted data point).
This hypothesis is consistent with the observation that a single data point is reported twice in a row and merged into one data point of double length (8 minutes instead of the expected 4), since all field values are equal, including the current sample timestamp.
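The hypothesis can be illustrated with a small Go simulation. This is a sketch under assumptions, not sql_exporter's actual code: `cachingCollector`, `slowQuery`, and `simulate` are hypothetical names, and milliseconds stand in for minutes. The collector falls back to its cached value whenever the query misses its deadline, which is the suspected behavior.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cachingCollector is a hypothetical component that keeps the last
// successful query result and serves it again when a query fails.
type cachingCollector struct {
	query func(ctx context.Context) (string, error) // returns the row's current_timestamp
	last  string                                     // last successfully collected value
}

// Collect runs the query with a timeout. On timeout it returns the cached
// value, so the scrape sees exactly the same fields as the previous one.
func (c *cachingCollector) Collect(timeout time.Duration) string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	v, err := c.query(ctx)
	if err != nil {
		return c.last // stale result is re-reported
	}
	c.last = v
	return v
}

// slowQuery simulates a SQL query that needs `delay` to finish. It returns
// the context error if the deadline passes first.
func slowQuery(delay time.Duration, ts string) func(context.Context) (string, error) {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
			return ts, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// simulate runs three scrapes; the second query is slower than the timeout.
func simulate() []string {
	c := &cachingCollector{}
	delays := []time.Duration{time.Millisecond, 200 * time.Millisecond, time.Millisecond}
	stamps := []string{"12:00", "12:04", "12:08"}
	var out []string
	for i := range delays {
		c.query = slowQuery(delays[i], stamps[i])
		out = append(out, c.Collect(50*time.Millisecond))
	}
	return out
}

func main() {
	fmt.Println(simulate()) // the second scrape repeats the first value
}
```

In this simulation the second scrape reports "12:00" again, so two consecutive scrapes carry identical fields, matching the double-length data point seen in Prometheus.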
Desired state:
A data state should not be reported more than once. Possibly, once a data point has been reported successfully, its state should be removed from the registry of the sql_exporter service. A warning message should possibly also be raised if the current SQL query does not complete within the current scrape interval.
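One possible shape for the desired behavior, sketched in Go. The names `onceRegistry`, `Store`, `Take`, and `scrape` are hypothetical and do not refer to existing sql_exporter types. Each stored result can be handed out only once, and a scrape that finds no fresh result produces a warning instead of repeating the old sample.

```go
package main

import (
	"fmt"
	"sync"
)

// onceRegistry is a hypothetical holder for the latest query result that
// hands out each result at most once.
type onceRegistry struct {
	mu    sync.Mutex
	value string
	fresh bool
}

// Store records a newly completed query result.
func (r *onceRegistry) Store(v string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.value, r.fresh = v, true
}

// Take returns the stored result and marks it as consumed. ok is false
// when no new result has arrived since the previous Take.
func (r *onceRegistry) Take() (v string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.fresh {
		return "", false
	}
	r.fresh = false
	return r.value, true
}

// scrape simulates one scrape: it exposes a fresh value, or returns a
// warning and no value when the query has not produced a new result.
func scrape(r *onceRegistry) (exposed, warning string) {
	v, ok := r.Take()
	if !ok {
		return "", "warning: query did not complete within the scrape interval; no sample exposed"
	}
	return v, ""
}

func main() {
	r := &onceRegistry{}
	r.Store("12:00")
	for i := 0; i < 2; i++ {
		v, w := scrape(r)
		fmt.Printf("exposed=%q warning=%q\n", v, w)
	}
}
```

The second scrape in `main` exposes nothing and carries the warning, so a missed query would show up as a gap rather than as a duplicated data point.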