
Commit 1a505a5

[CI] Always send a heartbeat metric
This script was set up to upload metrics to Grafana only when a new workflow was available. If either the Grafana or GitHub token became stale, no metrics would get recorded either.

We have alerting in place to detect a lack of updates, but because we only uploaded metrics on new workflows, there were normal cases where no data would get uploaded for a few hours (for example, late at night on a weekend). For those reasons, the delay before alerting on no data had to be set quite high.

By adding a fixed heartbeat to the uploaded metrics, we know we MUST receive at least 1 metric every 5 minutes, and can have more reactive monitoring.

Signed-off-by: Nathan Gauër <[email protected]>
1 parent 7060d2a commit 1a505a5
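
The alerting side of the reasoning above can be sketched as follows. This is only an illustration, not the Grafana alert rule actually in use: the function name `heartbeat_is_stale` and the `grace_intervals` parameter are hypothetical, and only the 5-minute interval comes from the commit message.

```python
# Heartbeat interval stated in the commit message: one metric every 5 minutes.
HEARTBEAT_INTERVAL_NS = 5 * 60 * 1_000_000_000


def heartbeat_is_stale(last_heartbeat_ns, now_ns, grace_intervals=2):
    """Return True when no heartbeat arrived within the allowed window.

    grace_intervals is a hypothetical tolerance: how many heartbeat
    intervals may elapse before the silence is treated as a failure.
    """
    allowed_gap_ns = grace_intervals * HEARTBEAT_INTERVAL_NS
    return now_ns - last_heartbeat_ns > allowed_gap_ns
```

Because a heartbeat is expected on every cycle, the no-data window can be a small multiple of the interval instead of the several hours needed when quiet periods were normal.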

1 file changed: +15 -3 lines changed


.ci/metrics/metrics.py

Lines changed: 15 additions & 3 deletions
@@ -147,6 +147,15 @@ def upload_metrics(workflow_metrics, metrics_userid, api_key):
             f"Failed to submit data to Grafana: {response.status_code}", file=sys.stderr
         )

+def make_heartbeat_metric():
+    return JobMetrics(
+        "metrics_container_heartbeat",
+        1,  # queue time seconds
+        2,  # run time seconds
+        3,  # job result
+        time.time_ns(),  # created at ns
+        0,  # workflow run ID
+    )

 def main():
     # Authenticate with Github
@@ -166,11 +175,14 @@ def main():
     while True:
         current_metrics = get_metrics(github_repo, workflows_to_track)
         if len(current_metrics) == 0:
-            print("No metrics found to upload.", file=sys.stderr)
-            continue
+            print("No metrics found to upload.", file=sys.stdout)
+
+        # Always send a heartbeat metric so we can monitor if this container
+        # is still able to log to Grafana.
+        current_metrics.append(make_heartbeat_metric())

         upload_metrics(current_metrics, grafana_metrics_userid, grafana_api_key)
-        print(f"Uploaded {len(current_metrics)} metrics", file=sys.stderr)
+        print(f"Uploaded {len(current_metrics)} metrics", file=sys.stdout)

         for workflow_metric in reversed(current_metrics):
             workflows_to_track[workflow_metric.job_name] = workflow_metric.workflow_id
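
The effect of the second hunk can be tried in isolation with the sketch below. It is a stand-in, not the code in `.ci/metrics/metrics.py`: the `JobMetrics` dataclass here is a hypothetical replacement whose field names other than `job_name` and `workflow_id` are guessed from the comments in the diff, and `build_batch` is an illustrative helper that does not exist in the script.

```python
import time
from dataclasses import dataclass


@dataclass
class JobMetrics:
    # Stand-in for the real class. Only job_name and workflow_id appear
    # in the diff; the other field names are assumptions.
    job_name: str
    queue_time: int
    run_time: int
    status: int
    created_at_ns: int
    workflow_id: int


def make_heartbeat_metric():
    # Same placeholder values as in the diff above.
    return JobMetrics("metrics_container_heartbeat", 1, 2, 3, time.time_ns(), 0)


def build_batch(workflow_metrics):
    """Return the metrics to upload, always ending with one heartbeat."""
    batch = list(workflow_metrics)
    batch.append(make_heartbeat_metric())
    return batch
```

With an empty input the batch still contains exactly one entry, so every pass through the loop uploads something and the Grafana credentials are exercised on each cycle.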
