
Commit daef7ff

[Monitoring] Add logging to understand stuck testcases (#4489)
### Motivation

#4364 and #4381 introduced metrics for how long it takes between a testcase's creation and some event of interest (cleanup considering closing a bug, analyze finishing, and so on). These percentile metrics let us notice that several testcases have made no progress for more than a year. This PR adds logging so we can pinpoint those testcases and discuss whether a purge of some sort is required.
1 parent 514cec0 commit daef7ff

2 files changed, 6 additions and 0 deletions.

src/clusterfuzz/_internal/common/testcase_utils.py

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,10 @@ def emit_testcase_triage_duration_metric(testcase_id: int, step: str):
         ' failed to emit TESTCASE_UPLOAD_TRIAGE_DURATION metric.')
     return

+  logs.info('Emitting TESTCASE_UPLOAD_TRIAGE_DURATION metric for testcase '
+            f'{testcase_id} (age = {elapsed_time_since_upload}) '
+            f'in step {step}.')
+
   monitoring_metrics.TESTCASE_UPLOAD_TRIAGE_DURATION.add(
       elapsed_time_since_upload,
       labels={
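
A log message split across adjacent string literals only interpolates placeholders in the literals that carry their own `f` prefix. Python concatenates the pieces at compile time, but each piece is formatted (or not) independently. The sketch below illustrates this with two hypothetical helper functions (`format_log` and `format_log_missing_prefix` are illustrative names, not part of ClusterFuzz) that build the same kind of message as the logging call in this hunk:

```python
def format_log(testcase_id: int, age: float, step: str) -> str:
  """Every literal that contains a placeholder has its own f prefix."""
  return ('Emitting TESTCASE_UPLOAD_TRIAGE_DURATION metric for testcase '
          f'{testcase_id} (age = {age}) '
          f'in step {step}.')


def format_log_missing_prefix(testcase_id: int, age: float, step: str) -> str:
  """The last literal lacks the f prefix, so '{step}' is emitted verbatim."""
  return ('Emitting TESTCASE_UPLOAD_TRIAGE_DURATION metric for testcase '
          f'{testcase_id} (age = {age}) '
          'in step {step}.')  # Plain string: no interpolation happens here.


if __name__ == '__main__':
  print(format_log(123, 4.5, 'analyze'))
  print(format_log_missing_prefix(123, 4.5, 'analyze'))
```

The first helper ends its message with the actual step name, while the second ends with the literal text `{step}`, which makes the resulting log lines much harder to filter by step.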

src/clusterfuzz/_internal/cron/triage.py

Lines changed: 2 additions & 0 deletions
@@ -317,6 +317,8 @@ def _emit_untriaged_testcase_age_metric(critical_tasks_completed: bool,
   if not testcase.timestamp:
     return

+  logs.info(f'Emitting UNTRIAGED_TESTCASE_AGE for testcase {testcase.key.id()} '
+            f'(age = {testcase.get_age_in_seconds()})')
   monitoring_metrics.UNTRIAGED_TESTCASE_AGE.add(
       testcase.get_age_in_seconds(),
       labels={
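
Once these log lines exist, pinpointing the stuck testcases is mostly a filtering exercise. The following is a minimal sketch, not part of this commit, of how exported log messages could be scanned for testcases older than a threshold. The function name `find_stuck_testcases`, the 365-day definition of a year, and the assumption that the logs are available as plain strings are all hypothetical. The pattern accepts either spelling of the leading verb, so it matches the message regardless of how it is spelled in the deployed code.

```python
import re

# Assumption: "more than a year" is approximated as 365 days, in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

# Matches the UNTRIAGED_TESTCASE_AGE message format added in triage.py.
_LOG_PATTERN = re.compile(
    r'Emit+ing UNTRIAGED_TESTCASE_AGE for testcase (?P<id>\d+) '
    r'\(age = (?P<age>\d+(?:\.\d+)?)\)')


def find_stuck_testcases(log_lines, threshold_seconds=ONE_YEAR_SECONDS):
  """Returns (testcase_id, age_in_seconds) pairs older than the threshold."""
  stuck = []
  for line in log_lines:
    match = _LOG_PATTERN.search(line)
    if not match:
      continue  # Unrelated log line.
    age = float(match.group('age'))
    if age > threshold_seconds:
      stuck.append((int(match.group('id')), age))
  return stuck


if __name__ == '__main__':
  sample = [
      'Emitting UNTRIAGED_TESTCASE_AGE for testcase 42 (age = 40000000.0)',
      'Emitting UNTRIAGED_TESTCASE_AGE for testcase 7 (age = 3600.5)',
      'Some unrelated log line',
  ]
  for testcase_id, age in find_stuck_testcases(sample):
    print(testcase_id, age)
```

The same approach would work for the TESTCASE_UPLOAD_TRIAGE_DURATION message with a different pattern, though the unit of its age value should be confirmed against the code before choosing a threshold.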
