3 files changed, +28 -0 lines changed

@@ -2,6 +2,7 @@
 
 ## master
 * [ENHANCEMENT] Add bigger tenants and configure default compactor tenant shards
+* [ENHANCEMENT] Add alert `CortexCompactorWriteVisitMarkerIsFailing` to monitor compactors
 
 ## 1.17.1 / 2024-10-23
 * [CHANGE] Use cortex v1.17.1
@@ -102,6 +102,22 @@
             ||| % $._config,
           },
         },
+        {
+          // Alert if compactors are not able to update the visit marker.
+          alert: 'CortexCompactorWriteVisitMarkerIsFailing',
+          'for': '2h',
+          expr: |||
+            sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0
+          ||| % $._config.job_names,
+          labels: {
+            severity: 'critical',
+          },
+          annotations: {
+            message: |||
+              Cortex compactors are not able to update the visit marker. Check the compactor logs for the cause.
+            |||,
+          },
+        },
       ],
     },
   ],
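As a reading aid for the `%(compactor)s` placeholder in the alert expression above: jsonnet's `%` operator accepts an object on the right-hand side and fills named placeholders from its fields. The sketch below is standalone and uses a hypothetical `config` local in place of the mixin's `$._config`; the value `'compactor'` is an assumption for illustration, not necessarily what the mixin ships.

```jsonnet
// Hypothetical, trimmed-down stand-in for the mixin's $._config object.
local config = {
  job_names: {
    // Assumed value for illustration only.
    compactor: 'compactor',
  },
};

{
  // Same template as the alert expression; %(compactor)s is replaced
  // with the `compactor` field of the object passed to `%`.
  expr: 'sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0' % config.job_names,
}
```

Under that assumption, evaluating the snippet should yield an `expr` whose matcher reads `job=~".+/compactor"`, that is, any job whose name ends in `/compactor`.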
@@ -379,6 +379,17 @@ How to **investigate**:
 - Ensure ingesters are successfully shipping blocks to the storage
 - Look for any error in the compactor logs
 
+### CortexCompactorWriteVisitMarkerIsFailing
+
+Only applies to compactors when using shuffle sharding.
+This alert fires if the compactor is not able to update the visit marker (across all tenants).
+The marker file is a very small JSON file, so updating it should rarely fail.
+
+How to **investigate**:
+- Check the compactor logs; they should show the exact reason for the failure
+- If you see `context canceled` or other timeouts in the logs,
+  consider increasing `-compactor.compaction-visit-marker-timeout` and `-compactor.compaction-visit-marker-file-update-interval`.
+
 ### CortexCompactorHasNotSuccessfullyRunCompaction
 
 This alert fires if the compactor is not able to successfully compact all discovered compactable blocks (across all tenants).