Commit f11cc2c

Fixed playbook
Signed-off-by: Marco Pracucci <[email protected]>
1 parent a56d1c1 commit f11cc2c

1 file changed: +9 -10 lines changed

cortex-mixin/docs/playbooks.md

Lines changed: 9 additions & 10 deletions
@@ -231,16 +231,6 @@ How to **investigate**:
 
 _If the alert `CortexIngesterTSDBHeadCompactionFailed` fired as well, then give priority to it because that could be the cause._
 
-### CortexRolloutStuck
-
-This alert fires when a Cortex service rollout is stuck, which means the number of updated replicas doesn't match the expected one and looks there's no progress in the rollout. The alert monitors services deployed as Kubernetes `StatefulSet` and `Deployment`.
-
-How to **investigate**:
-- Run `kubectl -n <namespace> get pods -l name=<statefulset|deployment>` to get a list of running pods
-- Ensure there's no pod in a failing state (eg. `Error`, `OOMKilled`, `CrashLoopBackOff`)
-- Ensure there's no pod `NotReady` (the number of ready containers should match the total number of containers, eg. `1/1` or `2/2`)
-- Run `kubectl -n <namespace> describe statefulset <name>` or `kubectl -n <namespace> describe deployment <name>` and look at "Pod Status" and "Events" to get more information
-
 #### Ingester hit the disk capacity
 
 If the ingester hit the disk capacity, any attempt to append samples will fail. You should:
@@ -734,6 +724,15 @@ When an alertmanager cannot read the state for a tenant from storage it gets log
 - The state could not be merged because it might be invalid and could not be decoded. This could indicate data corruption and therefore a bug in the reading or writing of the state, and would need further investigation.
 - The state could not be read from storage. This could be due to a networking issue such as a timeout or an authentication and authorization issue with the remote object store.
 
+### CortexRolloutStuck
+
+This alert fires when a Cortex service rollout is stuck, which means the number of updated replicas doesn't match the expected one and looks there's no progress in the rollout. The alert monitors services deployed as Kubernetes `StatefulSet` and `Deployment`.
+
+How to **investigate**:
+- Run `kubectl -n <namespace> get pods -l name=<statefulset|deployment>` to get a list of running pods
+- Ensure there's no pod in a failing state (eg. `Error`, `OOMKilled`, `CrashLoopBackOff`)
+- Ensure there's no pod `NotReady` (the number of ready containers should match the total number of containers, eg. `1/1` or `2/2`)
+- Run `kubectl -n <namespace> describe statefulset <name>` or `kubectl -n <namespace> describe deployment <name>` and look at "Pod Status" and "Events" to get more information
 
 ## Cortex routes by path
 
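The pod checks in the relocated `CortexRolloutStuck` section could be scripted roughly as follows. This is a minimal sketch and not part of the commit: `find_unhealthy_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout (`NAME READY STATUS ...`). It only covers the failing states named in the playbook.

```shell
# Hypothetical helper (not part of the playbook): reads `kubectl get pods`
# output on stdin and prints the name, READY and STATUS columns of every pod
# that is in a failing state or whose ready container count does not match
# the total container count.
find_unhealthy_pods() {
  awk 'NR > 1 {
    # READY looks like "1/1" or "0/2": ready containers / total containers.
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 ~ /^(Error|OOMKilled|CrashLoopBackOff)$/) {
      print $1, $2, $3
    }
  }'
}
```

It would typically be fed from the first command in the playbook, for example `kubectl -n <namespace> get pods -l name=<statefulset|deployment> | find_unhealthy_pods`. Empty output suggests no pod matched these checks, in which case the `kubectl describe` step is the next place to look.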
