Commit 93be772

k8s-exit-codes-cre
Signed-off-by: amanycodes <[email protected]>
1 parent 7a84cca commit 93be772

9 files changed: 209 additions and 1 deletion
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: TDtvSvVJU5vUaLZuRZBVk9
+      generation: 1
+    cre:
+      id: CRE-2025-0127
+      severity: 2
+      title: Container exited 127 — command not found (bad entrypoint/command)
+      category: configuration-problem
+      author: CRE Community
+      description: |
+        Exit code 127 indicates the configured command/entrypoint was not found in the image or PATH.
+        New or misconfigured deployments commonly hit this and immediately crash.
+      cause: |
+        - Typo in command/entrypoint or wrong binary path.
+        - Missing executable in the image or non-executable file permissions.
+        - Different base image where the tool isn’t installed by default.
+      impact: |
+        - Pod fails to start; controllers may enter CrashLoopBackOff.
+        - Service remains unavailable until the image/command is fixed.
+      tags:
+        - Kubernetes
+        - exit-code
+        - command
+        - entrypoint
+        - startup
+      mitigation: |
+        - Verify the binary exists and is executable (`chmod +x`).
+        - Use absolute paths or fix PATH; test with `docker run` locally.
+        - Add pre-deploy checks in CI to validate entrypoint presence.
+      references:
+        - "https://kubernetes.io/docs/concepts/workloads/pods/"
+      applications:
+        - name: kubernetes
+          version: "*"
+      impactScore: 3
+      mitigationScore: 3
+      reports: 6
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t127$"

rules/cre-2025-0127/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:18:27Z cre-demo/cmd-127 bad-cmd Error 127
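
The rule's regex expects five tab-separated fields (timestamp, namespace/pod, container, reason, exit code), whereas the test line as rendered on this page looks space-separated. The sketch below checks the pattern against a line built with tabs. It uses Python's `re` module as a stand-in for whatever engine evaluates the rule, and the helper name `matches_exit_127` is hypothetical.

```python
import re

# Pattern copied from the rule above after YAML unescaping ("\\t" becomes \t).
EXIT_127 = re.compile(r"^[^\t]+\t[^\t/]+/[^\t]+\t[^\t]+\t[^\t]*\t127$")


def matches_exit_127(line: str) -> bool:
    """Return True if a cre.log.k8s-style line would satisfy the 127 pattern."""
    return EXIT_127.search(line) is not None


# Assumption: the real test.log separates fields with tabs, as the regex requires.
sample = "\t".join(
    ["2025-08-27T13:18:27Z", "cre-demo/cmd-127", "bad-cmd", "Error", "127"]
)
# The same fields joined by single spaces, as they appear when rendered.
space_separated = sample.replace("\t", " ")

if __name__ == "__main__":
    print(matches_exit_127(sample))
    print(matches_exit_127(space_separated))
```

Because the pattern ends in `\t127$`, a line whose last field is some other number ending in 127 (for example `1127`) is not matched.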
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: T8tPh6u4Bj7nXidQWbRvvj
+      generation: 1
+    cre:
+      id: CRE-2025-0134
+      severity: 2
+      title: Container exited 134 — SIGABRT / assertion failure
+      category: runtime-problem
+      author: CRE Community
+      description: |
+        Exit code 134 indicates the process aborted via SIGABRT, commonly due to failed assertions,
+        allocator checks (e.g., glibc detecting heap corruption), or explicit abort() calls.
+      cause: |
+        - assert(false) / std::abort() in C/C++.
+        - Memory allocator consistency errors (double free, corruption).
+        - Defensive abort on unrecoverable invariant violations.
+      impact: |
+        - Immediate termination of the container; possible loss of in-flight work.
+        - Repeated crashes if the triggering condition is deterministic at startup.
+      tags:
+        - Kubernetes
+        - exit-code
+        - sigabrt
+        - assertion
+        - native
+      mitigation: |
+        - Enable core dumps and symbols; capture backtraces.
+        - Run ASAN/UBSAN builds in staging to localize corruption.
+        - Pin and verify libc/libstdc++ versions; roll back recent native changes.
+      references:
+        - "https://www.gnu.org/software/libc/manual/html_node/Aborting-a-Program.html"
+      applications:
+        - name: kubernetes
+          version: "*"
+      impactScore: 6
+      mitigationScore: 2
+      reports: 4
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t134$"

rules/cre-2025-0134/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:26:39Z cre-demo/abort-134 aborter Error 134
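
Exit codes 134, 137 and 139 follow the usual convention of 128 plus the signal number (SIGABRT is 6, SIGKILL is 9, SIGSEGV is 11 on Linux). A minimal sketch of that arithmetic follows. The function name `describe_exit_code` and the small signal table are illustrative and not part of the commit.

```python
# Linux signal numbers for the signals these rules cover (see signal(7)).
SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}


def describe_exit_code(code: int) -> str:
    """Give a short, human-readable reading of a container exit code."""
    if code == 127:
        # Shell convention: the command could not be found.
        return "command not found"
    if code > 128:
        # Convention: 128 + N means the process was terminated by signal N.
        signal_number = code - 128
        name = SIGNALS.get(signal_number, f"signal {signal_number}")
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (127, 134, 137, 139):
        print(code, describe_exit_code(code))
```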
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: VuPxiuWkYodzUqupa7gh9N
+      generation: 1
+    cre:
+      id: CRE-2025-0137
+      severity: 1
+      title: Pod terminated with Exit Code 137 due to OOMKilled (memory limit exceeded)
+      category: memory-problem
+      author: CRE Community
+      description: |
+        The container exceeded its memory limit and was killed by the kernel OOM killer.
+        Kubernetes reports a terminated state with Reason=OOMKilled and exitCode=137.
+        This often manifests as CrashLoopBackOff under sustained memory pressure.
+      cause: |
+        - Memory limit too low relative to peak usage.
+        - Sudden traffic spikes causing allocation bursts.
+        - Memory leaks or fragmentation in long-running processes.
+        - Under-provisioned nodes or overly strict pod limits.
+      impact: |
+        - Request errors and latency spikes during restarts.
+        - CrashLoopBackOff and reduced availability.
+        - Potential loss of in-flight work not checkpointed to durable storage.
+      tags:
+        - Kubernetes
+        - exit-code
+        - out-of-memory
+        - memory
+        - crash-loop
+        - reliability
+      mitigation: |
+        - Raise memory requests/limits; add headroom for peak allocations.
+        - Enable profiling and leak detection; tune GC/heap where applicable.
+        - Consider Vertical Pod Autoscaler for right-sizing.
+        - Watch node memory pressure and eviction thresholds.
+      references:
+        - "https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states"
+        - "https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/"
+      applications:
+        - name: kubernetes
+          version: "*"
+      impactScore: 6
+      mitigationScore: 2
+      reports: 12
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\tOOMKilled\\t137$"

rules/cre-2025-0137/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T11:51:17Z cre-demo/oom-137 eater OOMKilled 137
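
Unlike the other three rules, which accept any value in the reason field (`[^\t]*`), the 137 rule pins the reason to `OOMKilled`. The sketch below illustrates the consequence using Python's `re` as a stand-in engine: a 137 exit with a different reason does not match. The second log line is a made-up example, not taken from the commit, and both lines assume tab-separated fields.

```python
import re

# Pattern copied from the 137 rule after YAML unescaping.
OOM_137 = re.compile(r"^[^\t]+\t[^\t/]+/[^\t]+\t[^\t]+\tOOMKilled\t137$")


def is_oom_kill(line: str) -> bool:
    """Return True only for exit 137 lines whose reason field is OOMKilled."""
    return OOM_137.search(line) is not None


def make_line(timestamp: str, pod: str, container: str, reason: str, code: int) -> str:
    """Build a tab-separated line in the layout the regex expects (assumed)."""
    return "\t".join([timestamp, pod, container, reason, str(code)])


# Fields from the commit's test.log.
oom_line = make_line("2025-08-27T11:51:17Z", "cre-demo/oom-137", "eater", "OOMKilled", 137)
# Hypothetical: same exit code, but killed for another reason (e.g. an external SIGKILL).
other_kill = make_line("2025-08-27T11:51:17Z", "cre-demo/oom-137", "eater", "Error", 137)

if __name__ == "__main__":
    print(is_oom_kill(oom_line))
    print(is_oom_kill(other_kill))
```

Pinning the reason keeps the memory-problem rule from firing on SIGKILLs that have nothing to do with memory limits.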
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: KHtVUpTbZHaevdn4EABmQEe
+      generation: 1
+    cre:
+      id: CRE-2025-0139
+      severity: 2
+      title: Container exited 139 — segmentation fault (SIGSEGV)
+      category: runtime-problem
+      author: CRE Community
+      description: |
+        Exit code 139 indicates SIGSEGV (invalid memory access) in native/runtime code.
+        Frequently caused by unsafe pointer operations, ABI/library mismatches, or native extensions.
+      cause: |
+        - Null dereference or out-of-bounds access in C/C++/Rust unsafe blocks.
+        - Incompatible glibc/musl or driver/library versions.
+        - Faulty JNI/ctypes/native extension code paths.
+      impact: |
+        - Hard crash; requests being processed may be dropped.
+        - Repeated crashes if the segfault occurs deterministically at startup.
+      tags:
+        - Kubernetes
+        - exit-code
+        - segfault
+        - native
+        - reliability
+      mitigation: |
+        - Enable core dumps and symbol files; capture stack traces.
+        - Pin compatible base image/libc; verify ABI expectations.
+        - Use ASAN/UBSAN builds; bisect recent native/library changes.
+      references:
+        - "https://man7.org/linux/man-pages/man7/signal.7.html"
+      applications:
+        - name: kubernetes
+          version: "*"
+      impactScore: 7
+      mitigationScore: 2
+      reports: 5
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t139$"

rules/cre-2025-0139/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:32:40Z cre-demo/segv-139 segv Error 139
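
One way to see where 139 and 134 come from without a cluster is to let a child process die from the signal and convert the result. This is a rough local sketch for POSIX systems only, not how the commit's test logs were produced. Python's `subprocess` reports death by signal N as a return code of -N, so the hypothetical helper `container_style_exit_code` maps that to the 128 + N form that shells and container runtimes conventionally report.

```python
import subprocess
import sys

# Child programs that terminate themselves with the signals the rules cover.
SEGV_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
ABRT_CHILD = "import os; os.abort()"


def container_style_exit_code(returncode: int) -> int:
    """Map subprocess's -N (killed by signal N) to the 128 + N convention."""
    return 128 - returncode if returncode < 0 else returncode


def run_and_report(child_source: str) -> int:
    """Run a short Python child and return its exit code in 128 + N form."""
    proc = subprocess.run(
        [sys.executable, "-c", child_source],
        stderr=subprocess.DEVNULL,  # hide any crash output from the child
    )
    return container_style_exit_code(proc.returncode)


if __name__ == "__main__":
    print("SIGSEGV child:", run_and_report(SEGV_CHILD))
    print("SIGABRT child:", run_and_report(ABRT_CHILD))
```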

rules/tags/tags.yaml

Lines changed: 19 additions & 1 deletion
@@ -844,4 +844,22 @@ tags:
     description: Issues with Kubernetes pod scheduling due to resource constraints or networking problems
   - name: cluster-scaling
     displayName: Cluster Scaling
-    description: Problems related to Kubernetes cluster scaling operations and capacity management
+    description: Problems related to Kubernetes cluster scaling operations and capacity management
+  - name: exit-code
+    displayName: Exit Code
+    description: Problems identified by specific process/container exit codes (e.g., 137, 127, 134, 139).
+  - name: entrypoint
+    displayName: Entrypoint
+    description: Failures caused by invalid or missing container ENTRYPOINT/CMD definitions.
+  - name: command
+    displayName: Command
+    description: Problems caused by invalid commands or arguments at startup (e.g., not found, bad path, non-executable).
+  - name: sigabrt
+    displayName: SIGABRT
+    description: Crashes where a process aborts with SIGABRT (exit 134), often due to assertion failures or allocator checks.
+  - name: native
+    displayName: Native
+    description: Issues in native code paths (C/C++/Rust, libc/ABI), including crashes and memory faults.
+  - name: reliability
+    displayName: Reliability
+    description: Unstable behavior such as unexpected restarts, crash loops, or intermittent failures affecting service reliability.
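
The four new rules use more tags than this hunk adds, so the remaining ones presumably already exist elsewhere in tags.yaml (this page does not show the rest of that file). A small sketch of that cross-check follows, with the tag lists copied from the diffs above. The function name `tags_expected_to_preexist` is hypothetical.

```python
# Tags added to rules/tags/tags.yaml by this commit.
ADDED_TAGS = {"exit-code", "entrypoint", "command", "sigabrt", "native", "reliability"}

# Tags referenced by each new rule, copied from the rule files in this commit.
RULE_TAGS = {
    "CRE-2025-0127": ["Kubernetes", "exit-code", "command", "entrypoint", "startup"],
    "CRE-2025-0134": ["Kubernetes", "exit-code", "sigabrt", "assertion", "native"],
    "CRE-2025-0137": ["Kubernetes", "exit-code", "out-of-memory", "memory",
                      "crash-loop", "reliability"],
    "CRE-2025-0139": ["Kubernetes", "exit-code", "segfault", "native", "reliability"],
}


def tags_expected_to_preexist(rule_tags: dict, added: set) -> list:
    """List tags used by the rules that this commit does not itself define."""
    used = {tag for tags in rule_tags.values() for tag in tags}
    return sorted(used - added)


if __name__ == "__main__":
    for tag in tags_expected_to_preexist(RULE_TAGS, ADDED_TAGS):
        print(tag)
```

A check along these lines could run in CI against the full tags.yaml to catch a rule that references an undefined tag.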
