Commit 9572bae

Merge pull request #137 from amanycodes/k8s-exit-code
Add Kubernetes Exit-Code CREs: 137 (OOMKilled), 127 (Command Not Found), 134 (SIGABRT), 139 (SIGSEGV)
2 parents c414539 + 1e34924 commit 9572bae

9 files changed: +204 -0 lines changed
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: TDtvSvVJU5vUaLZuRZBVk9
+    cre:
+      id: CRE-2025-0127
+      severity: 2
+      title: Container exited 127 due to command not found (bad entrypoint/command)
+      category: configuration-problem
+      author: CRE Community
+      description: |
+        Exit code 127 indicates the configured command/entrypoint was not found in the image or PATH.
+        New or misconfigured deployments commonly hit this and immediately crash.
+      cause: |
+        - Typo in command/entrypoint or wrong binary path.
+        - Missing executable in the image or non-executable file permissions.
+        - Different base image where the tool isn’t installed by default.
+      impact: |
+        - Pod fails to start; controllers may enter CrashLoopBackOff.
+        - Service remains unavailable until the image/command is fixed.
+      tags:
+        - k8s
+        - exit-code
+        - command
+        - entrypoint
+        - startup-failure
+      mitigation: |
+        - Verify the binary exists and is executable (`chmod +x`).
+        - Use absolute paths or fix PATH; test with `docker run` locally.
+        - Add pre-deploy checks in CI to validate entrypoint presence.
+      references:
+        - "https://kubernetes.io/docs/concepts/workloads/pods/"
+      applications:
+        - name: kubernetes
+          version: ">=1.16"
+      impactScore: 3
+      mitigationScore: 3
+      reports: 6
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t127$"
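
The pre-deploy check suggested in the mitigation list could look roughly like the Python sketch below. It is not part of the commit: `entrypoint_resolves` and `shell_exit_code` are hypothetical helpers, and the sketch assumes it runs on a POSIX system whose PATH and filesystem mirror the image (for example, inside the built container).

```python
import shutil
import subprocess
from typing import Optional


def entrypoint_resolves(command: str, path: Optional[str] = None) -> bool:
    """Return True if `command` resolves to an executable file.

    Hypothetical CI helper; `path` overrides the PATH lookup when given.
    """
    return shutil.which(command, path=path) is not None


def shell_exit_code(command: str) -> int:
    """Run `command` through sh and return the shell's exit status."""
    result = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    missing = "no-such-binary-cre-0127"  # made-up name, assumed to be absent
    print(entrypoint_resolves("sh"))
    print(entrypoint_resolves(missing))
    # POSIX shells report 127 when the command cannot be found.
    print(shell_exit_code(missing))
```

POSIX shells reserve exit status 126 for a file that is found but cannot be executed, so a check like this may want to report the "missing" and "not executable" cases separately.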

rules/cre-2025-0127/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:18:27Z cre-demo/cmd-127 bad-cmd Error 127
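
As a rough illustration of how the CRE-2025-0127 pattern relates to this test line, the sketch below applies the regex with Python's `re` module. Two assumptions are made here that the diff does not confirm: the five fields in `test.log` are separated by tab characters (the pattern requires tabs), and Python's regex engine behaves like the rule engine's for this pattern. The field interpretation in the comment (timestamp, namespace/pod, container, reason, exit code) is likewise a guess from the sample.

```python
import re

# Pattern from the CRE-2025-0127 rule after YAML unescaping ("\\t" -> "\t").
PATTERN = re.compile(r"^[^\t]+\t[^\t/]+/[^\t]+\t[^\t]+\t[^\t]*\t127$")

# Assumed field layout: timestamp, namespace/pod, container, reason, exit code.
FIELDS = ["2025-08-27T13:18:27Z", "cre-demo/cmd-127", "bad-cmd", "Error", "127"]


def matches(line: str) -> bool:
    """Return True if the rule's regex matches the whole log line."""
    return PATTERN.search(line) is not None


if __name__ == "__main__":
    print(matches("\t".join(FIELDS)))                   # tab-separated line
    print(matches(" ".join(FIELDS)))                    # space-separated line
    print(matches("\t".join(FIELDS[:-1] + ["137"])))    # different exit code
```

Because the reason field is matched with `[^\t]*`, this rule accepts any reason (including an empty one) as long as the final field is exactly `127`.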
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: T8tPh6u4Bj7nXidQWbRvvj
+    cre:
+      id: CRE-2025-0134
+      severity: 2
+      title: Container exited 134 due to SIGABRT / assertion failure
+      category: runtime-problem
+      author: CRE Community
+      description: |
+        Exit code 134 indicates the process aborted via SIGABRT, commonly due to failed assertions,
+        allocator checks (e.g., glibc detecting heap corruption), or explicit abort() calls.
+      cause: |
+        - assert(false) / std::abort() in C/C++.
+        - Memory allocator consistency errors (double free, corruption).
+        - Defensive abort on unrecoverable invariant violations.
+      impact: |
+        - Immediate termination of the container; possible loss of in-flight work.
+        - Repeated crashes if the triggering condition is deterministic at startup.
+      tags:
+        - k8s
+        - exit-code
+        - sigabrt
+        - assertion
+        - native
+      mitigation: |
+        - Enable core dumps and symbols; capture backtraces.
+        - Run ASAN/UBSAN builds in staging to localize corruption.
+        - Pin and verify libc/libstdc++ versions; roll back recent native changes.
+      references:
+        - "https://www.gnu.org/software/libc/manual/html_node/Aborting-a-Program.html"
+      applications:
+        - name: kubernetes
+          version: ">=1.16"
+      impactScore: 6
+      mitigationScore: 2
+      reports: 4
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t134$"
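
The signal-related exit codes in this commit follow the common convention of reporting 128 plus the signal number when a process is killed by a signal. The sketch below is an illustration of that arithmetic, not code from the commit; `signal_for_exit_code` is a hypothetical helper, and the signal numbers come from Python's `signal` module on a POSIX host.

```python
import signal
from typing import Optional

# Signals behind the exit codes covered by the rules in this commit.
_SIGNALS = {
    "SIGABRT": signal.SIGABRT,
    "SIGKILL": signal.SIGKILL,
    "SIGSEGV": signal.SIGSEGV,
}

# Exit code = 128 + signal number, e.g. 128 + 6 = 134 for SIGABRT on Linux.
_EXIT_CODE_TO_SIGNAL = {128 + int(num): name for name, num in _SIGNALS.items()}


def signal_for_exit_code(code: int) -> Optional[str]:
    """Return the signal name for a 128 + N exit code, or None if unknown."""
    return _EXIT_CODE_TO_SIGNAL.get(code)


if __name__ == "__main__":
    for code in (127, 134, 137, 139):
        print(code, signal_for_exit_code(code))
```

Exit code 127 is not signal-derived, which is why the lookup returns `None` for it.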

rules/cre-2025-0134/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:26:39Z cre-demo/abort-134 aborter Error 134
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: VuPxiuWkYodzUqupa7gh9N
+    cre:
+      id: CRE-2025-0137
+      severity: 1
+      title: Pod terminated with Exit Code 137 due to OOMKilled (memory limit exceeded)
+      category: memory-problem
+      author: CRE Community
+      description: |
+        The container exceeded its memory limit and was killed by the kernel OOM killer.
+        Kubernetes reports a terminated state with Reason=OOMKilled and exitCode=137.
+        This often manifests as CrashLoopBackOff under sustained memory pressure.
+      cause: |
+        - Memory limit too low relative to peak usage.
+        - Sudden traffic spikes causing allocation bursts.
+        - Memory leaks or fragmentation in long-running processes.
+        - Under-provisioned nodes or overly strict pod limits.
+      impact: |
+        - Request errors and latency spikes during restarts.
+        - CrashLoopBackOff and reduced availability.
+        - Potential loss of in-flight work not checkpointed to durable storage.
+      tags:
+        - k8s
+        - exit-code
+        - out-of-memory
+        - memory
+        - crash-loop
+        - reliability
+      mitigation: |
+        - Raise memory requests/limits; add headroom for peak allocations.
+        - Enable profiling and leak detection; tune GC/heap where applicable.
+        - Consider Vertical Pod Autoscaler for right-sizing.
+        - Watch node memory pressure and eviction thresholds.
+      references:
+        - "https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states"
+        - "https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/"
+      applications:
+        - name: kubernetes
+          version: ">=1.16"
+      impactScore: 6
+      mitigationScore: 2
+      reports: 12
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\tOOMKilled\\t137$"

rules/cre-2025-0137/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T11:51:17Z cre-demo/oom-137 eater OOMKilled 137
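
Unlike the other three rules, the CRE-2025-0137 pattern pins the reason field to `OOMKilled` rather than accepting any reason. The sketch below illustrates the effect with Python's `re` module, assuming tab-separated fields and regex semantics equivalent to the rule engine's; `make_line` is a hypothetical helper that rebuilds the sample line with a different reason or exit code.

```python
import re

# Pattern from the CRE-2025-0137 rule after YAML unescaping ("\\t" -> "\t").
OOM_PATTERN = re.compile(r"^[^\t]+\t[^\t/]+/[^\t]+\t[^\t]+\tOOMKilled\t137$")


def make_line(reason: str, exit_code: int) -> str:
    """Build a tab-separated line shaped like the sample in test.log."""
    fields = ["2025-08-27T11:51:17Z", "cre-demo/oom-137", "eater",
              reason, str(exit_code)]
    return "\t".join(fields)


def is_oom_kill(line: str) -> bool:
    """Return True only when both the reason and the exit code match."""
    return OOM_PATTERN.search(line) is not None


if __name__ == "__main__":
    print(is_oom_kill(make_line("OOMKilled", 137)))
    print(is_oom_kill(make_line("Error", 137)))      # SIGKILL for another reason
    print(is_oom_kill(make_line("OOMKilled", 1)))
```

Under these assumptions, a container that exits 137 with a reason other than `OOMKilled` would not be attributed to memory limits by this rule.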
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: KHtVUpTbZHaevdn4EABmQEe
+    cre:
+      id: CRE-2025-0139
+      severity: 2
+      title: Container exited 139 due to segmentation fault (SIGSEGV)
+      category: runtime-problem
+      author: CRE Community
+      description: |
+        Exit code 139 indicates SIGSEGV (invalid memory access) in native/runtime code.
+        Frequently caused by unsafe pointer operations, ABI/library mismatches, or native extensions.
+      cause: |
+        - Null dereference or out-of-bounds access in C/C++/Rust unsafe blocks.
+        - Incompatible glibc/musl or driver/library versions.
+        - Faulty JNI/ctypes/native extension code paths.
+      impact: |
+        - Hard crash; requests being processed may be dropped.
+        - Repeated crashes if the segfault occurs deterministically at startup.
+      tags:
+        - k8s
+        - exit-code
+        - segfault
+        - native
+        - reliability
+      mitigation: |
+        - Enable core dumps and symbol files; capture stack traces.
+        - Pin compatible base image/libc; verify ABI expectations.
+        - Use ASAN/UBSAN builds; bisect recent native/library changes.
+      references:
+        - "https://man7.org/linux/man-pages/man7/signal.7.html"
+      applications:
+        - name: kubernetes
+          version: ">=1.16"
+      impactScore: 7
+      mitigationScore: 2
+      reports: 5
+    rule:
+      set:
+        event:
+          source: cre.log.k8s
+        match:
+          - regex: "^[^\\t]+\\t[^\\t/]+/[^\\t]+\\t[^\\t]+\\t[^\\t]*\\t139$"

rules/cre-2025-0139/test.log

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+2025-08-27T13:32:40Z cre-demo/segv-139 segv Error 139
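
A crash like the one recorded in this test line can be reproduced locally without native code. The sketch below is an assumption-laden illustration rather than the commit's own test setup: it starts a child Python process that sends itself SIGSEGV, then converts the negative return code that `subprocess` reports for signal deaths into the 128 + N form. It assumes a POSIX host; `run_segv_child` and `container_style_exit_code` are hypothetical names.

```python
import subprocess
import sys

# Child program: deliver SIGSEGV to itself, which terminates the process.
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_segv_child() -> int:
    """Run the child and return subprocess's return code (negative on signal)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def container_style_exit_code(returncode: int) -> int:
    """Map subprocess's -N 'killed by signal N' code to the 128 + N form."""
    return 128 - returncode if returncode < 0 else returncode


if __name__ == "__main__":
    rc = run_segv_child()
    print(rc, container_style_exit_code(rc))
```

On Linux, where SIGSEGV is signal 11, the converted value is 139, the exit code this rule matches.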

rules/tags/tags.yaml

Lines changed: 18 additions & 0 deletions
@@ -845,6 +845,24 @@ tags:
   - name: cluster-scaling
     displayName: Cluster Scaling
     description: Problems related to Kubernetes cluster scaling operations and capacity management
+  - name: exit-code
+    displayName: Exit Code
+    description: Problems identified by specific process/container exit codes (e.g., 137, 127, 134, 139).
+  - name: entrypoint
+    displayName: Entrypoint
+    description: Failures caused by invalid or missing container ENTRYPOINT/CMD definitions.
+  - name: command
+    displayName: Command
+    description: Problems caused by invalid commands or arguments at startup (e.g., not found, bad path, non-executable).
+  - name: sigabrt
+    displayName: SIGABRT
+    description: Crashes where a process aborts with SIGABRT (exit 134), often due to assertion failures or allocator checks.
+  - name: native
+    displayName: Native
+    description: Issues in native code paths (C/C++/Rust, libc/ABI), including crashes and memory faults.
+  - name: reliability
+    displayName: Reliability
+    description: Unstable behavior such as unexpected restarts, crash loops, or intermittent failures affecting service reliability.
   - name: autogpt
     displayName: AutoGPT
     description: Problems related to AutoGPT autonomous AI agent framework
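
The relationship between the tags used by the four new rules and the six tags registered in this hunk can be checked with a few set operations. The sketch below copies the tag names from the diff; the idea that every rule tag must be registered in `rules/tags/tags.yaml` is an assumption about the repository, and `tags_not_added_here` is a hypothetical helper.

```python
# Tags listed under each new rule in this commit.
RULE_TAGS = {
    "CRE-2025-0127": {"k8s", "exit-code", "command", "entrypoint",
                      "startup-failure"},
    "CRE-2025-0134": {"k8s", "exit-code", "sigabrt", "assertion", "native"},
    "CRE-2025-0137": {"k8s", "exit-code", "out-of-memory", "memory",
                      "crash-loop", "reliability"},
    "CRE-2025-0139": {"k8s", "exit-code", "segfault", "native", "reliability"},
}

# Tags registered by the tags.yaml hunk in this commit.
TAGS_ADDED = {"exit-code", "entrypoint", "command", "sigabrt", "native",
              "reliability"}


def tags_not_added_here(rule_tags: dict, added: set) -> list:
    """Return the sorted rule tags that this commit does not register."""
    used = set().union(*rule_tags.values())
    return sorted(used - added)


if __name__ == "__main__":
    # Tags that would need to exist in tags.yaml before this commit.
    print(tags_not_added_here(RULE_TAGS, TAGS_ADDED))
```

The remaining tags (`k8s`, `memory`, `segfault`, and so on) are presumably defined elsewhere in `tags.yaml`, outside the lines shown in this hunk.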
