
Commit 85ad824

Add kubestack troubleshooting

1 parent d79d337

2 files changed (+121, -0 lines)

Lines changed: 120 additions & 0 deletions
---
navigation_title: Insufficient resources with Kube-Stack chart
description: Learn what to do when the Kube-Stack chart is deployed with insufficient resources.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_collector: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-collector
---

# Insufficient resources issue with the Kube-Stack Helm chart

The OpenTelemetry Kube-Stack Helm chart deploys multiple EDOT Collectors with different configurations, depending on the selected architecture and deployment mode. On larger clusters, the default Kubernetes resource limits might be insufficient.

## Symptoms

These symptoms are common when the Kube-Stack chart is deployed with insufficient resources:

- Collector Pods are in a `CrashLoopBackOff` or `OOMKilled` state.
- Cluster or Daemon collector Pods can't export data to the Gateway collector because the Gateway Pods are being OOMKilled (high memory usage).
- Pods have logs similar to: `error internal/queue_sender.go:128 Exporting failed. Dropping data.`
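To gauge how much data a Collector is dropping, you can count how many times that error appears in its logs. This is a minimal sketch: `count_dropped_exports` is a hypothetical helper name, and it assumes the log message text matches the example above exactly.

```bash
# Hypothetical helper: count log lines (read from stdin) that contain the
# "Exporting failed. Dropping data." message.
count_dropped_exports() {
  # -F matches a fixed string, so the dots are not treated as regex wildcards.
  # grep -c prints 0 but exits with status 1 when nothing matches; `|| true`
  # keeps that case from being reported as a failure.
  grep -cF 'Exporting failed. Dropping data.' || true
}

# Usage (<pod-name> is a placeholder for one of your Collector Pods; add
# --previous to read the logs of the container's last terminated instance):
# kubectl logs -n opentelemetry-operator-system <pod-name> | count_dropped_exports
```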

## Resolution

Follow these steps to resolve the issue.

:::::{stepper}

::::{step} Check for OOMKilled Pods
Run the following command to list the Pods:

```bash
kubectl get pods -n opentelemetry-operator-system
```

Look for Pods in the `OOMKilled` or `CrashLoopBackOff` state. For example:

```
NAME                                                              READY   STATUS             RESTARTS      AGE
opentelemetry-kube-stack-cluster-stats-collector-7cd88c77drvj76   1/1     Running            0             49s
opentelemetry-kube-stack-daemon-collector-pn4qj                   1/1     Running            0             47s
opentelemetry-kube-stack-gateway-collector-85795c7965-wxqls       0/1     OOMKilled          3 (34s ago)   49s
opentelemetry-kube-stack-gateway-collector-8cfdb59df-lgpbr        0/1     OOMKilled          3 (30s ago)   49s
opentelemetry-kube-stack-gateway-collector-8cfdb59df-s7plz        0/1     CrashLoopBackOff   2 (17s ago)   34s
opentelemetry-kube-stack-opentelemetry-operator-77d46bc4dbv2h6k   2/2     Running            0             3m14s
```
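On clusters with many Pods, filtering the listing can help. The following is a minimal sketch with a hypothetical helper name; it assumes the default `kubectl get pods` output, where `STATUS` is the third column and the first row is the header.

```bash
# Hypothetical helper: print the name and STATUS of Pods reported as OOMKilled or
# CrashLoopBackOff. Reads `kubectl get pods` output (with its header row) from stdin.
find_unhealthy_pods() {
  # NR > 1 skips the header; STATUS is the third whitespace-separated column.
  awk 'NR > 1 && ($3 == "OOMKilled" || $3 == "CrashLoopBackOff") { print $1, $3 }'
}

# Usage:
# kubectl get pods -n opentelemetry-operator-system | find_unhealthy_pods
```

With the example output in this step, the helper prints the two `OOMKilled` Gateway Pods and the one in `CrashLoopBackOff`.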
::::

::::{step} Verify the last state of the Pod

Run the following command to describe one of the failing Pods. In the output, a `Last State` of `Terminated` with the reason `OOMKilled` and exit code `137` confirms that the container was killed for exceeding its memory limit:

```bash
$ kubectl describe pod -n opentelemetry-operator-system opentelemetry-kube-stack-gateway-collector-85795c7965-wxqls

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
```
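To extract those two values from the `kubectl describe pod` output and decode the exit code, you can use a small shell sketch like the following. Both function names are hypothetical. The parser assumes the `Last State`, `Reason`, and `Exit Code` labels appear as in the output above, and it reports only the first container it finds. Exit codes above 128 mean the process was stopped by signal number `code - 128`, so `137` corresponds to signal 9 (`SIGKILL`).

```bash
# Hypothetical helper: read `kubectl describe pod` output from stdin and print the
# reason and exit code of the first "Last State" block, for example "OOMKilled 137".
last_termination() {
  awk '
    /^[ \t]*Last State:/              { in_last = 1; next }
    in_last && /^[ \t]*Reason:/       { reason = $2 }
    in_last && /^[ \t]*Exit Code:/    { print reason, $3; exit }
  '
}

# Hypothetical helper: map an exit code above 128 to the name of the signal that
# stopped the process (exit code = 128 + signal number). 137 maps to KILL.
signal_for_exit_code() {
  if [ "$1" -gt 128 ]; then
    kill -l $(( $1 - 128 ))
  fi
}

# Usage:
# kubectl describe pod -n opentelemetry-operator-system <pod-name> | last_termination
# signal_for_exit_code 137
```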
::::

::::{step} Increase resource limits

Edit the `values.yaml` file used to deploy the corresponding Helm release. For the Gateway collector, make sure horizontal Pod autoscaling is turned on. The Gateway collector configuration should be similar to this:

```yaml
gateway:
  fullnameOverride: "opentelemetry-kube-stack-gateway"
  suffix: gateway
  replicas: 2
  autoscaler:
    minReplicas: 2 # Start with at least 2 replicas for better availability.
    maxReplicas: 5 # Allow more scale-out if needed.
    targetCPUUtilization: 70 # Scale out when average CPU utilization exceeds 70% of requests.
    targetMemoryUtilization: 75 # Scale out when average memory utilization exceeds 75% of requests.
```
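To reason about how these settings behave, the following sketch estimates the replica count, assuming the OpenTelemetry Operator turns the `autoscaler` settings into a standard Kubernetes HorizontalPodAutoscaler, which computes `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)` and clamps the result to `minReplicas` and `maxReplicas`. The helper name is hypothetical, and it ignores the HPA tolerance and stabilization windows, so treat the result as an approximation. Because utilization is measured as a percentage of the Pod's resource requests, very low requests produce high utilization values and earlier scale-out.

```bash
# Hypothetical helper: estimate the replica count an HPA would choose.
# Arguments: current replicas, current utilization (%), target utilization (%),
# minReplicas, maxReplicas. Utilization is relative to resource requests.
hpa_desired_replicas() {
  current=$1 utilization=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * utilization / target.
  desired=$(( (current * utilization + target - 1) / target ))
  # Clamp to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Example with the values above: 2 replicas at 150% memory utilization, 75% target.
hpa_desired_replicas 2 150 75 2 5   # prints 4
```

If utilization stays high once `maxReplicas` is reached, scaling out no longer helps and the per-Pod limits need to increase instead.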

If the autoscaler configuration is already in place, or another Collector type is running out of memory, increase the resource limits in the corresponding Collector configuration section:

```yaml
gateway:
  fullnameOverride: "opentelemetry-kube-stack-gateway"
  # ...
  resources:
    limits:
      cpu: 500m
      memory: 20Mi
    requests:
      cpu: 100m
      memory: 10Mi
```
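Kubernetes memory quantities accept binary suffixes such as `Ki`, `Mi`, and `Gi` (powers of 1024) and decimal suffixes such as `k`, `M`, and `G` (powers of 1000), so `2G` and `2Gi` are not the same amount. To compare values across Collector sections, a hypothetical helper like the following can normalize a quantity to whole MiB. It's a sketch that handles integer values with those suffixes only, not fractional values such as `1.5Gi`.

```bash
# Hypothetical helper: convert a Kubernetes memory quantity to whole MiB (rounded down).
quantity_to_mib() {
  q=$1
  case "$q" in
    *Ki) n=${q%Ki}; mult=1024 ;;
    *Mi) n=${q%Mi}; mult=1048576 ;;
    *Gi) n=${q%Gi}; mult=1073741824 ;;
    *k)  n=${q%k};  mult=1000 ;;
    *M)  n=${q%M};  mult=1000000 ;;
    *G)  n=${q%G};  mult=1000000000 ;;
    *)   n=$q;      mult=1 ;;          # Plain number of bytes.
  esac
  # Shell arithmetic supports integers only; reject anything else.
  case "$n" in
    ''|*[!0-9]*) echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  # Convert to bytes, then to MiB.
  echo $(( n * mult / 1048576 ))
}

quantity_to_mib 20Mi   # prints 20
quantity_to_mib 2G     # prints 1907
```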

Make sure to update the resource limits within the correct section. The Collector sections are `gateway`, `daemon`, and `cluster`; the `opentelemetry-operator` section configures the operator itself.
::::

::::{step} Update the Helm release

Run the following command to update the Helm release:

```bash
helm upgrade opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack \
  --namespace opentelemetry-operator-system \
  --values values.yaml \
  --version '0.6.3'
```

:::{note}
The hard memory limit should be around 2GB.
:::
::::
:::::

## Resources

* [OpenTelemetry Kube-Stack Helm chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-kube-stack)
* [Elastic Stack Kubernetes Helm charts](https://github.com/elastic/helm-charts)

troubleshoot/ingest/opentelemetry/toc.yml

Lines changed: 1 addition & 0 deletions

@@ -5,6 +5,7 @@ toc:
   - file: edot-collector/index.md
     children:
       - file: edot-collector/collector-oomkilled.md
+      - file: edot-collector/insufficient-resources-kubestack.md
       - file: edot-collector/metadata.md
       - file: edot-collector/enable-debug-logging.md
       - file: edot-collector/collector-not-starting.md
