Tutorial: Monitor your Kubernetes environment in Splunk Observability Cloud

:description: Learn how to deploy the Splunk Distribution of the OpenTelemetry Collector on a Kubernetes cluster, view your cluster data, and create a detector to issue alerts.

k8s-monitor-with-navigators
k8s-activate-detector
Deploy the Splunk Distribution of the OpenTelemetry Collector in a Kubernetes cluster and start monitoring your Kubernetes platform using Splunk Observability Cloud.
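For reference, a minimal Helm-based deployment looks like the following sketch. The repository URL and parameter names follow the Splunk OpenTelemetry Collector Helm chart's documented values; replace the placeholders with your realm, access token, and cluster name, and verify the parameters against your chart version:

.. code-block::

   # Add the Splunk OpenTelemetry Collector chart repository and install the chart.
   helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart
   helm repo update
   helm install splunk-otel-collector \
     --set="splunkObservability.realm=<realm>,splunkObservability.accessToken=<access-token>,clusterName=<cluster-name>" \
     splunk-otel-collector-chart/splunk-otel-collector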
Kubernetes requires you to install a container runtime on each node in the cluster so that pods can run there. The Splunk Distribution of the Collector for Kubernetes supports container runtimes such as containerd, CRI-O, Docker, and Mirantis Kubernetes Engine (formerly Docker Enterprise/UCP).
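To check which container runtime and version each node reports, you can list the nodes with wide output, assuming you have ``kubectl`` access to the cluster:

.. code-block::

   # The CONTAINER-RUNTIME column shows values such as containerd://1.4.9.
   kubectl get nodes -o wide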
Use the Kubelet Summary API to verify container, pod, and node stats. The Kubelet provides the Summary API to discover and retrieve per-node summarized stats available through the ``/stats`` endpoint.
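For example, you can query a node's Summary API through the Kubernetes API server proxy. This is a minimal sketch assuming you have ``kubectl`` access to the cluster; replace the node name placeholder with one of your node names:

.. code-block::

   # List node names, then retrieve the per-node summary stats from the Kubelet.
   kubectl get nodes
   kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary"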
.. note:: Managed Kubernetes services might use a modified container runtime, and the service provider might have applied custom patches or bug fixes that are not present within an unmodified container runtime.
This section describes known incompatibilities and container runtime issues.
When using Kubernetes 1.21.0 to 1.21.11 with containerd, several memory and network stats or metrics might be missing.
Try one of the following workarounds to resolve the issue:
- Upgrade containerd to version 1.4.x or 1.5.x.
containerd 1.4.0 to 1.4.12 with Kubernetes 1.22.0 to 1.22.8
When using Kubernetes 1.22.0 to 1.22.8 with containerd 1.4.0 to 1.4.12, several memory and network stats or metrics can be missing.
Try one of the following workarounds to resolve the issue:
- Upgrade containerd to at least version 1.4.13 or 1.5.0 to fix the missing pod memory metrics.
After deploying the Splunk Distribution of the OpenTelemetry Collector for Kubernetes Helm chart version 0.87.0 or higher, either as a new install or as an upgrade, some pod and node metrics are not collected.
The :ref:`kubelet-stats-receiver` collects ``k8s.pod.*`` and ``k8s.node.*`` metrics from the Kubernetes ``/stats/summary`` endpoint. As of version 0.87.0 of the Splunk OTel Collector, the kubelet certificate is verified during this process to confirm that it's valid. If you are using a self-signed or invalid certificate, the Kubelet stats receiver can't collect the metrics.
You have two alternatives to resolve this error:
1. Add a valid certificate to your Kubernetes cluster. See :ref:`otel-kubernetes-config` for instructions. After updating the ``values.yaml`` file, use the Helm upgrade command to upgrade your Collector deployment.
2. Disable certificate verification in the Kubelet stats receiver by setting ``insecure_skip_verify: true`` in the ``agent.config`` section of the ``values.yaml`` file.
For example, use the configuration below to disable certificate verification:
.. code-block:: yaml

   agent:
     config:
       receivers:
         kubeletstats:
           insecure_skip_verify: true
.. caution:: Keep in mind your security requirements before disabling certificate verification.
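Whichever option you choose, apply the updated ``values.yaml`` by upgrading the Helm release. The release and chart names below are assumptions based on the chart's defaults; adjust them to match your deployment:

.. code-block::

   # Re-render and apply the chart with the updated values file.
   helm upgrade splunk-otel-collector splunk-otel-collector-chart/splunk-otel-collector -f values.yaml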
Set the resources allocated to your Collector instance based on the amount of data you expect to handle. For more information, see :ref:`otel-sizing`.
Use the following configuration to bump resource limits for the agent:
.. code-block:: yaml

   agent:
     resources:
       limits:
         cpu: 500m
         memory: 1Gi
Set the resources allocated to your cluster receiver deployment based on the cluster size. For example, a cluster with around 100 nodes requires larger limits for the cluster receiver, as in the sketch below.
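The ``clusterReceiver`` key follows the Helm chart's values layout, and the limit values shown are illustrative placeholders only; use the sizing guidelines in :ref:`otel-sizing` to pick real numbers for your cluster:

.. code-block:: yaml

   clusterReceiver:
     resources:
       limits:
         # Illustrative values only: size these according to the sizing guidelines.
         cpu: 1
         memory: 2Gi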
Under normal circumstances, the Collector doesn't run out of memory (OOM), even if you didn't provide enough resources for the Collector containers. OOM can only happen if the Collector is heavily throttled by the backend and the exporter sending queue grows faster than the Collector can control its memory utilization. In that case, you see ``429`` errors for metrics and traces or ``503`` errors for logs.
For example:

.. code-block::

   2021-11-12T00:22:32.172Z info exporterhelper/queued_retry.go:325 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "sapm", "error": "server responded with 429", "interval": "4.4850027s"}
   2021-11-12T00:22:38.087Z error exporterhelper/queued_retry.go:190 Dropping data because sending_queue is full. Try increasing queue_size. {"kind": "exporter", "name": "sapm", "dropped_items": 1348}
If you can't fix throttling by bumping limits on the backend or reducing the amount of data sent through the Collector, you can avoid OOMs by reducing the sending queue of the failing exporter. For example, you can reduce ``sending_queue`` for the ``sapm`` exporter:
.. code-block:: yaml

   agent:
     config:
       exporters:
         sapm:
           sending_queue:
             queue_size: 512
You can apply a similar configuration to any other failing exporter.
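For instance, assuming your logs pipeline uses a ``splunk_hec`` exporter, an equivalent override would look like the following sketch. The exporter name here is illustrative; match it to the exporter name in your rendered agent configuration:

.. code-block:: yaml

   agent:
     config:
       exporters:
         # Exporter name is illustrative; use the name from your rendered agent configuration.
         splunk_hec:
           sending_queue:
             queue_size: 512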