
Commit ea12f65

eedugon, bmorelli25, kilfoyle, gizas authored and committed
Adding troubleshooting section for Elastic Agent on Kubernetes and Kustomize (#1409)
Co-authored-by: Brandon Morelli <[email protected]> Co-authored-by: David Kilfoyle <[email protected]> Co-authored-by: Andrew Gizas <[email protected]> (cherry picked from commit 9eeb6a8)
1 parent b85f365 commit ea12f65

1 file changed

+133
-0
lines changed

docs/en/ingest-management/troubleshooting/troubleshooting.asciidoc

Lines changed: 133 additions & 0 deletions
@@ -60,6 +60,7 @@ Find troubleshooting information for {fleet}, {fleet-server}, and {agent} in the
6060
* <<fleet-server-integration-removed>>
6161
* <<agent-oom-k8s>>
6262
* <<agent-sudo-error>>
63+
* <<agent-kubernetes-kustomize>>
6364

6465

6566
[discrete]
@@ -830,3 +831,135 @@ Error: error loading agent config: error loading raw config: fail to read config
830831

831832
To resolve this, either install {agent} without the `--unprivileged` flag so that it has administrative access, or run the {agent} commands without the `sudo` prefix.
832833

834+
[discrete]
835+
[[agent-kubernetes-kustomize]]
836+
== Troubleshoot {agent} installation on Kubernetes, with Kustomize
837+
838+
Potential issues during {agent} installation on Kubernetes can be categorized into two main areas:
839+
840+
. <<agent-kustomize-manifest>>.
841+
. <<agent-kustomize-after>>.
842+
843+
[discrete]
844+
[[agent-kustomize-manifest]]
845+
=== Problems related to the creation of objects within the manifest
846+
847+
When troubleshooting installations performed with https://github.com/kubernetes-sigs/kustomize[Kustomize], it's good practice to inspect the rendered manifest. To do this, take the installation command provided by {kib} Onboarding and replace the final part, `| kubectl apply -f-`, with a redirection to a local file so that you can analyze the rendered output.
848+
849+
For example, the following command, originally provided by {kib} for an {agent} Standalone installation, has been modified to redirect the output for troubleshooting purposes:
850+
851+
[source,sh]
852+
----
853+
kubectl kustomize https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone\?ref\=v8.15.3 | sed -e 's/JUFQSV9LRVkl/ZDAyNnZaSUJ3eWIwSUlCT0duRGs6Q1JfYmJoVFRUQktoN2dXTkd0FNMtdw==/g' -e "s/%ES_HOST%/https:\/\/7a912e8674a34086eacd0e3d615e6048.us-west2.gcp.elastic-cloud.com:443/g" -e "s/%ONBOARDING_ID%/db687358-2c1f-4ec9-86e0-8f1baa4912ed/g" -e "s/\(docker.elastic.co\/beats\/elastic-agent:\).*$/\18.15.3/g" -e "/{CA_TRUSTED}/c\ " > elastic_agent_installation_complete_manifest.yaml
854+
----
855+
856+
The previous command generates a local file named `elastic_agent_installation_complete_manifest.yaml`, which you can use for further analysis. It contains the complete set of resources required for the {agent} installation, including:
857+
858+
* RBAC objects (`ServiceAccounts`, `Roles`, etc.)
859+
860+
* `ConfigMaps` and `Secrets` for {agent} configuration
861+
862+
* {agent} Standalone deployed as a `DaemonSet`
863+
864+
* https://github.com/kubernetes/kube-state-metrics[Kube-state-metrics] deployed as a `Deployment`
865+
866+
The content of this file is equivalent to what you'd obtain by following the <<running-on-kubernetes-standalone>> steps, with the exception that `kube-state-metrics` is not included in the standalone method.
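To get a quick overview of what the rendered manifest contains, you can count its resources by kind. The following is a minimal sketch rather than part of the official procedure: `summarize_kinds` is a hypothetical helper name, and it assumes that every resource in the file declares `kind:` at the start of a line, which is how `kubectl kustomize` typically renders resources.

```shell
# Hypothetical helper: print "<Kind>=<count>" for each resource kind found
# in a rendered manifest file passed as the first argument.
summarize_kinds() {
  # Only top-level "kind:" lines (no indentation) are counted, so nested
  # fields such as "roleRef.kind" are ignored.
  grep -E '^kind:' "$1" | awk '{print $2}' | sort | uniq -c | awk '{print $2 "=" $1}'
}
```

For example, `summarize_kinds elastic_agent_installation_complete_manifest.yaml` lists each kind in the file with the number of objects of that kind, which makes it easier to spot a missing `DaemonSet` or an unexpected `Deployment`.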
867+
868+
**Possible issues**
869+
870+
* If your user doesn't have *cluster-admin* privileges, the creation of the RBAC resources might fail.
871+
872+
* Some Kubernetes security mechanisms (like https://kubernetes.io/docs/concepts/security/pod-security-standards/[Pod Security Standards]) could cause part of the manifest to be rejected, as `hostNetwork` access and `hostPath` volumes are required.
873+
874+
* If you already have an installation of `kube-state-metrics`, it could cause part of the manifest installation to fail or to update your existing resources without notice.
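Most of these issues can be checked before anything is applied to the cluster. The sketch below assumes the manifest file name used earlier in this section. `find_host_access` is a hypothetical helper that lists the manifest lines requesting `hostNetwork` or `hostPath`, which are the settings a restrictive policy is most likely to reject. The `kubectl` commands are left as comments because they need access to your cluster.

```shell
# Hypothetical helper: print (with line numbers) the lines of a rendered
# manifest that request host networking or hostPath volumes.
find_host_access() {
  grep -nE '^[[:space:]]*(-[[:space:]]*)?(hostNetwork|hostPath):' "$1"
}

# Commands to run against your own cluster (commented out on purpose):
#
# Check whether your user can create cluster-scoped RBAC objects:
#   kubectl auth can-i create clusterroles
#   kubectl auth can-i create clusterrolebindings
#
# Submit the manifest to the API server without persisting anything:
#   kubectl apply --dry-run=server -f elastic_agent_installation_complete_manifest.yaml
#
# Show how applying the manifest would change objects that already exist,
# such as a previous kube-state-metrics installation:
#   kubectl diff -f elastic_agent_installation_complete_manifest.yaml
```

A server-side dry run goes through the API server's authorization and admission checks, so permission errors and policy rejections or warnings should surface without modifying the cluster.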
875+
876+
[discrete]
877+
[[agent-kustomize-after]]
878+
=== Failures occurring within specific components after installation
879+
880+
If the installation is correct and all resources are deployed, but data is not flowing as expected (for example, you don't see any data on the *[Metrics Kubernetes] Cluster Overview* dashboard), check the following items:
881+
882+
. Check resources status and ensure they are all in a `Running` state:
883+
+
884+
[source,sh]
885+
----
886+
kubectl get pods -n kube-system | grep elastic
887+
kubectl get pods -n kube-system | grep kube-state-metrics
888+
----
889+
+
890+
[NOTE]
891+
====
892+
The default configuration assumes that both `kube-state-metrics` and the {agent} `DaemonSet` are deployed in the **same namespace** for communication purposes. If you change the namespace of any of the components, the agent configuration will need further policy updates.
893+
====
894+
895+
. Describe the Pods if they are in a `Pending` state:
896+
+
897+
[source,sh]
898+
----
899+
kubectl describe pod -n kube-system <name_of_elastic_agent_pod>
900+
----
901+
902+
. Check the logs of the {agent} and `kube-state-metrics` Pods, and look for errors or warnings:
903+
+
904+
[source,sh]
905+
----
906+
kubectl logs -n kube-system <name_of_elastic_agent_pod>
907+
kubectl logs -n kube-system <name_of_elastic_agent_pod> | grep -i error
908+
kubectl logs -n kube-system <name_of_elastic_agent_pod> | grep -i warn
909+
----
910+
+
911+
[source,sh]
912+
----
913+
kubectl logs -n kube-system <name_of_kube-state-metrics_pod>
914+
----
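The status check in the first step can be narrowed down to the Pods that need attention. This is a minimal sketch under stated assumptions: `not_running` is a hypothetical helper, and it relies on the default `kubectl get pods` table layout for a single namespace, where the first column is `NAME` and the third is `STATUS`.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# "<pod> <status>" for every Pod whose STATUS is not Running.
not_running() {
  # Skip the header row (if present) and keep rows with any other status.
  awk '$1 != "NAME" && $3 != "Running" {print $1, $3}'
}

# Example invocation against a cluster (commented out on purpose):
#   kubectl get pods -n kube-system | grep -E 'elastic|kube-state-metrics' | not_running
```

An empty result means all matching Pods report `Running`. Any Pod that is listed is a candidate for the `kubectl describe` and `kubectl logs` steps above.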
915+
916+
**Possible issues**
917+
918+
* Connectivity, authorization, or authentication issues when connecting to {es}:
919+
+
920+
Ensure that the API Key and the {es} destination endpoint used during the installation are correct, and that the endpoint is reachable from within the Pods.
921+
+
922+
In an already installed system, the API Key is stored in a `Secret` named `elastic-agent-creds-<hash>`, and the endpoint is configured in the `ConfigMap` `elastic-agent-configs-<hash>`.
923+
924+
* Missing cluster-level metrics (provided by `kube-state-metrics`):
925+
+
926+
As described in <<running-on-kubernetes-standalone>>, the {agent} Pod acting as `leader` is responsible for retrieving cluster-level metrics from `kube-state-metrics` and delivering them to {ref}/data-streams.html[data streams] prefixed as `metrics-kubernetes.state_<resource>`. In order to troubleshoot a situation where these metrics are not appearing:
927+
+
928+
. Determine which Pod owns the <<kubernetes_leaderelection-provider, leadership>> `lease` in the cluster, with:
929+
+
930+
[source,sh]
931+
----
932+
kubectl get lease -n kube-system elastic-agent-cluster-leader
933+
----
934+
+
935+
. Check the logs of that Pod to see if there are errors when connecting to `kube-state-metrics` and if the `state_*` metrics are being sent to {es}.
936+
+
937+
One way to check if `state_*` metrics are being delivered to {es} is to inspect log lines with the `"Non-zero metrics in the last 30s"` message and check the values of the `state_*` metrics within the line, with something like:
938+
+
939+
[source,sh]
940+
----
941+
kubectl logs -n kube-system <name_of_leader_elastic_agent_pod> | grep "Non-zero metrics" | grep "state_"
942+
----
943+
+
944+
If the previous command returns `"state_pod":{"events":213,"success":213}` or similar for all `state_*` metrics, it means the metrics are being delivered.
945+
+
946+
. As a last resort, if you believe none of the Pods is acting as a leader, you can try deleting the `lease` to generate a new one:
947+
+
948+
[source,sh]
949+
----
950+
kubectl delete lease -n kube-system elastic-agent-cluster-leader
951+
# wait a few seconds and check for the lease again
952+
kubectl get lease -n kube-system elastic-agent-cluster-leader
953+
----
954+
955+
* Performance problems:
956+
+
957+
Monitor the CPU and memory usage of the {agent} Pods and adjust the requests and limits in the manifest as needed. Refer to <<scaling-on-kubernetes>> for more details about the required resources.
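For the missing cluster-level metrics case, the log check described above can be condensed into a small filter. This is only a sketch: `state_metric_counts` is a hypothetical helper, and its pattern assumes that each entry looks exactly like the `"state_pod":{"events":213,"success":213}` example shown earlier. Entries that carry additional fields will not match and need an adjusted pattern.

```shell
# Hypothetical helper: read agent log lines on stdin and print one
# "<dataset> events=<n> success=<n>" line per state_* entry found.
state_metric_counts() {
  grep -oE '"state_[a-z_]+":\{"events":[0-9]+,"success":[0-9]+\}' |
    sed -E 's/"(state_[a-z_]+)":\{"events":([0-9]+),"success":([0-9]+)\}/\1 events=\2 success=\3/'
}

# Example invocations against a cluster (commented out on purpose):
#
# Print the identity recorded as the current holder of the leader lease:
#   kubectl get lease -n kube-system elastic-agent-cluster-leader -o jsonpath='{.spec.holderIdentity}'
#
# Summarize the state_* counters reported by the leader Pod:
#   kubectl logs -n kube-system <leader_pod> | grep "Non-zero metrics" | state_metric_counts
```

If `events` and `success` match for every `state_*` entry, the metrics are being delivered. No output at all suggests that the Pod you inspected is not the leader or cannot reach `kube-state-metrics`.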
958+
959+
Additional resources for troubleshooting {agent} on Kubernetes:
960+
961+
* <<agent-oom-k8s>>.
962+
963+
* https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-kustomize/default[{agent} Kustomize Templates] documentation and resources.
964+
965+
* Other examples and manifests to deploy https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes[{agent} on Kubernetes].
