|
| 1 | +import SupportBundleIntro from "../partials/support-bundles/_ec-support-bundle-intro.mdx" |
| 2 | +import EmbeddedClusterSupportBundle from "../partials/support-bundles/_generate-bundle-ec.mdx" |
| 3 | + |
| 4 | +# Troubleshooting Embedded Cluster |
| 5 | + |
| 6 | +This topic provides information about troubleshooting Replicated Embedded Cluster. |
| 7 | + |
| 8 | +## Troubleshoot with Support Bundles |
| 9 | + |
| 10 | +<SupportBundleIntro/> |
| 11 | + |
| 12 | +<EmbeddedClusterSupportBundle/> |
| 13 | + |
| 14 | +## About the Custom Resources Used By Embedded Cluster |
| 15 | + |
| 16 | +The table below describes the custom resources used inside Embedded Cluster. It includes the owner of each resource (Replicated or k0s), the namespace where the resource is installed, and a description of how the resource is used. You can use this information to help you troubleshoot Embedded Cluster installations. |
| 17 | + |
| 18 | +<table> |
| 19 | + <tr> |
| 20 | + <th>Resource Name</th> |
| 21 | + <th>Owner</th> |
| 22 | + <th>Namespace</th> |
| 23 | + <th>Description</th> |
| 24 | + </tr> |
| 25 | + <tr> |
| 26 | + <td>ClusterConfig</td> |
| 27 | + <td>k0s</td> |
| 28 | + <td>kube-system</td> |
| 29 | + <td> |
| 30 | + <p>The `ClusterConfig` object contains the ingested k0s config from `/etc/k0s/k0s.yaml`.</p> |
| 31 | + <p>This ingestion happens at k0s daemon startup on controller nodes.</p> |
| 32 | + <p>The object can be dynamically updated, and the k0s config reconciliation process applies any changes to the cluster, except for changes to `spec.api` and `spec.storage`.</p> |
| 33 | + <p>The Embedded Cluster Operator reconciles Helm chart updates against the `ClusterConfig` object to initiate Helm chart upgrades through the k0s Helm reconciler.</p> |
| 34 | + </td> |
| 35 | + </tr> |
| 36 | + <tr> |
| 37 | + <td>Chart</td> |
| 38 | + <td>k0s</td> |
| 39 | + <td>kube-system</td> |
| 40 | + <td> |
| 41 | + <p>The `Chart` object contains the spec, values, and tracking information for Helm charts installed by the k0s Helm reconciler. `Chart` objects can be created, updated, and deleted. Deleting a `Chart` object uninstalls the related Helm chart from the cluster. However, if the Helm chart configuration is still present in the `ClusterConfig` object, the k0s reconciliation process recreates the `Chart` object and reinstalls the Helm chart.</p> |
| 42 | + <p>`Chart` objects are managed by the k0s Helm reconciler. The API and schema for these resources are not documented.</p> |
| 43 | + <p>`Chart` objects are monitored by the Embedded Cluster Operator to track and surface Helm installation processes and errors.</p> |
| 44 | + </td> |
| 45 | + </tr> |
| 46 | + <tr> |
| 47 | + <td>Plan</td> |
| 48 | + <td>k0s</td> |
| 49 | + <td>Cluster-scoped</td> |
| 50 | + <td> |
| 51 | + <p>`Plan` objects are used to configure the k0s autopilot operator. The autopilot operator controls cluster version upgrades by distributing and installing new k0s binaries and air gap bundles.</p> |
| 52 | + <p>The `Plan` resource is created by the Embedded Cluster Operator using details from the `Installation` object.</p> |
| 53 | + </td> |
| 54 | + </tr> |
| 55 | + <tr> |
| 56 | + <td>Installation</td> |
| 57 | + <td>Replicated</td> |
| 58 | + <td>Cluster-scoped</td> |
| 59 | + <td> |
| 60 | + <p>The `Installation` object is used by the Embedded Cluster Operator to both initiate and track cluster and Helm chart upgrades. `Installation` objects are created by KOTS and are marked as Obsolete when superseded by a newer `Installation` object.</p> |
| 61 | + <p>The `Installation` object can contain errors surfaced from the `Plan` and `Chart` resources.</p> |
| 62 | + <p>For a list of possible `Installation` statuses, see the [`installation_types.go`](https://github.com/replicatedhq/embedded-cluster-operator/blob/e4fbb42919ad3b58cdc563dca77471cf76099393/api/v1beta1/installation_types.go#L24) file in the `embedded-cluster-operator` GitHub repository.</p> |
| 63 | + </td> |
| 64 | + </tr> |
| 65 | +</table> |
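
As a rough sketch of how the resources in the table above might be inspected during troubleshooting, the following shell function lists each one with `kubectl`. It assumes that `kubectl` is available and configured for the cluster. The function name `list_ec_resources` is hypothetical, and the fully qualified resource names are assumptions based on the k0s and Embedded Cluster API groups, so it is worth confirming them with `kubectl api-resources` before relying on them.

```shell
# Hypothetical helper: list the custom resources used by Embedded Cluster.
# The resource names below are assumptions; verify with `kubectl api-resources`.
list_ec_resources() {
  # Resources owned by k0s in the kube-system namespace
  kubectl get clusterconfigs.k0s.k0sproject.io -n kube-system
  kubectl get charts.helm.k0sproject.io -n kube-system
  # Cluster-scoped resources (no namespace flag needed)
  kubectl get plans.autopilot.k0sproject.io
  kubectl get installations.embeddedcluster.replicated.com
}
```

Calling `list_ec_resources` prints one listing per resource type. Adding `-o yaml` to an individual `kubectl get` command shows the full spec and status of that object, which is where errors are most likely to appear.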
| 66 | + |
| 67 | +## View Logs |
| 68 | + |
| 69 | +You can view logs for both k0s and Embedded Cluster to help troubleshoot issues. |
| 70 | + |
| 71 | +### k0s Logs |
| 72 | + |
| 73 | +``` |
| 74 | +journalctl -u k0scontroller |
| 75 | +``` |
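
The full journal can be long, so it may help to limit the output to a recent time window. The sketch below wraps the command above in a hypothetical function, `k0s_logs`, that uses the standard `journalctl` flags `--since` and `--no-pager`. The default window of one hour is an arbitrary assumption.

```shell
# Hypothetical helper: show recent k0s controller logs without a pager.
# $1 is an optional time window understood by journalctl (default: "1 hour ago").
k0s_logs() {
  since="${1:-1 hour ago}"
  journalctl -u k0scontroller --since "$since" --no-pager
}
```

For example, `k0s_logs "10 minutes ago"` prints only the entries from the last ten minutes.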
| 76 | + |
| 77 | +### Embedded Cluster Logs |
| 78 | + |
| 79 | +Embedded Cluster logs are located in the `/var/lib/embedded-cluster/logs` directory. |
| 80 | + |
| 81 | +## Troubleshoot Errors |
| 82 | + |
| 83 | +### Installation failure when NVIDIA GPU Operator is included as a Helm extension |
| 84 | + |
| 85 | +#### Symptom |
| 86 | + |
| 87 | +A release that includes the NVIDIA GPU Operator as a Helm extension fails to install. |
| 88 | + |
| 89 | +#### Cause |
| 90 | + |
| 91 | +If there are any containerd services on the host, the NVIDIA GPU Operator generates an invalid containerd config, which causes the installation to fail. |
| 92 | + |
| 93 | +#### Solution |
| 94 | + |
| 95 | +Remove any existing containerd services that are running on the host (such as those deployed by Docker) before attempting to install the release with Embedded Cluster. |
| 96 | + |
| 97 | +For more information, see [NVIDIA GPU Operator](/vendor/embedded-using#nvidia-gpu-operator) in _Using Embedded Cluster_. |
| 98 | + |
| 99 | +### Calico networking issues |
| 100 | + |
| 101 | +#### Symptom |
| 102 | + |
| 103 | +Possible symptoms include: |
| 104 | + |
| 105 | +- A Pod is stuck in the CrashLoopBackOff state with failed health checks |
| 106 | + |
| 107 | +- The Pod log contains `i/o timeout` |
| 108 | + |
| 109 | +#### Cause |
| 110 | + |
| 111 | +#### Solution |