Commit 0088c33

Adding EC troubleshooting page
1 parent da5dff0 commit 0088c33

3 files changed

+112
-8
lines changed

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
import SupportBundleIntro from "../partials/support-bundles/_ec-support-bundle-intro.mdx"
2+
import EmbeddedClusterSupportBundle from "../partials/support-bundles/_generate-bundle-ec.mdx"
3+
4+
# Troubleshooting Embedded Cluster
5+
6+
This topic provides information about troubleshooting Replicated Embedded Cluster.
7+
8+
## Troubleshoot with Support Bundles
9+
10+
<SupportBundleIntro/>
11+
12+
<EmbeddedClusterSupportBundle/>
13+
14+
## About the Custom Resources Used By Embedded Cluster
15+
16+
The following table describes the custom resources used inside Embedded Cluster. It includes the owner of each resource (Replicated or k0s), the namespace where the resource is installed, and a description of how the resource is used. You can use this information to help you troubleshoot Embedded Cluster installations.
17+
18+
<table>
19+
<tr>
20+
<th>Resource Name</th>
21+
<th>Owner</th>
22+
<th>Namespace</th>
23+
<th>Description</th>
24+
</tr>
25+
<tr>
26+
<td>ClusterConfig</td>
27+
<td>k0s</td>
28+
<td>kube-system</td>
29+
<td>
30+
<p>The `ClusterConfig` object contains the k0s config ingested from `/etc/k0s/k0s.yaml`.</p>
31+
<p>This ingestion happens at k0s daemon startup on controller nodes.</p>
32+
<p>It can be dynamically updated, and the k0s config reconciliation process applies any changes to the cluster except for changes to `spec.api` and `spec.storage`.</p>
33+
<p>The Embedded Cluster Operator reconciles Helm chart updates against the `ClusterConfig` object to initiate Helm chart upgrades through the k0s Helm reconciler.</p>
34+
</td>
35+
</tr>
36+
<tr>
37+
<td>Chart</td>
38+
<td>k0s</td>
39+
<td>kube-system</td>
40+
<td>
41+
<p>The `Chart` object contains the spec, values, and tracking information for Helm charts installed by the k0s Helm reconciler. `Chart` objects can be created, deleted, and updated. Deleting a `Chart` object uninstalls the related Helm chart from the cluster. However, if the Helm chart configuration is still present in the `ClusterConfig`, the k0s reconciliation process recreates and reinstalls it.</p>
42+
<p>`Chart` objects are managed by the k0s Helm reconciler. The API and schema for these resources are not documented.</p>
43+
<p>`Chart` objects are monitored by the Embedded Cluster Operator to track and surface Helm installation progress and errors.</p>
44+
</td>
45+
</tr>
46+
<tr>
47+
<td>Plan</td>
48+
<td>k0s</td>
49+
<td>Cluster-scoped</td>
50+
<td>
51+
<p>`Plan` objects are used to configure the k0s autopilot operator. The autopilot operator controls cluster version upgrades by distributing and installing new k0s binaries and air gap bundles.</p>
52+
<p>The `Plan` resource is created by the Embedded Cluster Operator using details from the `Installation` object.</p>
53+
</td>
54+
</tr>
55+
<tr>
56+
<td>Installation</td>
57+
<td>Replicated</td>
58+
<td>Cluster-scoped</td>
59+
<td>
60+
<p>The `Installation` object is used by the Embedded Cluster Operator to both initiate and track cluster and Helm chart upgrades. `Installation` objects are created by KOTS and are marked as Obsolete when superseded by a newer `Installation` object.</p>
61+
<p>The `Installation` object can contain errors surfaced from the `Plan` and `Chart` resources.</p>
62+
<p>For a list of possible `Installation` statuses, see the [`installation_types.go`](https://github.com/replicatedhq/embedded-cluster-operator/blob/e4fbb42919ad3b58cdc563dca77471cf76099393/api/v1beta1/installation_types.go#L24) file in the `embedded-cluster-operator` GitHub repository.</p>
63+
</td>
64+
</tr>
65+
</table>
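
As a rough sketch, the resources in the table above can be listed with `kubectl`. The helper name `ec_list_resources` is hypothetical, and the snippet assumes that `kubectl` is on the `PATH` and already configured for the Embedded Cluster. It also assumes that the short kind names resolve unambiguously in your cluster. If they do not, use the fully qualified resource names reported by `kubectl api-resources`.

```shell
# Hypothetical helper: list each custom resource described in the table above.
# Assumes kubectl is on PATH and configured for the Embedded Cluster.
ec_list_resources() {
  # ClusterConfig and Chart are installed in the kube-system namespace.
  kubectl get clusterconfig -n kube-system
  kubectl get chart -n kube-system
  # Plan and Installation are cluster-scoped, so no namespace flag is passed.
  kubectl get plan
  kubectl get installation
}
```

After sourcing the function, run `ec_list_resources`. To see the errors that an `Installation` surfaces, `kubectl describe installation` on a specific object is a reasonable next step.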
66+
67+
## View Logs
68+
69+
You can view logs for both k0s and Embedded Cluster to help troubleshoot issues.
70+
71+
### k0s Logs
72+
73+
```
74+
journalctl -u k0scontroller
75+
```
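
To narrow the output, one option is to limit the time window and filter for error lines. The following is a minimal sketch. The helper name `k0s_recent_errors` and the default one-hour window are illustrative, and matching on the word "error" is an assumption about how the relevant log lines are worded.

```shell
# Hypothetical helper: show recent k0s controller log lines that mention errors.
# The optional first argument is a journalctl time expression (default: "1 hour ago").
k0s_recent_errors() {
  # --since limits output to a time window; --no-pager writes directly to stdout.
  journalctl -u k0scontroller --since "${1:-1 hour ago}" --no-pager | grep -i 'error'
}
```

For example, `k0s_recent_errors "30 min ago"` restricts the search to the last half hour.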
76+
77+
### Embedded Cluster Logs
78+
79+
`/var/lib/embedded-cluster/logs`
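
A quick way to find the most recent log files in this directory is sketched below. The helper name `ec_recent_logs` and the `EC_LOG_DIR` override are hypothetical and exist only for illustration. The default path is the one given above.

```shell
# Hypothetical helper: show the most recently modified Embedded Cluster log files.
# EC_LOG_DIR is an illustrative override; the default is the path documented above.
ec_recent_logs() {
  dir="${EC_LOG_DIR:-/var/lib/embedded-cluster/logs}"
  if [ ! -d "$dir" ]; then
    echo "log directory not found: $dir" >&2
    return 1
  fi
  # Sort by modification time, newest first, and show at most five entries.
  ls -t "$dir" | head -n 5
}
```

Reading the directory might require root privileges, depending on how the host is configured.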
80+
81+
## Troubleshoot Errors
82+
83+
### Installation failure when NVIDIA GPU Operator is included as a Helm extension
84+
85+
#### Symptom
86+
87+
A release that includes the NVIDIA GPU Operator as a Helm extension fails to install.
88+
89+
#### Cause
90+
91+
If there are any containerd services on the host, the NVIDIA GPU Operator will generate an invalid containerd config, causing the installation to fail.
92+
93+
#### Solution
94+
95+
Remove any existing containerd services that are running on the host (such as those deployed by Docker) before attempting to install the release with Embedded Cluster.
96+
97+
For more information, see [NVIDIA GPU Operator](/vendor/embedded-using#nvidia-gpu-operator) in _Using Embedded Cluster_.
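
Before installing, a check along the following lines can show whether a containerd service is already active on the host. This is only a sketch. The helper name `ec_check_containerd` is hypothetical, the unit names `containerd` and `docker` are assumed common defaults that might differ on your distribution, and the check covers only services managed by systemd.

```shell
# Hypothetical helper: report systemd units that indicate an existing containerd service.
# The unit names below are assumed defaults and might differ on your host.
ec_check_containerd() {
  for unit in containerd docker; do
    # is-active --quiet exits 0 only when the unit is currently active.
    if systemctl is-active --quiet "$unit"; then
      echo "active: $unit"
    fi
  done
}
```

If the function prints nothing, none of the listed units is active.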
98+
99+
### Calico networking issues
100+
101+
#### Symptom
102+
103+
Possible symptoms include:
104+
105+
- Pod is stuck in `CrashLoopBackOff` state with failed health checks
106+
107+
- Pod log contains `i/o timeout`
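
To check for these symptoms, a sketch like the following can help. The helper names `ec_crashloop_pods` and `ec_pod_timeouts` are hypothetical, and the snippet assumes that `kubectl` is configured for the cluster.

```shell
# Hypothetical helper: list pods in any namespace whose status is CrashLoopBackOff.
ec_crashloop_pods() {
  kubectl get pods --all-namespaces --no-headers | grep 'CrashLoopBackOff'
}

# Hypothetical helper: search the logs of one pod for i/o timeout messages.
# Usage: ec_pod_timeouts NAMESPACE POD_NAME
ec_pod_timeouts() {
  kubectl logs -n "$1" "$2" | grep 'i/o timeout'
}
```

Both helpers print nothing and return a nonzero status when no matching line is found.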
108+
109+
#### Cause
110+
111+
#### Solution

docs/vendor/embedded-using.mdx

Lines changed: 0 additions & 8 deletions
@@ -1,6 +1,4 @@
11
import UpdateOverview from "../partials/embedded-cluster/_update-overview.mdx"
2-
import SupportBundleIntro from "../partials/support-bundles/_ec-support-bundle-intro.mdx"
3-
import EmbeddedClusterSupportBundle from "../partials/support-bundles/_generate-bundle-ec.mdx"
42
import EcConfig from "../partials/embedded-cluster/_ec-config.mdx"
53

64
# Using Embedded Cluster
@@ -268,9 +266,3 @@ When the containerd options are configured as shown above, the NVIDIA GPU Operat
268266
:::note
269267
If you include the NVIDIA GPU Operator as a Helm extension, remove any existing containerd services that are running on the host (such as those deployed by Docker) before attempting to install the release with Embedded Cluster. If there are any containerd services on the host, the NVIDIA GPU Operator will generate an invalid containerd config, causing the installation to fail.
270268
:::
271-
272-
## Troubleshoot with Support Bundles
273-
274-
<SupportBundleIntro/>
275-
276-
<EmbeddedClusterSupportBundle/>

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -247,6 +247,7 @@ const sidebars = {
247247
'enterprise/updating-embedded',
248248
'enterprise/embedded-tls-certs',
249249
'vendor/embedded-disaster-recovery',
250+
'vendor/embedded-troubleshooting',
250251
],
251252
},
252253
{
