Commit a24665d

Merge pull request #6960 from oscr/quick-start-inotify
📖 Add troubleshooting advice when running Quick Start with CAPD
2 parents 646cdc4 + eabfd68 commit a24665d

4 files changed: +69 -4 lines changed

docs/book/src/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Started by the Kubernetes Special Interest Group (SIG) [Cluster Lifecycle](https
 
 ## Getting started
 
-* [Quick start](./user/quick-start.md)
+* [Quick Start](./user/quick-start.md)
 * [Concepts](./user/concepts.md)
 * [Developer guide](./developer/guide.md)
 * [Contributing](./CONTRIBUTING.md)

docs/book/src/tasks/certs/using-custom-certificates.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Each certificate must be stored in a single secret named one of:
 | *[cluster name]***-sa** | Key Pair | openssl genrsa -out tls.key 2048 && openssl rsa -in tls.key -pubout -out tls.crt |
 
 
-<aside class="note warn">
+<aside class="note warning">
 
 <h1>CA Key Age</h1>
 

docs/book/src/user/quick-start.md

Lines changed: 5 additions & 1 deletion
@@ -47,7 +47,11 @@ a target [management cluster] on the selected [infrastructure provider].
 
 **Minimum [kind] supported version**: v0.14.0
 
-Note for macOS users: you may need to [increase the memory available](https://docs.docker.com/docker-for-mac/#resources) for containers (recommend 6Gb for CAPD).
+**Help with common issues can be found in the [Troubleshooting Guide](./troubleshooting.md).**
+
+Note for macOS users: you may need to [increase the memory available](https://docs.docker.com/docker-for-mac/#resources) for containers (recommend 6 GB for CAPD).
+
+Note for Linux users: you may need to [increase `ulimit` and `inotify` when using Docker (CAPD)](./troubleshooting.md#cluster-api-with-docker----too-many-open-files).
 
 </aside>
 

docs/book/src/user/troubleshooting.md

Lines changed: 62 additions & 1 deletion
@@ -1,5 +1,66 @@
 # Troubleshooting
 
+## Troubleshooting Quick Start with Docker (CAPD)
+
+<aside class="note warning">
+
+<h1>Warning</h1>
+
+If you've run the Quick Start before ensure that you've [cleaned up](./quick-start.md#clean-up) all resources before trying it again. Check `docker ps` to ensure there are no running containers left before beginning the Quick Start.
+
+</aside>
+
+This guide assumes you've completed the [apply the workload cluster](./quick-start.md#apply-the-workload-cluster) section of the Quick Start using Docker.
+
+When running `clusterctl describe cluster capi-quickstart` to verify the created resources, we expect the output to be similar to this (**note: this is before installing the Calico CNI**).
+
+```shell
+NAME READY SEVERITY REASON SINCE MESSAGE
+Cluster/capi-quickstart True 46m
+├─ClusterInfrastructure - DockerCluster/capi-quickstart-94r9d True 48m
+├─ControlPlane - KubeadmControlPlane/capi-quickstart-6487w True 46m
+│ └─3 Machines... True 47m See capi-quickstart-6487w-d5lkp, capi-quickstart-6487w-mpmkq, ...
+└─Workers
+└─MachineDeployment/capi-quickstart-md-0-d6dn6 False Warning WaitingForAvailableMachines 48m Minimum availability requires 3 replicas, current 0 available
+└─3 Machines... True 47m See capi-quickstart-md-0-d6dn6-584ff97cb7-kr7bj, capi-quickstart-md-0-d6dn6-584ff97cb7-s6cbf, ...
+```
+
+Machines should be started, but Workers are not because Calico isn't installed yet. You should be able to see the containers running with `docker ps --all` and they should not be restarting.
+
+If you notice Machines are failing to start/restarting your output might look similar to this:
+
+```shell
+clusterctl describe cluster capi-quickstart
+NAME READY SEVERITY REASON SINCE MESSAGE
+Cluster/capi-quickstart False Warning ScalingUp 57s Scaling up control plane to 3 replicas (actual 2)
+├─ClusterInfrastructure - DockerCluster/capi-quickstart-n5w87 True 110s
+├─ControlPlane - KubeadmControlPlane/capi-quickstart-6587k False Warning ScalingUp 57s Scaling up control plane to 3 replicas (actual 2)
+│ ├─Machine/capi-quickstart-6587k-fgc6m True 81s
+│ └─Machine/capi-quickstart-6587k-xtvnz False Warning BootstrapFailed 52s 1 of 2 completed
+└─Workers
+└─MachineDeployment/capi-quickstart-md-0-5whtj False Warning WaitingForAvailableMachines 110s Minimum availability requires 3 replicas, current 0 available
+└─3 Machines... False Info Bootstrapping 77s See capi-quickstart-md-0-5whtj-5d8c9746c9-f8sw8, capi-quickstart-md-0-5whtj-5d8c9746c9-hzxc2, ...
+```
+
+In the example above we can see that the Machine `capi-quickstart-6587k-xtvnz` has failed to start. The reason provided is `BootstrapFailed`.
+
+To investigate why a machine fails to start you can inspect the conditions of the objects using `clusterctl describe --show-conditions all cluster capi-quickstart`. You can get more detailed information about the status of the machines using `kubectl describe machines`.
+
+To inspect the underlying infrastructure - in this case docker containers acting as Machines - you can access the logs using `docker logs <MACHINE-NAME>`. For example:
+
+```shell
+docker logs capi-quickstart-6587k-xtvnz
+(...)
+Failed to create control group inotify object: Too many open files
+Failed to allocate manager object: Too many open files
+[!!!!!!] Failed to allocate manager object.
+Exiting PID 1...
+```
+
+To resolve this specific error please read [Cluster API with Docker - "too many open files"](#cluster-api-with-docker----too-many-open-files).
+
+
+
 ## Node bootstrap failures when using CABPK with cloud-init
 
 Failures during Node bootstrapping can have a lot of different causes. For example, Cluster API resources might be
@@ -52,7 +113,7 @@ provisioning might be stuck:
 * Run [docker system df](https://docs.docker.com/engine/reference/commandline/system_df/) to inspect the disk space consumed by Docker resources.
 * Run [docker system prune --volumes](https://docs.docker.com/engine/reference/commandline/system_prune/) to prune dangling images, containers, volumes and networks.
 
-
+
 ## Cluster API with Docker - "too many open files"
 When creating many nodes using Cluster API and Docker infrastructure, either by creating large Clusters or a number of small Clusters, the OS may run into inotify limits which prevent new nodes from being provisioned.
 If the error `Failed to create inotify object: Too many open files` is present in the logs of the Docker Infrastructure provider this limit is being hit.
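
Reviewer sketch (not part of this commit): the log check described in the added troubleshooting text could be scripted roughly as below. The function name `has_inotify_error` is hypothetical, and the patterns only cover the error strings quoted in this diff, so treat it as a starting point rather than a complete detector.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Cluster API.
# Reads log text on stdin and succeeds (exit 0) if it contains one of the
# "Too many open files" messages quoted in the troubleshooting text.
has_inotify_error() {
  grep -q -E 'inotify object: Too many open files|Failed to allocate manager object: Too many open files'
}

# Sample input: the two log lines shown in the docs added by this commit.
sample_log='Failed to create control group inotify object: Too many open files
Failed to allocate manager object: Too many open files'

if printf '%s\n' "$sample_log" | has_inotify_error; then
  echo "inotify limit likely hit"
fi
```

Against a real Machine container, the same function would presumably be fed with `docker logs <MACHINE-NAME> 2>&1 | has_inotify_error`. On a Linux host, the current limits can be read with `sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances`.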
