== Troubleshooting the installation program workflow
Prior to troubleshooting the installation environment, it is critical to understand the overall flow of the installer-provisioned installation on bare metal. The diagrams below provide a troubleshooting flow with a step-by-step breakdown for the environment.
image:flow1.png[Flow-Diagram-1]
_Workflow 1 of 4_ illustrates a troubleshooting workflow when the `install-config.yaml` file has errors or the {op-system-first} images are inaccessible. Troubleshooting suggestions can be found at xref:ipi-install-troubleshooting-install-config_ipi-install-troubleshooting[Troubleshooting `install-config.yaml`].
image:flow2.png[Flow-Diagram-2]
_Workflow 2 of 4_ illustrates a troubleshooting workflow for xref:ipi-install-troubleshooting-bootstrap-vm_ipi-install-troubleshooting[bootstrap VM issues], xref:ipi-install-troubleshooting-bootstrap-vm-cannot-boot_ipi-install-troubleshooting[bootstrap VMs that cannot boot up the cluster nodes], and xref:ipi-install-troubleshooting-bootstrap-vm-inspecting-logs_ipi-install-troubleshooting[inspecting logs]. When installing an {product-title} cluster without the `provisioning` network, this workflow does not apply.
image:flow3.png[Flow-Diagram-3]
_Workflow 3 of 4_ illustrates a troubleshooting workflow for xref:ipi-install-troubleshooting-cluster-nodes-will-not-pxe_ipi-install-troubleshooting[cluster nodes that will not PXE boot]. If installing using Redfish Virtual Media, each node must meet minimum firmware requirements for the installation program to deploy the node. See *Firmware requirements for installing with virtual media* in the *Prerequisites* section for additional details.
image:flow4.png[Flow-Diagram-4]
_Workflow 4 of 4_ illustrates a troubleshooting workflow from
xref:investigating-an-unavailable-kubernetes-api_ipi-install-troubleshooting[a non-accessible API] to a xref:ipi-install-troubleshooting-reviewing-the-installation_ipi-install-troubleshooting[validated installation].

The {product-title} installation program spawns a bootstrap node virtual machine, which handles provisioning the {product-title} cluster nodes.
====
The name of the bootstrap VM is always the cluster name followed by a random set of characters and ending in the word "bootstrap."
====
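Given that naming convention, a small helper can pick the bootstrap VM out of the `virsh list` output. This is a minimal sketch, not part of the installation program: the cluster name and the single random infix in the VM name are assumptions about the exact name format.

```shell
# Hypothetical helper: filter `sudo virsh list --all` output for the
# bootstrap VM, assuming names of the form <cluster>-<random>-bootstrap.
find_bootstrap_vm() {
    cluster="$1"    # cluster name, as set in install-config.yaml
    grep -o "${cluster}-[[:alnum:]]*-bootstrap" | head -n 1
}
```

For example, `sudo virsh list --all | find_bootstrap_vm mycluster` prints the bootstrap VM name if one is defined, and nothing otherwise.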
. If the bootstrap VM is not running after 10-15 minutes, verify `libvirtd` is running on the system by executing the following command:
+
[source,terminal]
----
When deploying an {product-title} cluster without the `provisioning` network, you must use a public IP address and not a private IP address like `172.22.0.2`.
====
. After you obtain the IP address, log in to the bootstrap VM using the `ssh` command:

If you are not successful logging in to the bootstrap VM, you have likely encountered one of the following scenarios:
* You cannot reach the `172.22.0.0/24` network. Verify the network connectivity between the provisioner and the `provisioning` network bridge. This issue might occur if you are using a `provisioning` network.
* You cannot reach the bootstrap VM through the public network. When attempting to SSH via `baremetal` network, verify connectivity on the
`provisioner` host specifically around the `baremetal` network bridge.
* You encountered `Permission denied (publickey,password,keyboard-interactive)`. When attempting to access the bootstrap VM, a `Permission denied` error might occur. Verify that the SSH key for the user attempting to log in to the VM is set within the `install-config.yaml` file.
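For the last scenario, one quick local check is to confirm that `install-config.yaml` actually carries an `sshKey` entry before redeploying. The following is a minimal sketch, not an official tool: it assumes a single-line, unquoted `sshKey` value, so adjust it for your file layout.

```shell
# Hypothetical check: confirm install-config.yaml has an sshKey entry
# with a recognizable OpenSSH key-type prefix (single-line value assumed).
check_ssh_key() {
    config="$1"
    key=$(sed -n 's/^sshKey: *//p' "$config")
    case "$key" in
        ssh-*|ecdsa-*|sk-*) echo "sshKey present: ${key%% *}" ;;
        "")                 echo "no sshKey entry in $config"; return 1 ;;
        *)                  echo "sshKey has an unexpected format"; return 1 ;;
    esac
}
```

Compare the reported key type against the public key of the user attempting to log in, for example the key in `~/.ssh/id_ed25519.pub`.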
The installation program does not provision worker nodes directly. Instead, the Machine API Operator scales nodes up and down on supported platforms. If worker nodes are not created after 15 to 20 minutes, depending on the speed of the cluster's internet connection, investigate the Machine API Operator.
.Procedure
. Check the Machine API Operator by running the following command:

When the cluster is running and clients cannot access the API, domain name resolution issues might impede access to the API.
When the Kubernetes API is unavailable, check the control plane nodes to ensure that they are running the correct components. Also, check the hostname resolution.
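For the hostname side, the pattern to catch is a node that still reports a fallback name instead of a fully qualified domain name. A minimal sketch of that test follows; the hostnames are illustrative, and this helper is an assumption rather than a documented utility:

```shell
# Hypothetical check: succeed only for a fully qualified domain name;
# reject the "localhost" fallback and bare short names.
is_fqdn() {
    case "$1" in
        localhost|localhost.*) return 1 ;;  # unconfigured fallback name
        *.*)                   return 0 ;;  # has a domain part
        *)                     return 1 ;;  # bare short name
    esac
}
```

For example, `is_fqdn "$(hostname -f)"` should succeed on each control plane node; a failure suggests the node is still reporting `localhost.localdomain` or a short name.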
.Procedure
. Ensure that `etcd` is running on each of the control plane nodes by running the following command:
The Cluster Network Operator is responsible for deploying the networking components in response to a special object created by the installation program. It runs very early in the installation process, after the control plane (master) nodes have come up, but before the bootstrap control plane has been torn down. It can be indicative of more subtle installation program issues, such as long delays in bringing up control plane (master) nodes or issues with `apiserver` communication.
.Procedure
networkType: OVNKubernetes
----
+
If it does not exist, the installation program did not create it. To determine why the installation program did not create it, execute the following:
+
[source,terminal]
----
On high availability clusters with three or more control plane nodes, the Operator will perform leader election and all other Operators will sleep. For additional details, see https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md[Troubleshooting].
== Addressing the "No disk found with matching rootDeviceHints" error message
= Troubleshooting a failure to add the ingress certificate to kubeconfig
The installation program adds the default ingress certificate to the list of trusted client certificate authorities in `${INSTALL_DIR}/auth/kubeconfig`. If the installation program fails to add the ingress certificate to the `kubeconfig` file, you can retrieve the certificate from the cluster and add it.
.Procedure
. Retrieve the certificate from the cluster using the following command:
+
[source,terminal]
----
$ oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get configmaps default-ingress-cert \
= Troubleshooting a failure to fetch the console URL
The installation program retrieves the URL for the {product-title} console by using the route in the `openshift-console` namespace. If the installation program fails to retrieve the URL for the console, use the following procedure.
.Procedure
. Check if the console router is in the `Available` or `Failing` state by running the following command:
+
[source,terminal]
----
$ oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get clusteroperator console -oyaml
----