Commit ca1e981

author
Natalia Jordan
committed
adding a troubleshooting document and updating the README accordingly
1 parent 4cbd02c commit ca1e981

2 files changed

+191
-0
lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -184,6 +184,8 @@ You'll need to place the file in the install directory and name it as **pull-sec

**Note**: If you encounter Terraform-related errors during the create command, see ["Known Issues & Troubleshooting"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md) and the ["Troubleshooting Document"](docs/troubleShooting.md).

## Advanced Usage

Before running the script, you may choose to override some environment variables as per your requirement.

docs/troubleShooting.md

Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@

# OpenShift on IBM PowerVS: Common Issues and Resolutions

This document lists common issues encountered when deploying OpenShift on IBM PowerVS using the `openshift-install-powervs` wrapper, along with their causes and resolutions.

---

## Terraform Stored Resource IDs

**Error:**

`Error: cannot find resource with id <resource-id>`

**Cause:**
Terraform's state or backup files still reference PowerVS resources that have since been deleted. This often happens when Terraform is re-run after instances or other resources have changed in PowerVS.

**Resolution:**

Search for the stale ID in the Terraform state or backup files:

```bash
grep -R "<resource-id>" .
```

Remove each stale entry from the state by its resource address (`terraform state list` shows the addresses Terraform is tracking):

```bash
terraform state rm <resource-name>
```

Re-run the apply:

```bash
terraform apply
```

To rebuild a specific worker or master node, taint it and apply again:

```bash
# Quote the address so the shell does not treat [0] as a glob pattern.
terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
terraform apply
```
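
The search step in this section scans every file under the current directory. As a minimal sketch, assuming the state is kept in local files named `*.tfstate` or `*.tfstate.backup` (a remote backend would not match), the search can be narrowed to those files only. The function name `find_stale_state_files` is hypothetical and not part of the wrapper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install-powervs).
# Lists local Terraform state/backup files that still mention a resource ID.
# Assumption: state is stored locally in *.tfstate / *.tfstate.backup files.
find_stale_state_files() {
  local resource_id="$1"
  local search_dir="${2:-.}"
  # -r: recurse, -l: print matching file names only, -F: literal string match
  grep -rlF --include='*.tfstate' --include='*.tfstate.backup' \
    -- "$resource_id" "$search_dir" | LC_ALL=C sort
}
```

For example, `find_stale_state_files "<resource-id>" .` prints one file path per line, which shows where the stale entry lives before you run `terraform state rm`.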

## Bastion Node OS Compatibility

If the bastion node runs CentOS 10 and setup fails with missing-package or incorrect-storage-type errors, switch to CentOS Stream 9.

### Common Issues and Fixes

#### Missing Required Packages (e.g. Ansible)

**Error**:
Missing ansible or dependency packages during setup.

**Resolution**:
SSH into the bastion node using the generated key, then install Ansible:

`ssh -i id_rsa root@<bastion-external-ip>`

`sudo dnf install ansible`

- Note: if the above does not work, you can also install Ansible using Python and pip.

#### Incorrect Storage Type (e.g. "nfs" not recognized)

**Error**:
`Error: "pi_volume_type" must contain a value from ["ssd", "standard", "tier1", "tier3"], got ""`

**Resolution**:
Edit your `variables.tf` or the corresponding `.tfvars` file:

`bastion_storage_type = "tier3"`

- If needed, change the default `bastion_storage_type` in `variables.tf` to the storage type you want.
- Note: you can easily find this setting by searching for `bastion_storage_type` (for example, with CTRL + W in nano).
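
A value can also be checked before Terraform runs. The sketch below assumes the accepted values are exactly the four listed in the error message above; your provider version may accept a different set. The function name `is_valid_storage_type` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the wrapper).
# Assumption: the allowed values are the ones quoted in the error message:
# ssd, standard, tier1, tier3.
is_valid_storage_type() {
  local value="$1"
  case "$value" in
    ssd|standard|tier1|tier3) return 0 ;;  # accepted by the validation rule
    *) return 1 ;;                         # empty or unknown, e.g. "nfs"
  esac
}
```

Calling `is_valid_storage_type "tier3"` succeeds, while an empty string or `nfs` returns a non-zero status, mirroring the validation failure shown above.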

## Re-installation / Network Name Conflict

**Error:**

`Error: Network with name "ocp-net" already exists.`

**Cause:**
On a subsequent UPI install attempt, Terraform tries to create a network whose name is already in use.
PowerVS does not allow duplicate network names, even if the old network is inactive.

**Resolution:**

- Log into your PowerVS workspace.

- Delete or rename the existing `ocp-net` network or subnet.

- Re-run the installer:
```bash
./openshift-install-powervs create
```

⚠️ Renaming networks automatically is not recommended because it can lead to subnet sprawl and degraded performance.

## Remote-Exec Provisioning Errors

**Error:**

Terraform remote-exec provisioner failures

**Cause:**
These are transient SSH or remote-execution issues that occur during provisioning.

**Resolution:**
Re-run Terraform:

`terraform apply`

This typically resolves the issue without further changes.
For more details, see the ocp4-upi-powervs ["OCP Known issues"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md).
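
Because these failures are transient, the re-run can be wrapped in a small retry loop. This is a sketch only: the `retry` function, the attempt count, and the delay are illustrative assumptions, not something the wrapper provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts times,
# sleeping delay_seconds between failed attempts.
retry() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2
  local attempt=1
  # Keep re-running the command until it exits successfully.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay_seconds}s..." >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 3 30 terraform apply` would try the apply up to three times with a 30-second pause between attempts. If the failure persists after the retries, it is probably not transient and is worth investigating directly.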

## LPAR in WARNING State

**Error:**

`Error: the operation cannot be performed when the lpar health in the WARNING State`

**Cause:**
Terraform cannot modify instances whose PowerVS LPAR health is in the WARNING state.
This often occurs after partial provisioning, a failed networking setup, or API timeouts.

**Resolution:**

Check instance health:
```bash
ibmcloud pi instance get <INSTANCE_ID>
```

**Note**: Because the RSCT daemon is not available for RHCOS, RHCOS instances can show a "Warning" status in the dashboard. This is expected and can be ignored.

In the console, reboot the affected instances by performing an OS shutdown and then starting them again.

To rebuild only specific nodes:
```bash
# Quote the addresses so the shell does not treat [1] and [0] as glob patterns.
terraform taint 'module.nodes.ibm_pi_instance.master[1]'
terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
terraform apply
```

## Missing or Outdated Images (RHEL / RHCOS)

**Error:**

`Error: failed to perform Get Image Operation for image rhcos-4.15`

`[pcloudCloudinstancesImagesGetNotFound] image does not exist. ID: rhcos-4.12`

**Cause:**
Terraform and the PowerVS provider reference image names (e.g. `rhcos-4.15`, `rhel-8.3`) that may not exist in your workspace.
The wrapper may also mistakenly use the RHEL version when selecting RHCOS images.

**Resolution:**

### Option 1: Import Pre-built Images

Use pre-built RHCOS and RHEL OVA images from IBM’s public repository.
See [Christy Norman’s blog](https://community.ibm.com/community/user/blogs/christy-norman/2024/08/06/import-pre-built-red-hat-coreos-ovas-into-powervs) for steps.

### Option 2: Update variables.tf

Set available image names manually:
```hcl
variable "rhel_image_name" {
  default = "rhel-8.9"
}

variable "rhcos_image_name" {
  default = "rhcos-4.15"
}
```

### Option 3: Export Versions Before Running

`export RELEASE_VER=4.9`

Ensure that the RHEL and RHCOS versions are aligned and that both images are available in your workspace.
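
Availability can be checked before running the installer. The sketch below assumes you already have the image names from your workspace as plain text, one name per line, for example copied from the console or extracted from CLI output. The function name `missing_images` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required image name that is absent from
# the list of available names read from stdin (one name per line).
# Returns non-zero if at least one required image is missing.
missing_images() {
  local available
  available="$(cat)"
  local name
  local status=0
  for name in "$@"; do
    # -x: match the whole line, -F: literal string, -q: no output
    if ! grep -qxF -- "$name" <<< "$available"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

With a hypothetical file `available-images.txt` holding the names from your workspace, `missing_images rhel-8.9 rhcos-4.15 < available-images.txt` prints any image that still needs to be imported.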
