diff --git a/README.md b/README.md
index 1ce1eab..941b0d2 100644
--- a/README.md
+++ b/README.md
@@ -184,6 +184,8 @@ You'll need to place the file in the install directory and name it as **pull-sec
 ```
+**Note**: If you encounter Terraform-related errors during the create command, see ["Known Issues & Troubleshooting"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md) and the ["Troubleshooting Document"](docs/troubleShooting.md).
+
 ## Advanced Usage
 Before running the script, you may choose to override some environment variables as per your requirement.
diff --git a/docs/troubleShooting.md b/docs/troubleShooting.md
new file mode 100644
index 0000000..1b3db0f
--- /dev/null
+++ b/docs/troubleShooting.md
@@ -0,0 +1,161 @@
+
+# Common Issues and Resolutions
+
+This document lists common issues encountered when deploying OpenShift on IBM PowerVS using the `openshift-install-powervs` wrapper, along with their causes and resolutions.
+
+
+
+## 1. Re-installation / Network Name Conflict
+
+**Error**
+
+"Network with name "ocp-net" already exists."
+
+
+**Cause**
+
+On a subsequent UPI install attempt, Terraform tries to create a network whose name already exists.
+PowerVS does not allow duplicate network names, even if the old network is inactive.
+
+**Resolution**
+
+- Log in to your PowerVS workspace.
+- Delete or rename the existing `ocp-net` network or subnet.
+- Re-run the installer:
+```bash
+./openshift-install-powervs create
+```
+
+## 2. Remote-Exec Provisioning Errors
+
+**Error**
+
+"Terraform remote-exec provisioner failures"
+
+
+**Cause**
+
+These are transient SSH or remote-execution issues that occur during provisioning.
+
+**Resolution**
+
+Re-run Terraform using the following command:
+`terraform apply`
+
+
+Because the failure is transient, a re-run typically resolves it.
+See the ocp4-upi-powervs known issues for more details: ["OCP Known issues"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md)
+
+## 3. LPAR in WARNING State
+
+**Error**
+
+"The operation cannot be performed when the lpar health in the WARNING State."
+
+
+**Cause**
+
+Terraform cannot modify instances whose PowerVS LPAR health is in the WARNING state.
+This often occurs after partial provisioning, a failed networking setup, or API timeouts.
+
+**Resolution**
+
+Check instance health:
+```bash
+ibmcloud pi instance get
+```
+**Note**: Because the RSCT daemon is not available for RHCOS, RHCOS instances can show a "Warning" status in the dashboard. You can safely ignore this.
+
+In the console, reboot the affected instances by performing an OS shutdown and then restarting them.
+
+To rebuild only specific nodes:
+```bash
+terraform taint 'module.nodes.ibm_pi_instance.master[1]'
+terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
+terraform apply
+```
+
+## 4. Missing or Outdated Images (RHEL / RHCOS)
+
+**Error**
+
+"failed to perform Get Image Operation for image rhcos-4.20
+[pcloudCloudinstancesImagesGetNotFound] Image does not exist. ID: rhcos-4.20"
+
+**Cause**
+
+Terraform and the PowerVS provider reference image names (for example `rhcos-4.20` or `rhel-9.6`) that may not exist in your workspace.
+The wrapper may also use the RHEL version for RHCOS images by mistake.
+
+**Resolution**
+
+*Option 1*: Import Pre-built Images
+
+Use pre-built RHCOS and RHEL OVA images from IBM’s public repository.
+See Christy Norman’s blog
+ for steps.
["Blog"](https://community.ibm.com/community/user/blogs/christy-norman/2024/08/06/import-pre-built-red-hat-coreos-ovas-into-powervs) + +*Option 2* — Update variables.tf + +Set available image names manually: +```bash +variable "rhel_image_name" { + default = "rhel-9.6" +} + +variable "rhcos_image_name" { + default = "rhcos-4.20" +} +``` +*Option 3* — Export Versions Before Running +export RELEASE_VER=4.20 + +Ensure that the RHEL and RHCOS versions are aligned and available in your workspace. + +## 5. Terraform Stored Resource IDs + +### **Developers Only** + +> ⚠️ WARNING: The following command is intended **for developers or advanced users only**. +> +> Using this command without a full understanding of its purpose and impact can lead to an **inconsistent Terraform state**, **resource corruption**, or **loss of data**. +> +> Proceed **only if you understand** how Terraform manages state and resource dependencies. +> Always create a state backup before making manual modifications. + + +**Error** + +"cannot find resource with id ``" + +**Cause** + +Terraform retains deleted PowerVS resource IDs in its state or backup files. This often occurs after a Terraform rerun when instances or resources have changed in PowerVS. + + +**Resolution** + +Search for the stale ID in Terraform state or backup files: + +```bash +grep -R "" . 
+```
+
+Remove the stale state entries:
+
+```bash
+terraform state rm
+```
+
+Re-run the apply:
+
+```bash
+terraform apply
+```
+
+To rebuild specific worker or master nodes:
+
+```bash
+terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
+terraform apply
+```
diff --git a/openshift-install-powervs b/openshift-install-powervs
index f33c89b..b20fcc3 100755
--- a/openshift-install-powervs
+++ b/openshift-install-powervs
@@ -60,7 +60,7 @@ EOF
     exit 0
 }
-RELEASE_VER=${RELEASE_VER:-"4.15"}
+RELEASE_VER=${RELEASE_VER:-"4.20"}
 ARTIFACTS_REPO=${ARTIFACTS_REPO:-"https://github.com/ocp-power-automation/ocp4-upi-powervs"}
 ARTIFACTS_VERSION=${ARTIFACTS_VERSION:-"main"}
 #ARTIFACTS_VERSION=${ARTIFACTS_VERSION:-"release-$RELEASE_VER"}
@@ -159,6 +159,19 @@ function output {
     $TF output "$output_var"
 }
+#-------------------------------------------------------------------------
+# Display environment variable information for user awareness
+#-------------------------------------------------------------------------
+function display_env_info {
+    log "Using RHCOS release version: ${RELEASE_VER}"
+
+    # Only show how to change if using default
+    if [[ "${RELEASE_VER}" == "4.20" ]]; then
+        echo " To use a different version: export RELEASE_VER=''"
+    fi
+    echo ""
+}
+
 #-------------------------------------------------------------------------
 # Util for retrying any command, special case for curl downloads
 #-------------------------------------------------------------------------
@@ -1694,6 +1707,9 @@ function main {
     [[ -z "$ACTION" ]] && help
     platform_checks
+    if [[ "$ACTION" != "help" ]]; then
+        display_env_info
+    fi
     setup_tools
     case "$ACTION" in
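
A note on the `display_env_info` hunk: the check compares `RELEASE_VER` against the literal `"4.20"`, so the default now lives in two places, and the hint is also printed when a user explicitly exports `4.20`. One possible alternative, shown here only as a minimal sketch, is to record where the value came from before applying the default. The names `DEFAULT_RELEASE_VER`, `RELEASE_VER_SOURCE`, and `resolve_release_ver` are hypothetical and do not appear in the patch, the hint wording is illustrative, and `echo` stands in for the script's own `log` helper.

```shell
#!/usr/bin/env bash
# Sketch only: DEFAULT_RELEASE_VER, RELEASE_VER_SOURCE and resolve_release_ver
# are hypothetical names that are not part of the patch.
DEFAULT_RELEASE_VER="4.20"

# Record whether RELEASE_VER was supplied by the caller, then apply the default.
resolve_release_ver() {
    if [ -n "${RELEASE_VER:-}" ]; then
        RELEASE_VER_SOURCE="environment"
    else
        RELEASE_VER_SOURCE="default"
    fi
    RELEASE_VER="${RELEASE_VER:-$DEFAULT_RELEASE_VER}"
}

display_env_info() {
    # echo stands in for the wrapper's log function.
    echo "Using RHCOS release version: ${RELEASE_VER}"
    # Show the hint only when the caller did not set RELEASE_VER.
    if [ "${RELEASE_VER_SOURCE}" = "default" ]; then
        echo "To use a different version, export RELEASE_VER before running."
    fi
    echo ""
}

resolve_release_ver
display_env_info
```

With this shape, bumping the default means changing one assignment, and an explicit `export RELEASE_VER=4.20` suppresses the hint.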