Merge pull request #32950 from apinnick/bz1964388-bpg-cluster-health-checks

apinnick · web-flow · commit f86c4510d58c · 2021-06-10T13:27:19.000+03:00
BZ1964388: BPG premigration checks moved to MTC
diff --git a/_topic_map.yml b/_topic_map.yml
@@ -1978,6 +1978,8 @@ Topics:
   File: installing-restricted-3-4
 - Name: Upgrading MTC
   File: upgrading-3-4
+- Name: Premigration checklists
+  File: premigration-checklists
 - Name: Migrating your applications
   File: migrating-applications-3-4
 - Name: Troubleshooting
diff --git a/migrating_from_ocp_3_to_4/about-mtc-3-4.adoc b/migrating_from_ocp_3_to_4/about-mtc-3-4.adoc
@@ -5,9 +5,9 @@ include::modules/common-attributes.adoc[]
 
 toc::[]
 
-The {mtc-short} web console and API, based on Kubernetes custom resources, enable you to migrate stateful application workloads at the granularity of a namespace.
+The {mtc-full} ({mtc-short}) web console and API, based on Kubernetes custom resources, enable you to migrate stateful application workloads at the granularity of a namespace.
 
-You can migrate application workloads from {product-title} 3.7, 3.9, 3.10, and 3.11 to {product-title} {product-version} with the {mtc-full} ({mtc-short}). {mtc-short} enables you to control the migration and to minimize application downtime.
+You can migrate from {product-title} 3.7, 3.9, 3.10, or 3.11 to {product-version}. {mtc-short} enables you to control the migration and to minimize application downtime.
 
 [IMPORTANT]
 ====
diff --git a/migrating_from_ocp_3_to_4/premigration-checklists.adoc b/migrating_from_ocp_3_to_4/premigration-checklists.adoc
@@ -0,0 +1,90 @@
+[id="premigration-checks"]
+= Premigration checklists
+include::modules/common-attributes.adoc[]
+:context: premigration-checks
+
+toc::[]
+
+Before you migrate your application workloads with the {mtc-full} ({mtc-short}), review the following checklists.
+
+[id="source-cluster-checklist_{context}"]
+== Source cluster checklist
+
+* [ ] The cluster meets the link:https://docs.openshift.com/container-platform/3.11/install/prerequisites.html#hardware[minimum hardware requirements].
+* [ ] The {product-title} version is 3.7, 3.9, 3.10, or 3.11.
+* [ ] The {mtc-short} version is the same on all clusters.
+* [ ] All nodes have an active {product-title} subscription.
+* [ ] All the link:https://docs.openshift.com/container-platform/3.11/day_two_guide/run_once_tasks.html#day-two-guide-default-storage-class[run-once tasks] have been performed.
+* [ ] All the link:https://docs.openshift.com/container-platform/3.11/day_two_guide/environment_health_checks.html[environment health checks] have been performed.
+* [ ] You have checked for persistent volumes (PVs) with abnormal configurations stuck in a *Terminating* state by running the following command:
++
+[source,terminal]
+----
+$ oc get pv
+----
+
+* [ ] You have checked for pods whose status is other than *Running* or *Completed* by running the following command:
++
+[source,terminal]
+----
+$ oc get pods --all-namespaces | grep -Ev 'Running|Completed'
+----
+
+* [ ] You have checked for pods with a high restart count by running the following command:
++
+[source,terminal]
+----
+$ oc get pods --all-namespaces --field-selector=status.phase=Running \
+  -o json | jq '.items[]|select(any( .status.containerStatuses[];
+  .restartCount > 3))|.metadata.name'
+----
++
+Even if the pods are in a *Running* state, a high restart count might indicate underlying problems.
+
+* [ ] The internal container image registry uses a link:https://docs.openshift.com/container-platform/3.11/scaling_performance/optimizing_storage.html#registry[supported storage type].
+* [ ] You can read and write images to the registry.
+* [ ] The link:https://access.redhat.com/articles/3093761[etcd cluster] is healthy.
+* [ ] The link:https://docs.openshift.com/container-platform/3.11/install_config/master_node_configuration.html#master-node-configuration-node-qps-burst[average API server response time] on the source cluster is less than 50 ms.
+* [ ] The cluster certificates are link:https://docs.openshift.com/container-platform/3.11/install_config/redeploying_certificates.html#install-config-cert-expiry[valid] for the duration of the migration process.
+* [ ] You have checked for pending certificate-signing requests by running the following command:
++
+[source,terminal]
+----
+$ oc get csr | grep -i pending
+----
+
+* [ ] The link:https://docs.openshift.com/container-platform/3.11/install_config/configuring_authentication.html#overview[identity provider] is working.
+
+[id="target-cluster-checklist_{context}"]
+== Target cluster checklist
+
+* [ ] The {mtc-short} version is the same on all clusters.
+* [ ] All xref:../migrating_from_ocp_3_to_4/migrating-applications-3-4.adoc#migration-prerequisites_migrating-applications-3-4[{mtc-short} prerequisites] are met.
+* [ ] The cluster meets the minimum hardware requirements for the specific platform and installation method, for example, on xref:../installing/installing_bare_metal/installing-bare-metal.adoc#minimum-resource-requirements_installing-bare-metal[bare metal].
+* [ ] The cluster has xref:../storage/dynamic-provisioning.adoc#defining-storage-classes_dynamic-provisioning[storage classes] defined for the storage types used by the source cluster, for example, block volume, file system, or object storage.
++
+[NOTE]
+====
+NFS does not require a defined storage class.
+====
+
+* [ ] The cluster has the correct network configuration and permissions to access external services, for example, databases, source code repositories, container image registries, and CI/CD tools.
+* [ ] External applications and services that use services provided by the cluster have the correct network configuration and permissions to access the cluster.
+* [ ] Internal container image dependencies are met.
++
+If an application uses an internal image in the `openshift` namespace that is not supported by {product-title} {product-version}, you can manually update the xref:../migrating_from_ocp_3_to_4/migrating-applications-3-4.adoc#migration-updating-deprecated-internal-images_migrating-applications-3-4[{product-title} 3 image stream tag] with `podman`.
+* [ ] The target cluster and the replication repository have sufficient storage space.
+* [ ] The xref:../authentication/understanding-identity-provider.adoc#supported-identity-providers[identity provider] is working.
+
+[id="performance-checklist_{context}"]
+== Performance checklist
+
+* [ ] The migration network has a minimum throughput of 10 Gbps.
+* [ ] The clusters have sufficient resources for migration.
++
+[NOTE]
+====
+Clusters require additional memory, CPUs, and storage to run a migration on top of normal workloads. Actual resource requirements depend on the number of Kubernetes resources being migrated in a single migration plan. You must test migrations in a non-production environment to estimate the resource requirements.
+====
+* [ ] The xref:../support/troubleshooting/verifying-node-health.adoc#reviewing-node-status-use-and-configuration_verifying-node-health[memory and CPU usage] of the nodes are healthy.
+* [ ] The link:https://access.redhat.com/solutions/4885641[etcd disk performance] of the clusters has been checked with `fio`.