
Commit f308402

OBSDOCS-174 - Loki Zone Failure Recovery - w peer rev
1 parent 39722fa commit f308402

3 files changed: +133 -2 lines changed

logging/cluster-logging-loki.adoc

Lines changed: 15 additions & 2 deletions
@@ -12,7 +12,7 @@ Loki is a horizontally scalable, highly available, multi-tenant log aggregation

include::modules/loki-deployment-sizing.adoc[leveloffset=+1]

-include::modules/cluster-logging-loki-deploy.adoc[leveloffset=+1]
+//include::modules/cluster-logging-loki-deploy.adoc[leveloffset=+1]

include::modules/logging-creating-new-group-cluster-admin-user-role.adoc[leveloffset=+1]

@@ -33,8 +33,21 @@ include::modules/logging-loki-reliability-hardening.adoc[leveloffset=+1]
* link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#podantiaffinity-v1-core[`PodAntiAffinity` v1 core Kubernetes documentation]

* link:https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity[Assigning Pods to Nodes Kubernetes documentation]

-ifdef::openshift-enterprise[]
* xref:../nodes/scheduling/nodes-scheduler-pod-affinity.adoc#nodes-scheduler-pod-affinity[Placing pods relative to other pods using affinity and anti-affinity rules]
+
+
+include::modules/logging-loki-zone-aware-rep.adoc[leveloffset=+1]
+
+include::modules/logging-loki-zone-fail-recovery.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* link:https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#spread-constraint-definition[Topology spread constraints Kubernetes documentation]
+
+* link:https://kubernetes.io/docs/setup/best-practices/multiple-zones/#storage-access-for-zones[Kubernetes storage documentation]
+
+ifdef::openshift-enterprise[]
+* xref:../nodes/scheduling/nodes-scheduler-pod-topology-spread-constraints.adoc#nodes-scheduler-pod-topology-spread-constraints-configuring[Controlling pod placement by using pod topology spread constraints]
endif::[]

include::modules/logging-loki-retention.adoc[leveloffset=+1]

modules/logging-loki-zone-aware-rep.adoc

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
// Module included in the following assemblies:
//
// * logging/cluster-logging-loki.adoc

:_mod-docs-content-type: CONCEPT
[id="logging-loki-zone-aware-rep_{context}"]
= Zone-aware data replication

In {logging} 5.8 and later versions, the Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as `1x.extra-small`, `1x.small`, or `1x.medium`, the `replication.factor` field is automatically set to 2.

To ensure proper replication, you need at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.

.Example LokiStack CR with zone replication enabled
[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  replicationFactor: 2 # <1>
  replication:
    factor: 2 # <2>
    zones:
    - maxSkew: 1 # <3>
      topologyKey: topology.kubernetes.io/zone # <4>
----
<1> Deprecated field, values entered are overwritten by values in `replication.factor`.
<2> This value is automatically set when a deployment size is selected at setup.
<3> The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
<4> Defines zones in the form of a topology key that corresponds to a node label.
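
To check whether the cluster satisfies this zone requirement, you can list the zone label on each node. This is a minimal verification sketch, not part of the documented configuration, and it assumes that your nodes carry the standard `topology.kubernetes.io/zone` label referenced by the `topologyKey` field:

[source,terminal]
----
oc get nodes -L topology.kubernetes.io/zone
----

Each distinct value in the `ZONE` column represents one availability zone. With `replication.factor: 2`, at least two distinct zones must be present, ideally with an equal number of nodes in each.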

modules/logging-loki-zone-fail-recovery.adoc

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
// Module included in the following assemblies:
//
// * logging/cluster-logging-loki.adoc

:_mod-docs-content-type: PROCEDURE
[id="logging-loki-zone-fail-recovery_{context}"]
= Recovering Loki pods from failed zones

In {product-title}, a zone failure happens when resources in a specific availability zone become inaccessible. Availability zones are isolated areas within a cloud provider's data center that are designed to enhance redundancy and fault tolerance. If your {product-title} cluster is not configured to handle zone failures, they can lead to service interruptions or data loss.

Loki pods are part of a link:https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/[StatefulSet], and they come with Persistent Volume Claims (PVCs) provisioned by a `StorageClass` object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.
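
For example, assuming a topology-aware CSI provisioner that records the zone in the persistent volume's node affinity, you can check which zone a Loki volume is bound to. The PVC name below is illustrative, taken from the example output later in this module:

[source,terminal]
----
oc get pv $(oc get pvc storage-logging-loki-ingester-1 -n openshift-logging -o jsonpath='{.spec.volumeName}') \
  -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'
----

The node selector terms in the output include the zone label, which is why a volume created in the failed zone cannot simply be reattached from another zone.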

[WARNING]
====
The following procedure deletes the PVCs in the failed zone, and all data contained therein. To avoid complete data loss, the replication factor field of the `LokiStack` CR should always be set to a value greater than 1 to ensure that Loki is replicating.
====

.Prerequisites
* Logging version 5.8 or later.
* Verify your `LokiStack` CR has a replication factor greater than 1.
* Zone failure is detected by the control plane, and nodes in the failed zone are marked by the cloud provider integration.

The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki pod and its provisioned PVC in the new zone.

.Procedure
. List the pods in `Pending` status by running the following command:
+
[source,terminal]
----
oc get pods --field-selector status.phase==Pending -n openshift-logging
----
+
.Example `oc get pods` output
[source,terminal]
----
NAME                           READY   STATUS    RESTARTS   AGE # <1>
logging-loki-index-gateway-1   0/1     Pending   0          17m
logging-loki-ingester-1        0/1     Pending   0          16m
logging-loki-ruler-1           0/1     Pending   0          16m
----
<1> These pods are in `Pending` status because their corresponding PVCs are in the failed zone.

. List the PVCs in `Pending` status by running the following command:
+
[source,terminal]
----
oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r
----
+
.Example `oc get pvc` output
[source,terminal]
----
storage-logging-loki-index-gateway-1
storage-logging-loki-ingester-1
wal-logging-loki-ingester-1
storage-logging-loki-ruler-1
wal-logging-loki-ruler-1
----

. Delete the PVC(s) for a pod by running the following command:
+
[source,terminal]
----
oc delete pvc <pvc_name> -n openshift-logging
----

. Delete the pod(s) by running the following command:
+
[source,terminal]
----
oc delete pod <pod_name> -n openshift-logging
----

After these objects have been successfully deleted, they should automatically be rescheduled in an available zone.
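
To confirm the recovery, you can rerun the pending-pod query from the first step and check where the pods landed. This is a minimal verification sketch, assuming the default `openshift-logging` namespace used throughout this procedure:

[source,terminal]
----
oc get pods --field-selector status.phase==Pending -n openshift-logging
oc get pods -n openshift-logging -o wide
----

The first command should return no Loki pods, and the `-o wide` output shows the node that each pod was rescheduled to. You can check that node's zone with `oc get node <node_name> -L topology.kubernetes.io/zone`.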

[id="logging-loki-zone-fail-term-state_{context}"]
== Troubleshooting PVC in a terminating state

The PVCs might hang in the terminating state without being deleted if the PVC metadata finalizers are set to `kubernetes.io/pv-protection`. Removing the finalizers should allow the PVCs to delete successfully.
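
To confirm that a finalizer is what is blocking deletion, you can inspect the PVC metadata first. This is a minimal check, assuming one of the PVC names from the earlier example output:

[source,terminal]
----
oc get pvc <pvc_name> -n openshift-logging -o jsonpath='{.metadata.finalizers}'
----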

. Remove the finalizer for each PVC by running the following command, then retry deletion:
+
[source,terminal]
----
oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging
----
