---
title: "CNPG Recipe 21 – Finer Control of Postgres Clusters with Liveness Probes"
date: 2025-08-15T00:03:25+10:00
description: "how CloudNativePG leverages Kubernetes liveness probes to give users more reliable and configurable control over PostgreSQL in high-availability clusters"
tags: ["postgresql", "postgres", "kubernetes", "k8s", "cloudnativepg", "cnpg", "dok", "data on kubernetes", "probes", "cncf", "startup", "pg_isready", "liveness", "isolation", "primary isolation", "split-brain"]
cover: cover.jpg
thumb: thumb.jpg
draft: false
---

_In this article, I explore how CloudNativePG 1.27 enhances PostgreSQL
liveness probes, including primary isolation checks that mitigate split-brain
scenarios and integrate seamlessly with Kubernetes. I also discuss how these
improvements lay the groundwork for advanced features like quorum-based
failover while maintaining stability, safety, and community-driven
decision-making._

<!--more-->

---

In the previous articles —
[CNPG Recipe 19 - Finer Control Over Postgres Startup with Probes]({{< relref "../20250617-startup-probes/index.md" >}})
and [CNPG Recipe 20 - Finer Control of Postgres Clusters with Readiness Probes]({{< relref "../20250625-readiness-probes/index.md" >}})
— I covered the enhancements to the
[probing infrastructure introduced in CloudNativePG 1.26](https://github.com/cloudnative-pg/cloudnative-pg/pull/6623),
focusing on startup and readiness probes respectively.

In this article, I'll explore the third — and last — probe provided by CloudNativePG: the **liveness** probe.

---

## Understanding Liveness Probes

[Liveness probes](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe)
have been part of Kubernetes since the very beginning. Their purpose is to
ensure that a workload — in our case, PostgreSQL — is still running and able to
perform its intended function.

Just like the readiness probe, the liveness probe only starts running *after*
the startup probe has succeeded. It then continues to run periodically for the
entire lifetime of the container.

As mentioned in CNPG Recipe 19, liveness probes share the same configuration
parameters as startup and readiness probes:

* `failureThreshold`
* `periodSeconds`
* `successThreshold`
* `timeoutSeconds`

Note that Kubernetes requires `successThreshold` to be `1` for liveness
probes, so only the other three parameters are effectively tunable here.
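
For readers less familiar with these fields, the following purely illustrative
pod manifest shows where they live in a plain Kubernetes workload. It is not a
CloudNativePG resource; the image, port, and path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                    # hypothetical example, not a CNPG resource
spec:
  containers:
    - name: app
      image: example.com/app:latest   # placeholder image
      livenessProbe:
        httpGet:
          path: /healthz              # assumed health endpoint of the demo app
          port: 8080
        periodSeconds: 10             # probe every 10 seconds
        timeoutSeconds: 5             # each probe attempt times out after 5 seconds
        failureThreshold: 3           # restart the container after 3 consecutive failures
        successThreshold: 1           # must be 1 for liveness probes
```
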
---

## How CloudNativePG Implements Liveness Probes

At a high level, the goal of the liveness probe is to confirm that the
PostgreSQL workload is healthy. As long as the probe succeeds, the Kubernetes
**kubelet** will keep the pod running. If the probe fails, the kubelet will
restart the pod.

From CloudNativePG’s perspective, the liveness probe checks whether the
**instance manager** is functioning. If you’re not familiar with it, the
instance manager is the entrypoint (`PID 1`) of the `postgres` container — the
main workload. I often describe it as a distributed extension of the operator’s
brain, or more playfully, its *right arm*.

The instance manager provides, among other things, a REST API used by the
operator to coordinate operations. This includes controlling the PostgreSQL
server process itself and serving the endpoints for the startup, readiness, and
liveness probes.

By default, the liveness probe reports **success** as long as the instance
manager is up and running. It reports **failure** if it cannot be reached for
more than `.spec.livenessProbeTimeout` seconds (default: 30 seconds).

Starting with CloudNativePG 1.27, this basic check is enhanced on the
primary instance with an additional safeguard: it now verifies whether the
instance is isolated from both the API server and the other replicas. I’ll
cover the details of this improvement later in the article.

The `.spec.livenessProbeTimeout` setting acts as a higher-level abstraction
over the raw Kubernetes probe configuration. Internally, it maps to the
following parameters:

```yaml
failureThreshold: FAILURE_THRESHOLD
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
```

Here, `FAILURE_THRESHOLD` is automatically calculated as:

```
FAILURE_THRESHOLD = livenessProbeTimeout / periodSeconds
```

This means that with the default values (`livenessProbeTimeout: 30`,
`periodSeconds: 10`), `FAILURE_THRESHOLD` will be `3`.

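To make the mapping concrete, here is a minimal sketch of a Cluster manifest
that raises the timeout to 60 seconds. With the fixed `periodSeconds` of 10,
the operator derives a `failureThreshold` of 6. The cluster name and storage
size are illustrative only.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: freddie                 # illustrative name, matching the examples in this series
spec:
  instances: 3
  livenessProbeTimeout: 60      # failureThreshold becomes 60 / 10 = 6
  storage:
    size: 1Gi
```
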
## Full Probe Customisation

Just like with readiness probes, if your scenario requires finer control, you
can customise the liveness probe through the `.spec.probes.liveness` stanza by
defining the standard Kubernetes probe parameters you should already be
familiar with.

The following example configures Kubernetes to:

- Probe the container every 5 seconds (`periodSeconds`)
- Allow up to 6 consecutive failures (`failureThreshold`) — still
  equivalent to a 30-second tolerance window, but with a higher probing
  frequency — before marking the container as *not alive*

```yaml
{{< include "yaml/freddie-custom.yaml" >}}
```

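In practice, such a manifest looks roughly like the sketch below. The cluster
name comes from the include file used in this series, while the rest is an
assumption based on the values just described, so the actual included file may
differ slightly.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: freddie
spec:
  instances: 3
  probes:
    liveness:
      periodSeconds: 5          # probe every 5 seconds
      failureThreshold: 6       # tolerate up to 6 consecutive failures (~30 seconds)
  storage:
    size: 1Gi
```
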
## Liveness Probe and Primary Isolation

A few months ago, an
[issue](https://github.com/cloudnative-pg/cloudnative-pg/issues/7407) was
raised in CloudNativePG regarding the risk of a split-brain scenario during a
network partition. This sparked a productive
[discussion](https://github.com/cloudnative-pg/cloudnative-pg/issues/7407)
within the community, which I recommend reading.
It ultimately led to a [new default behaviour](https://cloudnative-pg.io/documentation/current/instance_manager/#primary-isolation)
introduced in CloudNativePG 1.27, after debuting as an
[experimental feature in 1.26](https://cloudnative-pg.io/documentation/1.26/instance_manager/#primary-isolation-alpha).

The enhancement applies specifically to the liveness probe on **primary** pods.
In addition to checking that the instance manager is running, the probe now
also verifies that the primary can:

- Reach the Kubernetes API server
- Reach the instance manager of every replica, via their REST API endpoint

If either check fails for longer than the configured `livenessProbeTimeout`,
the kubelet restarts the pod. On restart, the instance manager first attempts
to download the Cluster definition. If this fails — for example, because the
pod is still isolated — PostgreSQL will not start. This ensures that an
isolated primary cannot continue accepting writes, reducing the risk of data
divergence.

While this does not completely prevent split-brain — the isolated primary can
still accept writes from workloads in the same partition until the pod is
terminated — it helps mitigate the risk by shortening the time window during
which two primaries might be active in the cluster (by default, 30 seconds).

This behaviour is conceptually similar to the
[failsafe mode in Patroni](https://patroni.readthedocs.io/en/latest/dcs_failsafe_mode.html).
The key difference is that CloudNativePG provides its own built-in mechanism,
fully integrated with the Kubernetes liveness probe.

As mentioned earlier, the primary isolation check is enabled by default on
every PostgreSQL cluster you deploy. While there is generally no reason to
disable it, you can turn it off if needed, as shown in the example below:

```yaml
{{< include "yaml/freddie-disable-check.yaml" >}}
```

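As a rough sketch of what disabling the check involves, the relevant stanza
sits under the liveness probe configuration. The `isolationCheck` field name is
an assumption based on the 1.27 primary isolation documentation, so verify it
against the linked page before relying on it.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: freddie
spec:
  instances: 3
  probes:
    liveness:
      isolationCheck:           # assumed field name; check the linked documentation
        enabled: false          # disables the primary isolation check
  storage:
    size: 1Gi
```
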
## Key Takeaways

CloudNativePG’s probing infrastructure has matured into a robust,
Kubernetes-native system that now accounts for both replicas and primaries. The
primary isolation check in the liveness probe enhances cluster reliability by
reducing the risk of unsafe operations in network-partitioned scenarios, making
PostgreSQL behaviour more predictable and safer for administrators.

Key practical takeaways:

- **Primary isolation check enabled by default:** Liveness probes now verify
  that a primary can reach the API server and other replicas.
- **Mitigates split-brain scenarios:** Reduces the time window during which
  multiple primaries could accept writes. When synchronous replication is used
  (as recommended), the likelihood of a split-brain on an isolated primary is
  close to zero.
- **Fully integrated with Kubernetes probes:** Achieves robust behaviour
  without introducing external dependencies.
- **Foundation for quorum-based failover:** Enables the experimental
  [quorum-based failover](https://cloudnative-pg.io/documentation/current/failover/#failover-quorum-quorum-based-failover)
  feature in 1.27, which will be stable in 1.28, offering safer synchronous
  replication failover.

This evolution reflects a careful, staged approach: first reorganising the
startup and readiness probes, then adding primary isolation checks, and finally
paving the way for advanced failover mechanisms. It demonstrates
CloudNativePG’s commitment to stability, safety, and innovation, and it shows
that the project and its community are mature enough to make these decisions
together.

---

Stay tuned for the upcoming recipes! For the latest updates, consider
subscribing to my [LinkedIn](https://www.linkedin.com/in/gbartolini/) and
[Twitter](https://twitter.com/_GBartolini_) channels.

If you found this article informative, feel free to share it within your
network on social media using the provided links below. Your support is
immensely appreciated!

_Cover Picture: [“Elephants are smart, big, and sensitive”](https://commons.wikimedia.org/wiki/File:Elephants_are_smart,_big,_and_sensitive.jpg)._