Commit c3ad9ab

Merge pull request #271593 from AlicjaKucharczyk/20240409-readreplica-ga
20240409 readreplica ga
2 parents ef55732 + 88c25bc commit c3ad9ab

6 files changed

+280
-208
lines changed

articles/postgresql/TOC.yml

Lines changed: 11 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -130,6 +130,9 @@
130130
href: flexible-server/concepts-server-parameters.md
131131
- name: Troubleshooting guides
132132
href: flexible-server/concepts-troubleshooting-guides.md
133+
- name: Logical replication and logical decoding
134+
href: flexible-server/concepts-logical.md
135+
displayName: logical decoding, logical replication
133136
- name: Reliability
134137
items:
135138
- name: Overview
@@ -166,14 +169,17 @@
166169
href: flexible-server/concepts-query-performance-insight.md
167170
- name: Intelligent tuning
168171
href: flexible-server/concepts-intelligent-tuning.md
169-
- name: Replication
172+
- name: Read replicas
170173
items:
171-
- name: Read replicas
174+
- name: Overview
172175
href: flexible-server/concepts-read-replicas.md
173176
displayName: replication, read replica
174-
- name: Logical replication and logical decoding
175-
href: flexible-server/concepts-logical.md
176-
displayName: logical decoding, logical replication
177+
- name: Geo-Replication
178+
href: flexible-server/concepts-read-replicas-geo.md
179+
- name: Promote read replicas
180+
href: flexible-server/concepts-read-replicas-promote.md
181+
- name: Virtual endpoints
182+
href: flexible-server/concepts-read-replicas-virtual-endpoints.md
177183
- name: App development
178184
items:
179185
- name: Connection libraries
Lines changed: 91 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,91 @@
1+
---
2+
title: Geo-replication
3+
description: This article describes geo-replication in Azure Database for PostgreSQL - Flexible Server.
4+
author: AlicjaKucharczyk
5+
ms.author: alkuchar
6+
ms.reviewer: maghan
7+
ms.date: 03/06/2024
8+
ms.service: postgresql
9+
ms.subservice: flexible-server
10+
ms.custom:
11+
- ignite-2023
12+
ms.topic: conceptual
13+
---
14+
15+
# Geo-replication in Azure Database for PostgreSQL - Flexible Server
16+
17+
[!INCLUDE [applies-to-postgresql-flexible-server](../includes/applies-to-postgresql-flexible-server.md)]
18+
19+
A read replica can be created in the same region as the primary server or in a different one. Geo-replication can be helpful for scenarios like disaster recovery planning or bringing data closer to your users.
20+
21+
You can have a primary server in any [Azure Database for PostgreSQL flexible server region](https://azure.microsoft.com/global-infrastructure/services/?products=postgresql). A primary server can also have replicas in any global region of Azure that supports Azure Database for PostgreSQL flexible server. Additionally, we support special regions [Azure Government](../../azure-government/documentation-government-welcome.md) and [Microsoft Azure operated by 21Vianet](/azure/china/overview-operations). The special regions now supported are:
22+
23+
- **Azure Government regions**:
24+
- US Gov Arizona
25+
- US Gov Texas
26+
- US Gov Virginia
27+
28+
- **Microsoft Azure operated by 21Vianet regions**:
29+
- China North 3
30+
- China East 3
31+
32+
> [!NOTE]
33+
> The [virtual endpoints](concepts-read-replicas-virtual-endpoints.md) and [promote to primary server](concepts-read-replicas-promote.md) features are not currently supported in the special regions listed above.
34+
35+
## Paired regions for disaster recovery purposes
36+
37+
While creating replicas in any supported region is possible, there are notable benefits when opting for replicas in paired regions, especially when architecting for disaster recovery purposes:
38+
39+
- **Region Recovery Sequence**: In a geography-wide outage, recovery of one region from every paired set is prioritized, ensuring that applications across paired regions always have a region expedited for recovery.
40+
41+
- **Sequential Updating**: Paired regions' updates are staggered chronologically, minimizing the risk of downtime from update-related issues.
42+
43+
- **Physical Isolation**: A minimum distance of 300 miles is maintained between data centers in paired regions, reducing the risk of simultaneous outages from significant events.
44+
45+
- **Data Residency**: With a few exceptions, regions in a paired set reside within the same geography, meeting data residency requirements.
46+
47+
- **Performance**: While paired regions typically offer low network latency, enhancing data accessibility and user experience, they might not always be the regions with the absolute lowest latency. If the primary objective is to serve data closer to users rather than prioritize disaster recovery, it's crucial to evaluate all available regions for latency. In some cases, a nonpaired region might exhibit the lowest latency. For a comprehensive understanding, you can reference [Azure's round-trip latency figures](../../networking/azure-network-latency.md#round-trip-latency-figures) to make an informed choice.
48+
49+
For a deeper understanding of the advantages of paired regions, refer to [Azure's documentation on cross-region replication](../../reliability/cross-region-replication-azure.md#azure-paired-regions).
50+
51+
52+
## Regional Failures and Recovery
53+
54+
Azure facilities across various regions are designed to be highly reliable. However, under rare circumstances, an entire region can become inaccessible due to reasons ranging from network failures to severe scenarios like natural disasters. Azure's capabilities allow for creating applications that are distributed across multiple regions, ensuring that a failure in one region doesn't affect others.
55+
56+
### Prepare for Regional Disasters
57+
58+
Being prepared for potential regional disasters is critical to ensure the uninterrupted operation of your applications and services. If you're considering a robust contingency plan for your Azure Database for PostgreSQL flexible server instance, here are the key steps and considerations:
59+
60+
1. **Establish a geo-replicated read replica**: It's essential to have a read replica set up in a separate region from your primary. This ensures continuity in case the primary region faces an outage.
61+
2. **Ensure server symmetry**: The "promote to primary server" action is the recommended way to handle regional outages, but it comes with a [server symmetry](concepts-read-replicas.md#configuration-management) requirement: both the primary and replica servers must have identical configurations of specific settings. The advantages of using this action include:
62+
* No need to modify application connection strings if you use [virtual endpoints](concepts-read-replicas-virtual-endpoints.md).
63+
* It provides a seamless recovery process where, once the affected region is back online, the original primary server automatically resumes its function, but in a new replica role.
64+
3. **Set up virtual endpoints**: Virtual endpoints allow for a smooth transition of your application to another region if there is an outage. They eliminate the need for any changes in the connection strings of your application.
65+
4. **Configure the read replica**: Not all settings from the primary server are replicated over to the read replica. It's crucial to ensure that all necessary configurations and features (for example, PgBouncer) are appropriately set up on your read replica. For more information, see the [Configuration management](concepts-read-replicas-promote.md#configuration-management) section.
66+
5. **Prepare for High Availability (HA)**: If your setup requires high availability, it won't be automatically enabled on a promoted replica. Be ready to activate it post-promotion. Consider automating this step to minimize downtime.
67+
6. **Regular testing**: Regularly simulate regional disaster scenarios to validate existing thresholds, targets, and configurations. Ensure that your application responds as expected during these test scenarios.
68+
7. **Follow Azure's general guidance**: Azure provides comprehensive guidance on [reliability and disaster preparedness](../../reliability/overview.md). It's highly beneficial to consult these resources and integrate best practices into your preparedness plan.
69+
70+
Being proactive and preparing in advance for regional disasters ensure the resilience and reliability of your applications and data.
71+
72+
### When outages impact your SLA
73+
74+
If a prolonged outage of Azure Database for PostgreSQL flexible server in a specific region threatens your application's service-level agreement (SLA), be aware that neither of the actions discussed below is service-driven; both require user intervention. It's a best practice to automate the entire process as much as possible and to have robust monitoring in place. For more information about what information is provided during an outage, see the [Service outage](concepts-business-continuity.md#service-outage) page. Only a **forced** promote is possible in a region-down scenario, meaning the amount of data loss is roughly equal to the current lag between the replica and primary. Hence, it's crucial to [monitor the lag](concepts-read-replicas.md#monitor-replication). Consider the following options:
75+
76+
**Promote to primary server**
77+
78+
This option won't require updating the connection strings in your application, provided virtual endpoints are configured. Once activated, the writer endpoint will repoint to the new primary in a different region and the [replication state](concepts-read-replicas.md#monitor-replication) column in the Azure portal will display "Reconfiguring". Once the affected region is restored, the former primary server will automatically resume, but now in a replica role.
79+
80+
**Promote to independent server and remove from replication**
81+
82+
If your servers don't meet the server symmetry requirement, this is the only viable option. After promoting the server, you'll need to update your application's connection strings. Once the original region is restored, the old primary might become active again. Be sure to remove it to avoid incurring unnecessary costs. If you wish to maintain the previous topology, recreate the read replica.
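Because a forced promote loses roughly as much data as the replica was lagging, the check you automate in your monitoring can be as simple as comparing recent lag against the data loss your application tolerates. The following is a minimal sketch, not an Azure API: the function names, the assumption that lag is sampled in seconds, and the `rpo_seconds` threshold are all hypothetical.

```python
from typing import Iterable


def worst_case_lag(lag_samples_seconds: Iterable[float]) -> float:
    """Return the largest replication lag seen in the sampled window."""
    samples = list(lag_samples_seconds)
    if not samples:
        raise ValueError("at least one lag sample is required")
    return max(samples)


def forced_promote_within_rpo(
    lag_samples_seconds: Iterable[float], rpo_seconds: float
) -> bool:
    """Estimate whether a forced promote would stay within the tolerated data loss.

    Assumption: the data lost by a forced promote is roughly the replica lag,
    so the worst recent lag is used as a conservative estimate.
    """
    return worst_case_lag(lag_samples_seconds) <= rpo_seconds
```

For example, with hypothetical lag samples of 2, 5.5, and 3.1 seconds and a tolerated loss of 10 seconds, the check passes; a single 45-second spike in the window would make it fail and is a signal to investigate before an outage forces the decision.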
83+
84+
85+
## Related content
86+
87+
- [Read replicas - overview](concepts-read-replicas.md)
88+
- [Promote read replicas](concepts-read-replicas-promote.md)
89+
- [Virtual endpoints](concepts-read-replicas-virtual-endpoints.md)
90+
- [Create and manage read replicas in the Azure portal](how-to-read-replicas-portal.md)
91+
- [Cross-region replication with virtual network](concepts-networking.md#replication-across-azure-regions-and-virtual-networks-with-private-networking)
Lines changed: 96 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,96 @@
1+
---
2+
title: Promote read replicas
3+
description: This article describes the promote action for the read replica feature in Azure Database for PostgreSQL - Flexible Server.
4+
author: AlicjaKucharczyk
5+
ms.author: alkuchar
6+
ms.reviewer: maghan
7+
ms.date: 03/06/2024
8+
ms.service: postgresql
9+
ms.subservice: flexible-server
10+
ms.topic: conceptual
11+
---
12+
13+
# Promote read replicas in Azure Database for PostgreSQL - Flexible Server
14+
15+
[!INCLUDE [applies-to-postgresql-flexible-server](../includes/applies-to-postgresql-flexible-server.md)]
16+
17+
Promote refers to the process where a replica is commanded to end its replica mode and transition into full read-write operations.
18+
19+
> [!IMPORTANT]
20+
> The promote operation is not automatic. In the event of a primary server failure, the system won't switch to the read replica independently. A user action is always required for the promote operation.
21+
22+
Promotion of replicas can be done in two distinct manners:
23+
24+
**Promote to primary server**
25+
26+
This action elevates a replica to the role of the primary server. In the process, the current primary server is demoted to a replica role, so the two servers swap roles. For a successful promotion, it's necessary to have a [virtual endpoint](concepts-read-replicas-virtual-endpoints.md) configured with the current primary as the writer endpoint and the replica intended for promotion as the reader endpoint. The promotion is successful only if the targeted replica is included in the reader endpoint configuration.
27+
28+
The diagram illustrates the configuration of the servers before the promotion and the resulting state after the promotion operation is successfully completed.
29+
30+
:::image type="content" source="./media/concepts-read-replica/promote-to-primary-server.png" alt-text="Diagram that shows promote to primary server operation." lightbox="./media/concepts-read-replica/promote-to-primary-server.png":::
31+
32+
**Promote to independent server and remove from replication**
33+
34+
When you choose this option, the replica is promoted to become an independent server and is removed from the replication process. As a result, both the primary and the promoted server function as two independent read-write servers. It should be noted that while virtual endpoints can be configured, they aren't a necessity for this operation. The newly promoted server is no longer part of any existing virtual endpoints, even if the reader endpoint was previously pointing to it. Thus, it's essential to update your application's connection string to direct to the newly promoted replica if the application should connect to it.
35+
36+
The diagram illustrates the configuration of the servers before the promotion and the resulting state after the promotion to independent server operation is successfully completed.
37+
38+
:::image type="content" source="./media/concepts-read-replica/promote-to-independent-server.png" alt-text="Diagram that shows promote to independent server and remove from replication operation." lightbox="./media/concepts-read-replica/promote-to-independent-server.png":::
39+
40+
> [!IMPORTANT]
41+
> The **Promote to independent server and remove from replication** action is backward compatible with the previous promote functionality.
42+
43+
> [!IMPORTANT]
44+
> **Server Symmetry**: For a successful promotion using the promote to primary server operation, both the primary and replica servers must have identical tiers and storage sizes. For instance, if the primary has 2vCores and the replica has 4vCores, the only viable option is to use the "promote to independent server and remove from replication" action. Additionally, they need to share the same values for [server parameters that allocate shared memory](concepts-read-replicas.md#server-parameters).
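The symmetry requirement above lends itself to a pre-flight check before you attempt a promote to primary server. The sketch below is illustrative only: the dictionary fields (`tier`, `vcores`, `storage_gb`, `parameters`) are hypothetical and not an Azure API, and the parameter names are the ones PostgreSQL's hot-standby documentation calls out as shared-memory related, assumed here to stand in for the list in the linked server parameters section.

```python
from typing import List

# Assumption: parameters that size shared memory, per PostgreSQL's
# hot-standby documentation. The authoritative list for this service
# is in the linked "Server parameters" section.
SHARED_MEMORY_PARAMS = (
    "max_connections",
    "max_locks_per_transaction",
    "max_prepared_transactions",
    "max_wal_senders",
    "max_worker_processes",
)


def symmetry_violations(primary: dict, replica: dict) -> List[str]:
    """Return the reasons the two servers are not symmetric (empty if they are)."""
    problems = []
    # Compute tier, size, and storage must be identical.
    for field in ("tier", "vcores", "storage_gb"):
        if primary.get(field) != replica.get(field):
            problems.append(
                f"{field}: primary={primary.get(field)!r}, "
                f"replica={replica.get(field)!r}"
            )
    # Shared-memory parameters must have the same values on both servers.
    p_params = primary.get("parameters", {})
    r_params = replica.get("parameters", {})
    for name in SHARED_MEMORY_PARAMS:
        if p_params.get(name) != r_params.get(name):
            problems.append(
                f"{name}: primary={p_params.get(name)!r}, "
                f"replica={r_params.get(name)!r}"
            )
    return problems
```

An empty result suggests promote to primary server is worth attempting; any entry, such as a 2 vCore primary paired with a 4 vCore replica, means promote to independent server is the remaining option.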
45+
46+
For both promotion methods, there are more options to consider:
47+
48+
- **Planned**: This option ensures that data is synchronized before promoting. It applies all the pending logs to ensure data consistency before accepting client connections.
49+
50+
- **Forced**: This option is designed for rapid recovery in scenarios such as regional outages. Instead of waiting to synchronize all the data from the primary, the server becomes operational once it processes WAL files needed to achieve the nearest consistent state. If you promote the replica using this option, the lag at the time you delink the replica from the primary indicates how much data is lost.
51+
52+
> [!IMPORTANT]
53+
> The **Forced** promotion option is specifically designed to address regional outages. In such cases, it skips all checks, including the server symmetry requirement, and proceeds with promotion, because it prioritizes immediate server availability in disaster scenarios. Outside of region-down scenarios, the Forced option isn't allowed if the documented requirements for read replicas, especially the server symmetry requirement, aren't met, because it could lead to issues such as broken replication.
54+
55+
56+
Learn how to [promote replica to primary](how-to-read-replicas-portal.md#promote-replicas) and [promote to independent server and remove from replication](how-to-read-replicas-portal.md#promote-replica-to-independent-server).
57+
58+
## Configuration management
59+
60+
Read replicas are treated as separate servers in terms of control plane configurations. This approach provides flexibility for read scale scenarios. However, when using replicas for disaster recovery purposes, users must ensure the configuration is as desired.
61+
62+
The promote operation won't carry over specific configurations and parameters. Here are some of the notable ones:
63+
64+
- **PgBouncer**: [The built-in PgBouncer](concepts-pgbouncer.md) connection pooler's settings and status aren't replicated during the promotion process. If PgBouncer was enabled on the primary but not on the replica, it will remain disabled on the replica after promotion. Should you want PgBouncer on the newly promoted server, you must enable it either prior to or following the promotion action.
65+
- **Geo-redundant backup storage**: Geo-backup settings aren't transferred. Since replicas can't have geo-backup enabled, the promoted primary (formerly the replica) won't have it post-promotion. The feature can only be enabled when a standard server (not a replica) is created.
66+
- **Server Parameters**: If their values differ on the primary and read replica, they won't be changed during promotion. It's essential to note that parameters influencing shared memory size must have the same values on both the primary and replicas. This requirement is detailed in the [Server parameters](concepts-read-replicas.md#server-parameters) section.
67+
- **Microsoft Entra authentication**: If the primary had [Microsoft Entra authentication](concepts-azure-ad-authentication.md) configured, but the replica was set up with PostgreSQL authentication, then after promotion, the replica won't automatically switch to Microsoft Entra authentication. It retains the PostgreSQL authentication. Users need to manually configure Microsoft Entra authentication on the promoted replica either before or after the promotion process.
68+
- **High Availability (HA)**: Should you require [HA](concepts-high-availability.md) after the promotion, it must be configured on the freshly promoted primary server, following the role reversal.
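The list above can be turned into a simple drift report that you run before relying on a replica for disaster recovery. This is a sketch under stated assumptions: the configuration keys are hypothetical names for the features discussed above, and the input dictionaries are something you would populate yourself from your own inventory, not a real service response.

```python
# Hypothetical feature flags mapped to the follow-up action described above.
ADVICE = {
    "pgbouncer_enabled": "enable PgBouncer on the replica before or after promotion",
    "entra_auth_enabled": "configure Microsoft Entra authentication before or after promotion",
    "high_availability": "enable HA on the promoted server after the role reversal",
    "geo_redundant_backup": "not available on replicas; can only be set when a standard server is created",
}


def post_promotion_todo(primary: dict, replica: dict) -> dict:
    """Map each feature that is on for the primary but off for the replica to its follow-up."""
    return {
        key: advice
        for key, advice in ADVICE.items()
        if primary.get(key) and not replica.get(key)
    }
```

Calling `post_promotion_todo(primary, replica)` with a primary that has PgBouncer and HA enabled and a replica that has neither returns two entries, one per feature to configure around the promotion.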
69+
70+
71+
## Considerations
72+
### Server states during promotion
73+
74+
In both the Planned and Forced promotion scenarios, it's required that servers (both primary and replica) be in an "Available" state. If a server's status is anything other than "Available" (such as "Updating" or "Restarting"), the promotion typically can't proceed without issues. However, an exception is made in the case of regional outages.
75+
76+
During such regional outages, the Forced promotion method can be implemented regardless of the server's current status. This approach allows for swift action in response to potential regional disasters, bypassing normal checks on server availability.
77+
78+
It's important to note that if the former primary server enters an irrecoverable state during promotion of its replica, the only solution is to delete the former primary server and recreate the replica server.
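The state rules in this section reduce to a small decision function. The sketch below is a hypothetical helper for your own automation, not part of any Azure SDK; the state strings mirror the ones quoted above, and the `regional_outage` flag is an input you would have to determine yourself.

```python
def can_promote(
    primary_state: str,
    replica_state: str,
    mode: str,
    regional_outage: bool = False,
) -> bool:
    """Apply the server-state rules for Planned and Forced promotion."""
    if mode not in ("planned", "forced"):
        raise ValueError(f"unknown promotion mode: {mode!r}")
    # Exception: during a regional outage, Forced promotion bypasses
    # the normal checks on server availability.
    if mode == "forced" and regional_outage:
        return True
    # Otherwise both servers must be in the "Available" state.
    return primary_state == "Available" and replica_state == "Available"
```

Note that the outage exception applies only to the Forced mode, so a planned promotion with an unavailable primary is rejected even when `regional_outage` is set.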
79+
80+
### Multiple replicas visibility during promotion in nonpaired regions
81+
82+
When you have multiple replicas and the primary region lacks a [paired region](concepts-read-replicas-geo.md#paired-regions-for-disaster-recovery-purposes), special consideration is required. In the event of a regional outage affecting the primary, any other replicas won't be automatically recognized by the newly promoted replica. While applications can still be directed to the promoted replica for continued operation, the unrecognized replicas remain disconnected during the outage. These extra replicas will only reassociate and resume their roles once the original primary region has been restored.
83+
84+
## Frequently asked questions
85+
86+
* **If I have an HA-enabled primary and a read replica, and I promote the replica, then switch back to the original primary, will the server still be in HA?**
87+
88+
No, we disable HA during the initial promotion since we do not support HA-enabled read replicas. Promoting a read replica to a primary means that the original primary is changing its role to a replica. If you are switching back, you will need to enable HA on your original primary server.
89+
90+
## Related content
91+
92+
- [Read replicas - overview](concepts-read-replicas.md)
93+
- [Geo-replication](concepts-read-replicas-geo.md)
94+
- [Virtual endpoints](concepts-read-replicas-virtual-endpoints.md)
95+
- [Create and manage read replicas in the Azure portal](how-to-read-replicas-portal.md)
96+
- [Cross-region replication with virtual network](concepts-networking.md#replication-across-azure-regions-and-virtual-networks-with-private-networking)
