Commit b558d76

Adding a new page on connectivity resilience
1 parent c8605dd commit b558d76

2 files changed, +46 -0 lines changed

articles/postgresql/TOC.yml

Lines changed: 4 additions & 0 deletions
@@ -540,6 +540,10 @@
540540
- name: Logical replication and logical decoding
541541
href: flexible-server/concepts-logical.md
542542
displayName: logical decoding, logical replication
543+
- name: App Development
544+
items:
545+
- name: Connection resilience
546+
href: flexible-server/concepts-connectivity.md
543547
- name: Azure Advisor recommendations
544548
href: flexible-server/concepts-azure-advisor-recommendations.md
545549
- name: Troubleshooting
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
---
2+
title: Handle transient connectivity errors - Azure Database for PostgreSQL - Flexible Server
3+
description: Learn how to handle transient connectivity errors for Azure Database for PostgreSQL - Flexible Server.
4+
ms.author: olmoloce
5+
author: olmoloce
6+
ms.service: postgresql
7+
ms.subservice: flexible-server
8+
ms.topic: conceptual
9+
ms.date: 03/22/2023
10+
---
11+
12+
# Handling transient connectivity errors for Azure Database for PostgreSQL - Flexible Server
13+
14+
[!INCLUDE [applies-to-postgresql-flexible-server](../includes/applies-to-postgresql-flexible-server.md)]
15+
16+
This article describes how to handle transient errors when connecting to Azure Database for PostgreSQL.
17+
18+
## Transient errors
19+
20+
A transient error, also known as a transient fault, is an error that resolves itself. Typically, these errors manifest as a dropped connection to the database server or as a failure to open new connections. Transient errors can occur, for example, when a hardware or network failure happens, or when a new version of a PaaS service is being rolled out. The system automatically mitigates most of these events in less than 60 seconds. A best practice for designing and developing applications in the cloud is to expect transient errors: assume they can happen in any component at any time, and have the appropriate logic in place to handle them.
21+
22+
## Handling transient errors
23+
24+
Handle transient errors by using retry logic. Consider the following situations:
25+
26+
* An error occurs when you try to open a connection.
27+
* An idle connection is dropped on the server side. When you try to issue a command, it can't be executed.
28+
* An active connection that is currently executing a command is dropped.
29+
30+
The first and second cases are fairly straightforward to handle: try to open the connection again. When you succeed, the system has mitigated the transient error, and you can use your Azure Database for PostgreSQL server again. We recommend waiting before you retry the connection, and backing off if the initial retries fail. This way, the system can use all available resources to overcome the error situation. A good pattern to follow is:
31+
32+
* Wait for 5 seconds before your first retry.
33+
* For each subsequent retry, increase the wait exponentially, up to 60 seconds.
34+
* Set a maximum number of retries, after which your application considers the operation failed.
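
The retry pattern above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names introduced here, and the sketch assumes your code translates the driver's connection failures into `TransientError` before they reach the retry loop.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as a dropped connection."""


def backoff_delays(max_retries=5, initial_wait=5.0, max_wait=60.0):
    """Yield the wait before each retry: 5, 10, 20, 40, then capped at 60 seconds."""
    wait = initial_wait
    for _ in range(max_retries):
        yield min(wait, max_wait)
        wait *= 2  # exponential increase between retries


def call_with_retry(operation, max_retries=5, initial_wait=5.0,
                    max_wait=60.0, sleep=time.sleep):
    """Run operation(); on TransientError, wait and retry up to max_retries times.

    Re-raises the last TransientError once the retries are exhausted, so the
    caller can treat the operation as failed.
    """
    try:
        return operation()
    except TransientError as exc:
        last_error = exc
    for delay in backoff_delays(max_retries, initial_wait, max_wait):
        sleep(delay)
        try:
            return operation()
        except TransientError as exc:
            last_error = exc
    raise last_error
```

In an application, `operation` would typically be a small function that opens the connection (or reopens it and reissues a command) and raises `TransientError` when the driver reports a connection failure. The `sleep` parameter is injectable only so that the waits can be replaced in tests.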
35+
36+
When a connection with an active transaction fails, recovering correctly is more difficult. There are two cases. If the transaction was read-only, it's safe to reopen the connection and retry the transaction. If, however, the transaction was also writing to the database, you must determine whether the transaction was rolled back or whether it succeeded before the transient error happened. In the latter case, you might simply not have received the commit acknowledgment from the database server.
37+
38+
One way to do this is to generate a unique ID on the client that is used for all the retries. Pass this unique ID to the server as part of the transaction and store it in a column with a unique constraint. This way, you can safely retry the transaction. The retry succeeds if the previous transaction was rolled back and the client-generated unique ID doesn't yet exist in the system. It fails with a duplicate key violation if the unique ID was previously stored because the previous transaction completed successfully.
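
The unique ID approach can be illustrated with a short Python sketch. To keep the example self-contained and runnable, it uses the standard library's `sqlite3` module as a stand-in for a PostgreSQL connection; that substitution is an assumption of this sketch, as are the `payments` table, the `request_id` column, and the `create_schema` and `record_payment` helpers. With a PostgreSQL driver, the same idea applies, but the duplicate key violation surfaces as that driver's unique-violation error rather than `sqlite3.IntegrityError`.

```python
import sqlite3
import uuid


def create_schema(conn):
    # request_id carries the client-generated ID; UNIQUE makes a replayed insert fail.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        "request_id TEXT NOT NULL UNIQUE, amount INTEGER NOT NULL)"
    )
    conn.commit()


def record_payment(conn, request_id, amount):
    """Insert one row per request_id.

    Returns True if this call committed the row, and False if an earlier
    attempt with the same request_id had already committed it.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO payments (request_id, amount) VALUES (?, ?)",
                (request_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: the earlier attempt succeeded, and only the
        # commit acknowledgment was lost.
        return False
```

The client generates the ID once, for example with `str(uuid.uuid4())`, and reuses it for every retry of the same logical operation. A first call to `record_payment` with that ID returns `True`; a retry with the same ID returns `False` and leaves exactly one row in the table.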
39+
40+
When your program communicates with Azure Database for PostgreSQL through third-party middleware, ask the vendor whether the middleware contains retry logic for transient errors.
41+
42+
Make sure to test your retry logic. For example, try to execute your code while scaling the compute resources of your Azure Database for PostgreSQL server up or down. Your application should handle the brief downtime encountered during this operation without any problems.
