Commit bbce43f

Merge pull request #278226 from niklarin/vcore-ha
High availability concept page
2 parents b7aac7d + 1817638 commit bbce43f

3 files changed

+52
-5
lines changed

articles/cosmos-db/.openpublishing.redirection.cosmos-db.json

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5785,11 +5785,6 @@
57855785
"redirect_url": "/previous-versions/azure/cosmos-db/nosql/sdk-java-spark-v2",
57865786
"redirect_document_id": false
57875787
},
5788-
{
5789-
"source_path_from_root": "/articles/cosmos-db/mongodb/vcore/high-availability.md",
5790-
"redirect_url": "/azure/reliability/reliability-cosmos-mongodb",
5791-
"redirect_document_id": false
5792-
},
57935788
{
57945789
"source_path_from_root": "/articles/cosmos-db/mongodb/vcore/failover-disaster-recovery.md",
57955790
"redirect_url": "/azure/reliability/reliability-cosmos-mongodb",

articles/cosmos-db/mongodb/vcore/TOC.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,8 @@
3636
href: vector-search-ai.md
3737
- name: MongoDB feature support
3838
href: compatibility.md
39+
- name: High availability (HA)
40+
href: high-availability.md
3941
- name: Cross-region replication
4042
href: cross-region-replication.md
4143
- name: Reliability

articles/cosmos-db/mongodb/vcore/high-availability.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
---
2+
title: High availability in Azure Cosmos DB for MongoDB vCore
3+
titleSuffix: Azure Cosmos DB for MongoDB vCore
4+
description: Learn about high availability (HA) of Azure Cosmos DB for MongoDB vCore clusters.
5+
author: niklarin
6+
ms.author: nlarin
7+
ms.service: cosmos-db
8+
ms.subservice: mongodb-vcore
9+
ms.topic: conceptual
10+
ms.date: 06/12/2024
11+
---
12+
13+
# High availability in Azure Cosmos DB for MongoDB vCore
14+
15+
[!INCLUDE[MongoDB vCore](~/reusable-content/ce-skilling/azure/includes/cosmos-db/includes/appliesto-mongodb-vcore.md)]
16+
17+
High availability (HA) avoids database downtime by maintaining standby replicas
18+
of every shard in a cluster. If a shard becomes unresponsive for any reason, Azure Cosmos DB for MongoDB vCore
19+
switches incoming connections from the failed shard to its standby. When failover
20+
happens, promoted shards always have up-to-date data because of synchronous replication.
21+
22+
All primary shards in a cluster are provisioned into one [availability zone (AZ)](../../../reliability/availability-zones-overview.md)
23+
for better latency between the shards. The standby shards are provisioned into
24+
another availability zone.
25+
26+
Even without HA enabled, each node has its own locally
27+
redundant storage (LRS) with three synchronous replicas maintained by Azure
28+
Storage service. All three replicas are located in the cluster's Azure region. If a single replica fails, the Azure Storage service detects the failure and transparently re-creates the failed replica. For LRS storage durability metrics, see the [summary of redundancy options](../../../storage/common/storage-redundancy.md#summary-of-redundancy-options).
29+
30+
When HA *is* enabled, Azure Cosmos DB for MongoDB vCore runs one standby shard for each primary
31+
shard in the cluster. Each primary and standby shard has the same compute and storage configuration.
32+
The primary and its standby use synchronous replication. This type of replication allows you to always have
33+
the same data on the primary and standby shards in your cluster. In a nutshell, our service detects a failure
34+
on primary shards, and fails over to standby nodes with zero data loss.
35+
36+
The cluster connection string always stays the same regardless of failovers. That allows the service to abstract changes in physical shards serving requests from applications.
37+
38+
High availability can be enabled at cluster creation time. It can also be [enabled and disabled at any time on an existing Azure Cosmos DB for MongoDB vCore cluster](./how-to-scale-cluster.md#enable-or-disable-high-availability). Enabling or disabling high availability causes no database downtime.
39+
40+
## What happens during a failover
41+
Each shard failover consists of three phases: unavailability detection, the switch to the standby shard, and re-creation of the standby shard. The service continuously monitors the availability of each primary and standby shard in the cluster through periodic health checks. When the health checks reliably indicate that a shard has become unresponsive and must be declared failed, the actual failover (switch) to the standby shard is initiated.
42+
43+
During the switch phase, database reads and writes are redirected to the standby shard. Synchronous replication between each primary and standby shard ensures that the standby shard always has the same data as its primary. That allows all failovers to be performed with zero data loss. The switch to standby is done with no downtime for reads. Write operations might require internal service retries during the switch phase. These retries might be seen as write slowness on the application side.
44+
45+
Once the shard failover is completed, the cluster is fully operational. The last step to return to the original highly available configuration is to re-create the standby shard. This standby shard re-creation is performed without downtime or performance impact on the primary shard.
46+
47+
## Related content
48+
49+
- [See how to enable high availability in Azure Cosmos DB for MongoDB vCore](./how-to-scale-cluster.md#enable-or-disable-high-availability)
50+
- [Learn about reliability fundamentals in Azure Cosmos DB for MongoDB vCore](../../../reliability/reliability-cosmos-mongodb.md)
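
Reviewer sketch (not part of this commit): the new article says writes might look slow to the application during the switch phase because the service retries them internally. An application may still want to bound how it reacts if a write does surface a transient error. The Python below is a minimal illustration under stated assumptions: `TransientWriteError` and `write_with_backoff` are hypothetical names, not part of any Azure Cosmos DB or MongoDB driver API, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientWriteError(Exception):
    """Hypothetical stand-in for whatever transient error a driver raises."""


def write_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying on TransientWriteError with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2*base_delay, ...).
    The last failure is re-raised so the caller can decide what to do next.
    """
    for attempt in range(attempts):
        try:
            return op()
        except TransientWriteError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    # Only reached when attempts < 1, so the loop body never ran.
    raise ValueError("attempts must be >= 1")
```

In use, `op` would wrap a single write issued through the unchanged cluster connection string, for example `write_with_backoff(lambda: collection.insert_one(doc))` with a real driver, after mapping that driver's transient errors to the exception type retried here. Retrying a write this way is only safe if the operation is idempotent.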
