Commit bdb2c3e

Update scalability-overview.md
Updating blocking and non-blocking feedback on the docs PR.
1 parent eb1d32d commit bdb2c3e

1 file changed, +7 -8 lines changed

articles/cosmos-db/mongodb/vcore/scalability-overview.md

Lines changed: 7 additions & 8 deletions
@@ -1,5 +1,5 @@
 ---
-title: Scalability Overview
+title: Scalability overview
 titleSuffix: Overview of compute and storage scalability on Azure Cosmos DB for MongoDB vCore
 description: Cost and performance advantages of scalability for Azure Cosmos DB for MongoDB vCore
 author: abinav2307
@@ -14,7 +14,7 @@ ms.date: 07/22/2024
 
 The vCore based service for Azure Cosmos DB for MongoDB offers the ability to scale clusters both vertically and horizontally. While the Compute cluster tier and Storage disk functionally depend on each other, the scalability and cost of compute and storage are decoupled.
 
-## Vertical Scaling
+## Vertical scaling
 Vertical scaling offers the following benefits:
 - Application teams may not always have a clear path to logically shard their data. Moreover, logical sharding is defined per collection. In a dataset with several unsharded collections, data modeling to partition the data can quickly become tedious. Simply scaling up the cluster can circumvent the need for logical sharding while meeting the growing storage and compute needs of the application.
 - Vertical scaling does not require data rebalancing. The number of physical shards remains the same and only the capacity of the cluster is increased with no impact to the application.
@@ -23,14 +23,14 @@ Vertical scaling offers the following benefits:
 - Most importantly, Compute and Storage can be scaled independently. If more cores and memory are needed, the disk SKU can be left as is and the cluster tier can be scaled up. Equally, if more storage and IOPS are needed, the cluster tier can be left as is and the Storage SKU can be scaled up independently. If needed, both Compute and Storage can be scaled independently to optimize for each component's requirements individually, without either component's elasticity requirements affecting the other.
 
 
-## Horizontal Scaling
+## Horizontal scaling
 Eventually, the application grows to a point where scaling vertically is not sufficient. Workload requirements can grow beyond the capacity of the largest cluster tier and eventually more shards are needed. Horizontal scaling in the vCore based offering for Azure Cosmos DB for MongoDB offers the following benefits:
-- Logically sharded datasets do not require user intervention to balance data across the underlying physical shards. The service automatically maps logical shards to physical shards. When nodes are added or removed, data is automatically rebalanaced the database under the covers.
+- Logically sharded datasets do not require user intervention to balance data across the underlying physical shards. The service automatically maps logical shards to physical shards. When nodes are added or removed, data is automatically rebalanced the database under the covers.
 - Requests are automatically routed to the relevant physical shard that owns the hash range for the data being queried.
 - Geo-distributed clusters have a homogeneous multi-node configuration. Thus logical to physical shard mappings are consistent across the primary and replica regions of a cluster.
 
 
-## Compute and Storage scaling
+## Compute and storage scaling
 Compute and memory resources influence read operations in the vCore based service for Azure Cosmos DB for MongoDB more than disk IOPS.
 - Read operations first consult the cache in the compute layer and fall back to the disk when data could not be retrieved from the cache. For workloads with a higher rate of read operations per second, scaling up the cluster tier to get more CPU and memory resources leads to higher throughput.
 - In addition to read throughput, workloads with a high volume of data per read operation also benefit from scaling the compute resources of the cluster. For instance, cluster tiers with more memory facilitate larger payload sizes per document and a larger number of smaller documents per response.
@@ -45,17 +45,16 @@ Disk IOPS influences write operations in the vCore based service for Azure Cosmo
 As mentioned earlier, storage and compute resources are decoupled for billing and provisioning. While they function as a cohesive unit, they can be scaled independently. The M30 cluster tier can have 32 TB disks provisioned. Similarly, the M200 cluster tier can have 32 GB disks provisioned to optimize for both storage and compute costs.
 
 ### Lower TCO with large disks (32 TB and beyond)
-Typically, NoSQL databases with a vCore based model limit the storage per physical shard to 4 TB. The vCore based service for Azure Cosmos DB for MongoDB provides upto 8x that capacity with 32 TB disks and plans to expand to 64 TB and 128 TB disks per shard soon. For storage heavy workloads, a 4 TB storage capacity per physical shard requires a massive fleet of compute resources just to sustain the storage requirements of the workload. Compute is more expensive than storage and over provisioning compute due to capacity limits in a service can inflate costs rapidly.
+Typically, NoSQL databases with a vCore based model limit the storage per physical shard to 4 TB. The vCore based service for Azure Cosmos DB for MongoDB provides upto 8x that capacity with 32 TB disks. For storage heavy workloads, a 4 TB storage capacity per physical shard requires a massive fleet of compute resources just to sustain the storage requirements of the workload. Compute is more expensive than storage and over provisioning compute due to capacity limits in a service can inflate costs rapidly.
 
 Let's consider a storage heavy workload with 200 TB of data.
 
 | Storage size per shard | Min shards needed to sustain 200 TB |
 |------------------------|-------------------------------------|
 | 4 TB | 50 |
 | 32 TiB | 7 |
-| 64 TiB (Coming soon) | 4 |
 
-The reduction in Compute requirements reduces sharply with larger disks. While more than the minimum number of physical shards may be needed sustain the throughput requirements of the workload, even doubling or tripling the number of shards are more cost effective than a 50 shard cluster with smaller disks.
+The reduction in Compute requirements reduces sharply with larger disks. While more than the minimum number of physical shards may be needed to sustain the throughput requirements of the workload, even doubling or tripling the number of shards are more cost effective than a 50 shard cluster with smaller disks.
 
 ### Skip storage tiering with large disks
 An immediate response to compute costs in storage heavy scenarios is to "tier" the data. Data in the transactional database is limited to the most frequently accessed "hot" data while the larger volume of "cold" data is detached to a cold store. This causes operational complexity. Performance is also unpredictable and dependent upon the data tier that is accessed. Furthermore, the availability of the entire system is dependent on the resiliency of both the hot and cold data stores combined. With large disks in the vCore service, there is no need for tiered storage as the cost of storage heavy workloads is minimized.
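
The horizontal-scaling bullets in this diff say the service maps logical shards to physical shards and routes each request to the physical shard that owns the hash range for the queried data. The following is a minimal sketch of that general idea using equal-width hash ranges. The MD5-based hash, the 2**32 hash space, and every function name here are assumptions for illustration, not the service's internal algorithm.

```python
import bisect
import hashlib

# Hypothetical parameters for illustration only; the article does not describe
# the service's real hash function or hash-space size.
HASH_SPACE = 2**32


def key_hash(shard_key_value: str) -> int:
    """Map a shard-key value to an integer in [0, HASH_SPACE)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ranges(num_shards: int) -> list[int]:
    """Split the hash space into equal contiguous ranges.

    Returns each range's exclusive upper bound; range i belongs to physical shard i.
    """
    return [HASH_SPACE * (i + 1) // num_shards for i in range(num_shards)]


def route(shard_key_value: str, upper_bounds: list[int]) -> int:
    """Return the index of the shard whose hash range contains the key's hash."""
    # bisect_right counts the bounds <= h, which is the index of the first bound > h.
    return bisect.bisect_right(upper_bounds, key_hash(shard_key_value))


def keys_moved(keys: list[str], old_shards: int, new_shards: int) -> int:
    """Count keys whose owning shard changes when the shard count changes."""
    old, new = build_ranges(old_shards), build_ranges(new_shards)
    return sum(route(k, old) != route(k, new) for k in keys)


keys = [f"user-{i}" for i in range(10_000)]
# Scaling up existing shards (vertical) keeps the shard count, so no key moves.
moved_same = keys_moved(keys, 4, 4)
# Adding a shard (horizontal) re-partitions the hash space, so some keys move.
moved_more = keys_moved(keys, 4, 5)
print(moved_same, moved_more)
```

`keys_moved(keys, 4, 4)` is always 0 because an unchanged shard count leaves every range boundary in place, which mirrors the vertical-scaling point that scaling up needs no rebalancing. Going from 4 to 5 shards moves roughly half the keys under this naive equal-split scheme; a production rebalancer would aim to move less data, which this sketch does not attempt. Because the mapping here depends only on the key and the shard count, two regions configured with the same shard count compute identical mappings, loosely analogous to the diff's point about homogeneous geo-distributed clusters.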

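The shard counts in the diff's 200 TB table come from a ceiling division of the workload size by the disk size per shard. A minimal sketch follows; like the table, it treats TB and TiB as the same unit. With strict conversion, 32 TiB is about 35.2 TB and the minimum would be 6 rather than 7. The 64 TB case reproduces the table row that this commit deletes. The function name `min_shards` is hypothetical.

```python
import math


def min_shards(total_tb: float, disk_per_shard_tb: float) -> int:
    """Smallest number of physical shards whose combined disk capacity holds total_tb."""
    return math.ceil(total_tb / disk_per_shard_tb)


WORKLOAD_TB = 200  # the storage-heavy workload from the article

for disk_tb in (4, 32, 64):
    print(f"{disk_tb:>2} TB per shard -> {min_shards(WORKLOAD_TB, disk_tb)} shards")

# Even tripling the 32 TB minimum stays well below the 4 TB minimum.
print(3 * min_shards(WORKLOAD_TB, 32), "<", min_shards(WORKLOAD_TB, 4))
```

This prints 50, 7, and 4 shards, then `21 < 50`, which is the arithmetic behind the claim that doubling or tripling the shard count with 32 TB disks still needs far fewer shards than a 50-shard cluster with 4 TB disks.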

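The compute-and-storage section of the diff says reads consult the compute layer's cache first and fall back to disk on a miss, so more memory means fewer disk reads. A toy read-through cache makes that concrete. The LRU eviction policy, the capacity numbers, and the `read_from_disk` callback are assumptions for illustration, since the article does not describe the service's cache design.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU read-through cache: serve reads from memory, fall back to 'disk' on a miss."""

    def __init__(self, capacity: int, read_from_disk: Callable[[str], bytes]):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # hypothetical slow-path reader
        self.entries = OrderedDict()  # key -> cached value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.read_from_disk(key)  # cache miss: consult the disk
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def hit_rate(capacity: int, accesses: list[str]) -> float:
    """Fraction of reads served from the cache for a given cache size."""
    cache = ReadThroughCache(capacity, read_from_disk=lambda key: key.encode("utf-8"))
    for key in accesses:
        cache.get(key)
    return cache.hits / len(accesses)


# Repeatedly scan the same 100 documents, 1,000 reads in total.
accesses = [f"doc-{i % 100}" for i in range(1_000)]
print(hit_rate(50, accesses))   # smaller cache: working set does not fit
print(hit_rate(200, accesses))  # larger cache: working set fits after the first pass
```

With this cyclic scan, the 50-entry cache misses on every read (hit rate 0.0) while the 200-entry cache misses only on the first pass (hit rate 0.9). The cyclic pattern is a deliberately bad case for LRU, chosen to make the effect obvious; real hit rates depend on the workload's access pattern.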