articles/cosmos-db/mongodb/vcore/partitioning.md
Lines changed: 10 additions & 10 deletions
@@ -26,7 +26,7 @@ For example, let's consider a collection called Employees with the document stru
This table shows a mapping of shard key values to logical partitions.
-| Document Id| Shard Key Value | Logical Shard |
+| Document ID| Shard Key Value | Logical Shard |
|-------------|-----------------|-------------- |
| "12345" | "Steve Smith" | Shard 1 |
| "23456" | "Jane Doe" | Shard 2 |
@@ -38,7 +38,7 @@ This table shows a mapping of shard key values to logical partitions.
- There are also no limits to the size of a single logical shard.
-- In addition, the service does not limit transactions to the scope of a logical shard. The vCore based service for Azure Cosmos DB for MongoDB supports read and write transactions that are applicable across multiple logical shards and across multiple physical shards in the cluster.
+- In addition, the service doesn't limit transactions to the scope of a logical shard. The vCore based service for Azure Cosmos DB for MongoDB supports read and write transactions that are applicable across multiple logical shards and across multiple physical shards in the cluster.
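A transaction spanning logical shards can be expressed with a standard mongosh session. This is only a sketch, reusing the illustrative `hr.Employees` names from above rather than anything defined in the article.

```javascript
// Illustrative names only: database "hr", collection "Employees".
const session = db.getMongo().startSession();
const employees = session.getDatabase("hr").Employees;

session.startTransaction();
try {
  // These two documents hash to different logical shards (and possibly
  // different physical shards), yet the writes commit or abort together.
  employees.updateOne({ _id: "12345" }, { $set: { department: "Sales" } });
  employees.updateOne({ _id: "23456" }, { $set: { department: "Sales" } });
  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
  throw error;
} finally {
  session.endSession();
}
```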
## Physical shards
Physical shards are the underlying machines and disks responsible for persisting the data and fulfilling database transactions. Unlike logical shards, the service manages physical shards under the covers.
@@ -48,11 +48,11 @@ The number of physical shards are defined when a cluster is created. Single shar
### Mapping logical shards to physical shards
When new logical shards are added, the cluster seamlessly updates the mapping of logical to physical shards. Similarly, the assignment of the address space to each physical shard is changed as new physical shards are added to the cluster, after which logical shards are rebalanced across the cluster.
-The hash range used to map logical and physical shards is evenly distributed across the physical shards in the cluster. Each physical shard owns an evenly sized bucket of the hash range. For every document that is written, he value of the shard key property is hashed and the hash value determines the mapping of the document to the underlying physical shard. Internally, several logical shards map to a single physical shard. Moreover, logical shards are never split across physical shards and all the documents for a logical shard only map to one physical shard.
+The hash range used to map logical and physical shards is evenly distributed across the physical shards in the cluster. Each physical shard owns an evenly sized bucket of the hash range. For every document that is written, the value of the shard key property is hashed and the hash value determines the mapping of the document to the underlying physical shard. Internally, several logical shards map to a single physical shard. Moreover, logical shards are never split across physical shards and all the documents for a logical shard only map to one physical shard.
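The service's actual hash function isn't specified in this article, so the following is only a conceptual sketch of how hashing a shard key value into an evenly divided range picks exactly one physical shard.

```javascript
// Conceptual sketch only; not the service's real hashing scheme.
function pickPhysicalShard(shardKeyValue, physicalShardCount) {
  // Toy 32-bit string hash standing in for the real hash function.
  let hash = 0;
  for (const ch of String(shardKeyValue)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  // The hash range is split into evenly sized buckets, one per physical shard,
  // so all documents for a given shard key value map to exactly one shard.
  return hash % physicalShardCount;
}

pickPhysicalShard("Steve Smith", 2); // always the same shard for this key value
pickPhysicalShard("Jane Doe", 2);
```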
-Building on the prior example using a cluster with 2 physical shards, this table shows a sample mapping of documents to physical shards.
+Building on the prior example using a cluster with two physical shards, this table shows a sample mapping of documents to physical shards.
@@ -64,19 +64,19 @@ Building on the prior example using a cluster with 2 physical shards, this table
### Capacity of physical shards
The cluster tier that is selected when the cluster is provisioned determines the CPU and memory capacity of a physical shard. Similarly, the storage SKU determines the storage and IOPS capacity of a physical shard. Larger cluster tiers provide more compute power and larger memory while larger storage disks provide more storage and IOPS. Read heavy workloads benefit from a larger cluster tier while write heavy workloads benefit from a larger storage SKU. The cluster tier can be scaled up and down after the cluster is created based on the changing needs of the application.
-In a multi-shard cluster, the capacity of each physical shard is the same. Scaling up the cluster tier or the storage SKU does not change the placement of logical shards on the physical shards. After a scale up operation, the number of physical shards remains the same thus avoiding the need to relabance the data in the cluster.
+In a multi-shard cluster, the capacity of each physical shard is the same. Scaling up the cluster tier or the storage SKU doesn't change the placement of logical shards on the physical shards. After a scale up operation, the number of physical shards remains the same, thus avoiding the need to rebalance the data in the cluster.
-The compute, memory, storage and IOPS capacity of the physical shard The logical shards that are co-located within the same physical shard are collectively bound by the compute, memory, storage and IOPS capacity of the physical shard they are mapped to. Shard keys that do not have an even distribution of storage and request volumes can cause uneven storage and throughput consumption within the cluster. Hot partitions can cause physical shards to be unevenly utilized leading to unpredictable throughput and performance. Thus sharded clusters require careful planning upfront to ensure performance remains consistent as the requirements of the application change over time.
+The compute, memory, storage, and IOPS capacity of the physical shard determines the resources available to its logical shards. Shard keys that don't have an even distribution of storage and request volumes can cause uneven storage and throughput consumption within the cluster. Hot partitions can cause physical shards to be unevenly utilized, leading to unpredictable throughput and performance. Thus sharded clusters require careful planning upfront to ensure performance remains consistent as the requirements of the application change over time.
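One way to spot a skewed shard key before it turns into a hot partition is to count documents per shard key value. This sketch again assumes the illustrative `Employees` collection and `name` shard key, not names from the article.

```javascript
// List the ten most frequent shard key values; a steep drop-off suggests skew.
db.Employees.aggregate([
  { $group: { _id: "$name", docs: { $sum: 1 } } },
  { $sort: { docs: -1 } },
  { $limit: 10 }
]);
```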
### Replica sets
Each physical shard consists of a set of replicas, also referred to as a replica set. Each replica hosts an instance of the database engine. A replica set makes the data stored within the physical shard durable, highly available, and consistent. Each replica that makes up the physical shard inherits the partition's storage and compute capacity. Azure Cosmos DB for MongoDB vCore automatically manages replica sets.
## Best practices for sharding data
-- Sharding in Azure Cosmos DB for MongoDB vCore is not required unless the collection's storage and transaction volumes can exceed the capacity of a single physical shard. For instance, the service provides 32 TB disks per shard. If a collection requires more than 32 TB, it should be sharded.
+- Sharding in Azure Cosmos DB for MongoDB vCore isn't required unless the collection's storage and transaction volumes can exceed the capacity of a single physical shard. For instance, the service provides 32 TB disks per shard. If a collection requires more than 32 TB, it should be sharded.
-- It is not necessary to shard every collection in a cluster with multiple physical shards. Sharded and unsharded collections can coexist in the same cluster. The service optimally distributes the collections within the cluster to evenly utilize the cluster's compute and storage resources as evenly as possible.
+- It isn't necessary to shard every collection in a cluster with multiple physical shards. Sharded and unsharded collections can coexist in the same cluster. The service distributes the collections within the cluster to utilize its compute and storage resources as evenly as possible.
- For read heavy applications, the shard key must be selected based on the most frequent query patterns. The most commonly used query filter for a collection should be chosen as the shard key to optimize the highest percentage of database transactions by localizing the search to a single physical shard.
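As a sketch of that guidance, and still assuming the illustrative `name` shard key from earlier, a filter that includes the shard key can be routed to a single physical shard, while a filter on any other field fans out across the cluster.

```javascript
// Filter includes the shard key: the query can be localized to one physical shard.
db.Employees.find({ name: "Steve Smith" });

// Filter omits the shard key: the query fans out across all physical shards.
db.Employees.find({ department: "Sales" });
```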