articles/cosmos-db/partition-data.md
Lines changed: 30 additions & 11 deletions
Original file line number
Diff line number
Diff line change
@@ -1,37 +1,56 @@
1
1
---
2
2
title: Partitioning and horizontal scaling in Azure Cosmos DB
3
3
description: Learn about how partitioning works in Azure Cosmos DB, how to configure partitioning and partition keys, and how to choose the right partition key for your application.
4
-
author: markjbrown
5
-
ms.author: mjbrown
4
+
author: deborahc
5
+
ms.author: dech
6
6
ms.service: cosmos-db
7
7
ms.topic: conceptual
8
-
ms.date: 08/01/2019
8
+
ms.date: 04/28/2020
9
9
10
10
---
11
11
12
12
# Partitioning and horizontal scaling in Azure Cosmos DB
13
13
14
-
This article explains physical and logical partitions in Azure Cosmos DB. It also discusses best practices for scaling and partitioning.
14
+
This article explains the relationship between logical and physical partitions. It also discusses best practices for partitioning and gives an in-depth look at how horizontal scaling works in Azure Cosmos DB. You don't need to understand these internal details to [select your partition key](partitioning-overview.md#choose-partitionkey), but they are covered here to clarify how Azure Cosmos DB works.
15
15
16
16
## Logical partitions
17
17
18
-
A logical partition consists of a set of items that have the same partition key. For example, in a container where all items contain a `City` property, you can use `City` as the partition key for the container. Groups of items that have specific values for `City`, such as `London`, `Paris`, and `NYC`, form distinct logical partitions. You don't have to worry about deleting a partition when the underlying data is deleted.
19
-
20
-
In Azure Cosmos DB, a container is the fundamental unit of scalability. Data that's added to the container and the throughput that you provision on the container are automatically (horizontally) partitioned across a set of logical partitions. Data and throughput are partitioned based on the partition key you specify for the Azure Cosmos container. For more information, see [Create an Azure Cosmos container](how-to-create-container.md).
18
+
A logical partition consists of a set of items that have the same partition key. For example, in a container that contains data about food nutrition, all items contain a `foodGroup` property. You can use `foodGroup` as the partition key for the container. Groups of items that have specific values for `foodGroup`, such as `Beef Products`, `Baked Products`, and `Sausages and Luncheon Meats`, form distinct logical partitions. You don't have to worry about deleting a logical partition when the underlying data is deleted.
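The grouping described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how Azure Cosmos DB stores data; the helper name `group_by_partition_key` and the sample items are hypothetical.

```python
from collections import defaultdict


def group_by_partition_key(items, key="foodGroup"):
    """Group items by partition key value; each group models one logical partition."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return dict(partitions)


# Hypothetical sample items; only the `foodGroup` values come from the article.
items = [
    {"id": "1", "foodGroup": "Beef Products"},
    {"id": "2", "foodGroup": "Baked Products"},
    {"id": "3", "foodGroup": "Beef Products"},
]

logical_partitions = group_by_partition_key(items)
for value, members in logical_partitions.items():
    print(value, len(members))
```

Two distinct `foodGroup` values produce two logical partitions, regardless of how many items share each value.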
21
19
22
20
A logical partition also defines the scope of database transactions. You can update items within a logical partition by using a [transaction with snapshot isolation](database-transactions-optimistic-concurrency.md). When new items are added to a container, new logical partitions are transparently created by the system.
23
21
22
+
There is no limit to the number of logical partitions in your container. Each logical partition can store up to 20 GB of data. Good partition key choices have a wide range of possible values. For example, in a container where all items contain a `foodGroup` property, the data within the `Beef Products` logical partition can grow up to 20 GB. [Selecting a partition key](partitioning-overview.md#choose-partitionkey) with a wide range of possible values ensures that the container is able to scale.
23
+
24
24
## Physical partitions
25
25
26
-
An Azure Cosmos container is scaled by distributing data and throughput across a large number of logical partitions. Internally, one or more logical partitions are mapped to a physical partition that consists of a set of replicas, also referred to as a [*replica set*](global-dist-under-the-hood.md). Each replica set hosts an instance of the Azure Cosmos database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent. A physical partition supports the maximum amount of storage and request units (RUs). Each replica that makes up the physical partition inherits the partition's storage quota. All replicas of a physical partition collectively support the throughput that's allocated to the physical partition.
26
+
An Azure Cosmos container is scaled by distributing data and throughput across physical partitions. Internally, one or more logical partitions are mapped to a single physical partition. Most small Cosmos containers have many logical partitions but only require a single physical partition. Unlike logical partitions, physical partitions are an internal implementation of the system and they are entirely managed by Azure Cosmos DB.
27
27
28
-
The following image shows how logical partitions are mapped to physical partitions that are distributed globally:
28
+
The number of physical partitions in your Cosmos container depends on the following:
29
29
30
-

30
+
- Amount of provisioned throughput (each individual physical partition can provide a throughput of up to 10,000 request units per second)
31
+
- Total data storage (each individual physical partition can store up to 50 GB)
32
+
33
+
There is no limit to the total number of physical partitions in your container. As your provisioned throughput or data size grows, Azure Cosmos DB will automatically create new physical partitions by splitting existing ones. Physical partition splits do not impact your application's availability. After the physical partition split, all data within a single logical partition will still be stored on the same physical partition. A physical partition split simply creates a new mapping of logical partitions to physical partitions.
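As a rough sketch, the two per-partition limits quoted above (10,000 RU/s and 50 GB) give a lower bound on the number of physical partitions a container needs. The function name `min_physical_partitions` is hypothetical, and the actual partition count is decided by Azure Cosmos DB and may be higher than this estimate.

```python
import math

# Per-physical-partition limits as stated in the article.
MAX_RU_PER_PHYSICAL_PARTITION = 10_000
MAX_GB_PER_PHYSICAL_PARTITION = 50


def min_physical_partitions(provisioned_ru_per_sec, storage_gb):
    """Lower bound on physical partitions implied by throughput and storage limits."""
    by_throughput = math.ceil(provisioned_ru_per_sec / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    # A container always has at least one physical partition.
    return max(1, by_throughput, by_storage)


print(min_physical_partitions(400, 10))      # small container
print(min_physical_partitions(10_000, 120))  # storage is the limiting factor
```

Whichever dimension needs more partitions determines the bound, which is why growth in either throughput or data size can trigger a split.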
31
34
32
35
Throughput provisioned for a container is divided evenly among physical partitions. A partition key design that doesn't distribute the throughput requests evenly might create "hot" partitions. Hot partitions might result in rate limiting, inefficient use of the provisioned throughput, and higher costs.
33
36
34
-
Unlike logical partitions, physical partitions are an internal implementation of the system. You can't control the size, placement, or count of physical partitions, and you can't control the mapping between logical partitions and physical partitions. However, you can control the number of logical partitions and the distribution of data, workload and throughput by [choosing the right logical partition key](partitioning-overview.md#choose-partitionkey).
37
+
You can see your container's physical partitions in the **Storage** section of the **Metrics blade** of the Azure portal:
38
+
39
+
[](./media/partition-data/view-partitions-zoomed-in.png#lightbox)
40
+
41
+
In this example container where we have chosen `/foodGroup` as our partition key, each of the three rectangles represents a physical partition. In the image, **partition key range** is the same as a physical partition. The selected physical partition contains three logical partitions: `Beef Products`, `Vegetable and Vegetable Products`, and `Soups, Sauces, and Gravies`.
42
+
43
+
If we provision a throughput of 18,000 request units per second (RU/s), then each of the three physical partitions can utilize 1/3 of the total provisioned throughput. Within the selected physical partition, the logical partition keys `Beef Products`, `Vegetable and Vegetable Products`, and `Soups, Sauces, and Gravies` can, collectively, utilize the physical partition's 6,000 provisioned RU/s. Because provisioned throughput is evenly divided across your container's physical partitions, it's important to [choose a partition key](partitioning-overview.md#choose-partitionkey) that evenly distributes throughput consumption. If you choose a partition key that evenly distributes throughput consumption across logical partitions, you will ensure that throughput consumption across physical partitions is balanced.
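The arithmetic in this example can be written out as a short sketch. The function names and the per-logical-partition demand figures below are hypothetical and only illustrate how a single busy logical partition can push its physical partition past its share.

```python
def throughput_share(provisioned_ru_per_sec, physical_partition_count):
    """RU/s available to each physical partition under an even split."""
    return provisioned_ru_per_sec / physical_partition_count


def is_hot(demand_ru_per_sec, share_ru_per_sec):
    """True when combined demand on a physical partition exceeds its share."""
    return demand_ru_per_sec > share_ru_per_sec


share = throughput_share(18_000, 3)  # 6,000 RU/s per physical partition

# Hypothetical demand (RU/s) from the three logical partitions in the selected physical partition.
balanced = {"Beef Products": 2_500, "Vegetable and Vegetable Products": 2_000, "Soups, Sauces, and Gravies": 1_000}
skewed = {"Beef Products": 4_000, "Vegetable and Vegetable Products": 2_000, "Soups, Sauces, and Gravies": 1_000}

print(is_hot(sum(balanced.values()), share))  # False: 5,500 RU/s fits within 6,000
print(is_hot(sum(skewed.values()), share))    # True: 7,000 RU/s exceeds 6,000
```

In the skewed case, requests to this physical partition would be rate-limited even if the other two physical partitions were idle.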
44
+
45
+
## Replica sets
46
+
47
+
Each physical partition consists of a set of replicas, also referred to as a [*replica set*](global-dist-under-the-hood.md). Each replica set hosts an instance of the Azure Cosmos database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent. Each replica that makes up the physical partition inherits the partition's storage quota. All replicas of a physical partition collectively support the throughput that's allocated to the physical partition. Azure Cosmos DB automatically manages replica sets.
48
+
49
+
Most small Cosmos containers only require a single physical partition but will still have at least 4 replicas.
50
+
51
+
The following image shows how logical partitions are mapped to physical partitions that are distributed globally:
52
+
53
+

articles/cosmos-db/partitioning-overview.md
Lines changed: 43 additions & 14 deletions
@@ -1,11 +1,11 @@
1
1
---
2
2
title: Partitioning in Azure Cosmos DB
3
3
description: Learn about partitioning in Azure Cosmos DB, best practices when choosing a partition key, and how to manage logical partitions
4
-
author: markjbrown
5
-
ms.author: mjbrown
4
+
author: deborahc
5
+
ms.author: dech
6
6
ms.service: cosmos-db
7
7
ms.topic: conceptual
8
-
ms.date: 12/02/2019
8
+
ms.date: 04/28/2020
9
9
10
10
---
11
11
@@ -15,33 +15,62 @@ Azure Cosmos DB uses partitioning to scale individual containers in a database t
15
15
16
16
For example, a container holds items. Each item has a unique value for the `UserID` property. If `UserID` serves as the partition key for the items in the container and there are 1,000 unique `UserID` values, 1,000 logical partitions are created for the container.
17
17
18
-
In addition to a partition key that determines the item’s logical partition, each item in a container has an *item ID* (unique within a logical partition). Combining the partition key and the item ID creates the item's *index*, which uniquely identifies the item.
18
+
In addition to a partition key that determines the item's logical partition, each item in a container has an *item ID* (unique within a logical partition). Combining the partition key and the *item ID* creates the item's *index*, which uniquely identifies the item.
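A minimal in-memory model can make this identity rule concrete. `ToyContainer` is a hypothetical class written for illustration and is not part of any Azure Cosmos DB SDK; it only shows that the pair (partition key, item ID) identifies an item, so the same ID can appear in different logical partitions.

```python
class ToyContainer:
    """In-memory model: items are identified by (partition key value, item ID)."""

    def __init__(self, partition_key_property):
        self.partition_key_property = partition_key_property
        self._items = {}

    def create_item(self, item):
        key = (item[self.partition_key_property], item["id"])
        if key in self._items:
            # The item ID must be unique within a logical partition.
            raise ValueError(f"duplicate item {key}")
        self._items[key] = item

    def read_item(self, item_id, partition_key):
        # A point read needs both parts of the identity.
        return self._items[(partition_key, item_id)]


container = ToyContainer("UserID")
container.create_item({"id": "1", "UserID": "alice"})
container.create_item({"id": "1", "UserID": "bob"})  # same ID, different logical partition
print(container.read_item("1", "bob"))
```

Creating a second item with `id` `"1"` and `UserID` `"alice"` would raise an error in this model, because that pair is already taken.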
19
19
20
-
[Choosing a partition key](partitioning-overview.md#choose-partitionkey) is an important decision that will affect your application’s performance.
20
+
[Choosing a partition key](partitioning-overview.md#choose-partitionkey) is an important decision that will affect your application's performance.
21
21
22
22
## Managing logical partitions
23
23
24
-
Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions to efficiently satisfy the scalability and performance needs of the container. As the throughput and storage requirements of an application increase, Azure Cosmos DB moves logical partitions to automatically spread the load across a greater number of servers.
24
+
Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions to efficiently satisfy the scalability and performance needs of the container. As the throughput and storage requirements of an application increase, Azure Cosmos DB moves logical partitions to automatically spread the load across a greater number of physical partitions. You can learn more about [physical partitions](partition-data.md#physical-partitions).
25
25
26
26
Azure Cosmos DB uses hash-based partitioning to spread logical partitions across physical partitions. Azure Cosmos DB hashes the partition key value of an item. The hashed result determines the physical partition. Then, Azure Cosmos DB allocates the key space of partition key hashes evenly across the physical partitions.
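The general technique can be sketched as follows. The article does not say which hash function Azure Cosmos DB uses, so MD5 is only a stand-in here, and the 32-bit hash space and function names are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash key space for this sketch


def hash_partition_key(value):
    """Map a partition key value to an integer in [0, HASH_SPACE). MD5 is a stand-in."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def physical_partition_for(value, physical_partition_count):
    """Pick the physical partition whose equal-sized hash range contains the key's hash."""
    return hash_partition_key(value) * physical_partition_count // HASH_SPACE


for food_group in ["Beef Products", "Baked Products", "Sausages and Luncheon Meats"]:
    print(food_group, physical_partition_for(food_group, 3))
```

Because the mapping depends only on the partition key value, every item in a logical partition lands on the same physical partition.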
27
27
28
-
Queries that access data within a single logical partition are more cost-effective than queries that access multiple partitions. Transactions (in stored procedures or triggers) are allowed only against items in a single logical partition.
28
+
Transactions (in stored procedures or triggers) are allowed only against items in a single logical partition.
29
29
30
-
To learn more about how Azure Cosmos DB manages partitions, see [Logical partitions](partition-data.md). (It's not necessary to understand the internal details to build or run your applications, but added here for a curious reader.)
30
+
You can learn more about [how Azure Cosmos DB manages partitions](partition-data.md). (It's not necessary to understand the internal details to build or run your applications, but they are included here for curious readers.)
31
31
32
32
## <a id="choose-partitionkey"></a>Choosing a partition key
33
33
34
-
The following is a good guidance for choosing a partition key:
34
+
Selecting your partition key is a simple but important design choice in Azure Cosmos DB. Once you select your partition key, it is not possible to change it in-place. If you need to change your partition key, you should move your data to a new container with your new desired partition key.
35
35
36
-
* A single logical partition has an upper limit of 20 GB of storage.
36
+
For **all** containers, your partition key should:
37
37
38
-
* Azure Cosmos containers have a minimum throughput of 400 request units per second (RU/s). When throughput is provisioned on a database, minimum RUs per container is 100 request units per second (RU/s). Requests to the same partition key can't exceed the throughput that's allocated to a partition. If requests exceed the allocated throughput, requests are rate-limited. So, it's important to pick a partition key that doesn't result in "hot spots" within your application.
38
+
* Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
39
+
* Have a high cardinality. In other words, the property should have a wide range of possible values.
40
+
* Spread request unit (RU) consumption and data storage evenly across all logical partitions. This ensures even RU consumption and storage distribution across your physical partitions.
39
41
40
-
* Choose a partition key that has a wide range of values and access patterns that are evenly spread across logical partitions. This helps spread the data and the activity in your container across the set of logical partitions, so that resources for data storage and throughput can be distributed across the logical partitions.
42
+
If you need [multi-item ACID transactions](database-transactions-optimistic-concurrency.md#multi-item-transactions) in Azure Cosmos DB, you will need to use [stored procedures or triggers](how-to-write-stored-procedures-triggers-udfs.md#stored-procedures). All JavaScript-based stored procedures and triggers are scoped to a single logical partition.
41
43
42
-
* Choose a partition key that spreads the workload evenly across all partitions and evenly over time. Your choice of partition key should balance the need for efficient partition queries and transactions against the goal of distributing items across multiple partitions to achieve scalability.
44
+
## Partition keys for read-heavy containers
43
45
44
-
* Candidates for partition keys might include properties that appear frequently as a filter in your queries. Queries can be efficiently routed by including the partition key in the filter predicate.
46
+
For most containers, the criteria above are all you need to consider when picking a partition key. For large read-heavy containers, however, you might want to choose a partition key that appears frequently as a filter in your queries. Queries can be [efficiently routed to only the relevant physical partitions](how-to-query-container.md#in-partition-query) by including the partition key in the filter predicate.
47
+
48
+
If most of your workload's requests are queries and most of your queries have an equality filter on the same property, this property can be a good partition key choice. For example, if you frequently run a query that filters on `UserID`, then selecting `UserID` as the partition key would reduce the number of [cross-partition queries](how-to-query-container.md#avoiding-cross-partition-queries).
49
+
50
+
However, if your container is small, you probably don't have enough physical partitions to need to worry about the performance impact of cross-partition queries. Most small containers in Azure Cosmos DB only require one or two physical partitions.
51
+
52
+
If your container could grow to more than a few physical partitions, then you should make sure you pick a partition key that minimizes cross-partition queries. Your container will require more than a few physical partitions when either of the following is true:
53
+
54
+
* Your container will have over 30,000 RU/s provisioned
55
+
* Your container will store over 100 GB of data
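The two conditions above can be expressed as a simple check. The function name is hypothetical, and the thresholds are the rules of thumb quoted in the list, not hard service limits.

```python
# Rule-of-thumb thresholds from the list above.
RU_THRESHOLD = 30_000       # provisioned RU/s
STORAGE_THRESHOLD_GB = 100  # stored data in GB


def needs_more_than_a_few_physical_partitions(provisioned_ru_per_sec, storage_gb):
    """True when either threshold is exceeded, so cross-partition queries deserve attention."""
    return provisioned_ru_per_sec > RU_THRESHOLD or storage_gb > STORAGE_THRESHOLD_GB


print(needs_more_than_a_few_physical_partitions(10_000, 20))
print(needs_more_than_a_few_physical_partitions(50_000, 20))
```

Either condition alone is enough, because throughput and storage each independently drive the number of physical partitions.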
56
+
57
+
## Using item ID as the partition key
58
+
59
+
If your container has a property that has a wide range of possible values, it is likely a great partition key choice. One possible example of such a property is the *item ID*. For small read-heavy containers or write-heavy containers of any size, the *item ID* is naturally a great choice for the partition key.
60
+
61
+
The system property *item ID* is guaranteed to exist in every item in your Cosmos container. You may have other properties that represent a logical ID of your item. In many cases, these are also great partition key choices for the same reasons as the *item ID*.
62
+
63
+
The *item ID* is a great partition key choice for the following reasons:
64
+
65
+
* There is a wide range of possible values (one unique *item ID* per item).
66
+
* Because there is a unique *item ID* per item, the *item ID* does a great job at evenly balancing RU consumption and data storage.
67
+
* You can easily do efficient point reads since you'll always know an item's partition key if you know its *item ID*.
68
+
69
+
Some things to consider when selecting the *item ID* as the partition key include:
70
+
71
+
* If the *item ID* is the partition key, it will become a unique identifier throughout your entire container. You won't be able to have items that have a duplicate *item ID*.
72
+
* If you have a read-heavy container that has a lot of [physical partitions](partition-data.md#physical-partitions), queries will be more efficient if they have an equality filter with the *item ID*.
73
+
* You can't run stored procedures or triggers across multiple logical partitions.