Custom partitioning enables you to partition the analytical store data on fields that are commonly used as filters in analytical queries resulting in improved query performance.
15
+
Custom partitioning enables you to partition analytical store data, on fields that are commonly used as filters in analytical queries, resulting in improved query performance.
16
16
17
17
In this article, you will learn how to partition your data in Azure Cosmos DB analytical store using keys that are critical for your analytical workloads, and how to take advantage of the improved query performance that partition pruning provides. You will also learn how the partitioned store helps improve query performance when your workloads have a significant number of updates or deletes.
18
18
19
19
> [!IMPORTANT]
20
20
> Custom partitioning feature is currently in public preview. This preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
21
21
22
22
> [!NOTE]
23
-
> Azure Cosmos DB accounts should have Azure Synapse Link enabled to take advantage of custom partitioning. Custom partitioning is currently supported for Azure Synapse Spark 2.0 only.
23
+
> Azure Cosmos DB accounts should have [Azure Synapse Link](synapse-link.md) enabled to take advantage of custom partitioning. Custom partitioning is currently supported for Azure Synapse Spark 2.0 only.
24
24
25
25
## How does it work?
26
26
27
-
With custom partitioning, you can choose a single field or a combination of fields from your dataset as the analytical store partition key.
27
+
Analytical store partitioning is independent of partitioning in the transactional store. By default, analytical store is not partitioned. If you want to query analytical store frequently based on fields such as Date, Time, or Category, you can use custom partitioning to create a separate partitioned store based on these keys. You can choose a single field or a combination of fields from your dataset as the analytical store partition key.
28
28
29
-
The analytical store partitioning is independent of partitioning in the transactional store. By default, analytical store is not partitioned. If you want to query analytical store frequently based on fields such as Date, Time, Category etc. we recommend that you create a partitioned store based on these keys.
30
-
31
-
To trigger partitioning, you can periodically execute partitioning job from an Azure Synapse Spark notebook using Azure Synapse Link. You can schedule it to run as a background job at your convenient schedule.
29
+
You can trigger partitioning from an Azure Synapse Spark notebook using Azure Synapse Link. You can schedule it to run as a background job once or twice a day, or more often if needed.
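As a rough sketch of what such a job might look like, the helper below assembles reader options for a partitioning run. The option names (`spark.cosmos.asns.execute.partitioning`, `spark.cosmos.asns.partition.keys`, `spark.cosmos.asns.basePath`), the `cosmos.olap` format, and every placeholder value are assumptions that this article does not confirm; check them against the Synapse Link connector documentation before relying on them.

```python
def partitioning_options(linked_service, container, partition_keys, base_path):
    """Build reader options for a partitioning run (option names are assumptions)."""
    if not partition_keys:
        raise ValueError("at least one partition key is required")
    # Keys are passed as comma-separated "name Type" pairs, e.g. "sold_date String".
    keys = ", ".join(f"{name} {dtype}" for name, dtype in partition_keys)
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
        "spark.cosmos.asns.execute.partitioning": "true",
        "spark.cosmos.asns.partition.keys": keys,
        "spark.cosmos.asns.basePath": base_path,
    }


options = partitioning_options(
    linked_service="MyLinkedService",            # hypothetical linked service name
    container="store_sales",                     # hypothetical container name
    partition_keys=[("sold_date", "String")],
    base_path="/mnt/CosmosDBPartitionedStore/",  # hypothetical base path
)

# In a Synapse Spark notebook, where `spark` is predefined, the job could then
# be triggered with something like:
#   spark.read.format("cosmos.olap").options(**options).load()
```

Keeping the options in a plain dictionary makes it easy to reuse the same settings in a scheduled notebook run.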
32
30
33
31
> [!NOTE]
34
32
> The partitioned store points to the ADLS Gen2 primary storage account that is linked with the Azure Synapse workspace.
35
33
36
34
:::image type="content" source="./media/custom-partitioning-analytical-store/partitioned-store-architecture.png" alt-text="Architecture of partitioned store in Azure Synapse Link for Azure Cosmos DB" lightbox="./media/custom-partitioning-analytical-store/partitioned-store-architecture.png" border="false":::
37
35
38
-
The partitioned store contains Azure Cosmos DB analytical data until the last timestamp you ran your partitioning job. When you query your analytical data using the partition key filters in Synapse Spark, Synapse Link will automatically merge most recent data from the analytical store with the data in partitioned store. This way it gives you the latest results. Although it merges the data before querying, the delta isn’t written back to the partitioned store. As the delta between data in analytical store and partitioned store widens, the query times on partitioned data may vary. Triggering partitioning job more frequently will reduce this delta. Each time you execute the partition job, only incremental changes in the analytical store will be processed, instead of the full data set.
36
+
The partitioned store contains Azure Cosmos DB analytical data up to the last timestamp at which you ran your partitioning job. When you query your analytical data using the partition key filters in Synapse Spark, Synapse Link automatically merges the data in the partitioned store with the most recent data from the analytical store, so your queries return the latest results. Although it merges the data before querying, the delta isn’t written back to the partitioned store. As the delta between the data in the analytical store and the partitioned store widens, query times on partitioned data may vary. Triggering the partitioning job more frequently reduces this delta. Each time you execute the partitioning job, only incremental changes in the analytical store are processed, instead of the full data set.
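The merge-at-query-time behavior described above can be illustrated with a small, purely conceptual sketch. It does not show how Synapse Link implements the merge; the record shape, the `id` field, and the `deleted` flag are hypothetical.

```python
def merged_view(partitioned_rows, delta_rows):
    """Combine a partitioned snapshot with newer changes without mutating either."""
    latest = {row["id"]: row for row in partitioned_rows}
    for change in delta_rows:
        if change.get("deleted"):
            # A delete in the delta hides the row from the query result.
            latest.pop(change["id"], None)
        else:
            # An insert or update in the delta takes precedence over the snapshot.
            latest[change["id"]] = change
    return list(latest.values())


partitioned = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]   # as of the last job run
delta = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}, {"id": 1, "deleted": True}]

result = merged_view(partitioned, delta)
# The snapshot itself is unchanged: the delta is not written back.
```

The larger `delta` grows, the more work each query does in this step, which is why running the partitioning job more often keeps query times steadier.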
39
37
40
38
## When to use?
41
39
42
40
Using partitioned store is optional when querying analytical data in Azure Cosmos DB. You can directly query the same data using Synapse Link with the existing analytical store. You may want to turn on partitioned store if you have the following requirements:
41
+
* Common analytical query filters that could be used as partition columns
42
+
* Low cardinality partition columns
43
+
* Partition column distributes data equally across partitions
44
+
* High volume of update or delete operations
45
+
* Slow data ingestion
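To judge whether a candidate column meets the low-cardinality and even-distribution criteria above, you could profile a sample of its values. The sketch below is one possible heuristic, not an official tool; the function name and the skew measure (largest partition size divided by the mean partition size) are assumptions chosen for illustration.

```python
from collections import Counter


def partition_column_profile(values):
    """Return the distinct-value count and a simple skew ratio for a column sample."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    mean_size = len(values) / len(counts)
    # A skew of 1.0 means every partition would hold the same number of rows.
    return {"cardinality": len(counts), "skew": max(counts.values()) / mean_size}


even = partition_column_profile(
    ["2021-10-08"] * 4 + ["2021-10-09"] * 4 + ["2021-10-10"] * 4
)
skewed = partition_column_profile(["Sydney"] * 9 + ["Perth"])
```

A column with few distinct values and a skew close to 1.0 is a better fit than one where a single value dominates.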
43
46
44
-
* You want to frequently query analytical data filtered on some fields.
45
-
46
-
* You have high volume of updates/delete operations or data is ingested slowly. Partitioned store provides better query performance in these cases, irrespective of whether you are querying using partition keys or not.
47
-
48
-
Except for the workloads above, if you are querying live data using query filters that are different from the partition keys, we recommend that you query this directly from the analytical store, especially if the partitioning jobs are not run frequently.
47
+
Except for workloads that meet the above requirements, if you are querying live data using query filters that are different from the partition keys, we recommend that you query directly from the analytical store. This is especially true if the partitioning jobs are not scheduled to run frequently.
49
48
50
49
## Benefits
51
50
@@ -55,9 +54,7 @@ Because the data corresponding to each unique partition key is colocated in the
55
54
56
55
### Flexibility to partition your analytical data
57
56
58
-
You can have multiple partitioning strategies for a given analytical store container where the analytical store data can be partitioned using separate partition keys. For example, the "store_sales" container can be partitioned using "sold_date" as key and can also be partitioned using "item" as key. You must have two separate partitioning jobs in this case, which will essentially partition the data into two separate partitioned stores. This partitioning strategy is beneficial if some of the queries use "sold_date" as the query filter and some other queries use "item" as the query filter.
59
-
60
-
The data across different partition keys will be part of the same partitioned store and you can query based on the partition key to pick the corresponding data.
57
+
You can have multiple partitioning strategies for a given analytical store container. You could use composite or separate partition keys based on your query requirements. See the partitioning strategies section for guidance.
61
58
62
59
### Query performance improvements
63
60
@@ -77,13 +74,57 @@ If you configured [managed private endpoints](analytical-store-private-endpoints
77
74
78
75
Similarly, if you configured [customer-managed keys on analytical store](how-to-setup-cmk.md#is-it-possible-to-use-customer-managed-keys-in-conjunction-with-the-azure-cosmos-db-analytical-store), you must directly enable it on the Synapse workspace primary storage account, which is the partitioned store, as well.
79
76
77
+
## Partitioning strategies
78
+
You could use one or more partition keys for your analytical data. If you are using multiple partition keys, below are some recommendations on how to partition the data:
79
+
- **Using composite keys:**
80
+
81
+
Say, you want to frequently query based on Key1 and Key2.
82
+
83
+
For example, "Query for all records where ReadDate = ‘2021-10-08’ and Location = ‘Sydney’".
84
+
85
+
In this case, composite keys are more efficient: a query first narrows to the records that match ReadDate, and then to the records that match Location within that ReadDate.
Note that frequently querying on the "ReadDate" and "Location" filters together is not efficient with the above partitioning. Composite keys will give
119
+
better query performance in that case.
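A minimal sketch of why a composite key helps for the "ReadDate and Location" query: if records are grouped under a label built from both keys, a query with both filters only needs to read one group. The path-like label format and the sample records are hypothetical and only mimic the idea of partition pruning; they are not the partitioned store's actual layout.

```python
from collections import defaultdict


def partition_by(records, keys):
    """Group records under a path-like label built from the given partition keys."""
    partitions = defaultdict(list)
    for record in records:
        label = "/".join(f"{key}={record[key]}" for key in keys)
        partitions[label].append(record)
    return dict(partitions)


readings = [
    {"ReadDate": "2021-10-08", "Location": "Sydney", "value": 21},
    {"ReadDate": "2021-10-08", "Location": "Perth", "value": 25},
    {"ReadDate": "2021-10-09", "Location": "Sydney", "value": 19},
]

composite = partition_by(readings, ["ReadDate", "Location"])
# Both filters resolve to a single partition, so no other group is scanned.
matches = composite["ReadDate=2021-10-08/Location=Sydney"]
```

With a key on "ReadDate" alone, the same query would still have to scan every record for that date and filter on "Location" afterwards.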
120
+
80
121
## Limitations
81
122
82
123
* Custom partitioning is only available for Azure Synapse Spark. Custom partitioning is currently not supported for serverless SQL pools.
83
124
84
-
* Currently partitioned store can only point to the primary storage account associated with the Synapse workspace. We do not support selecting custom storage accounts at this point.
125
+
* Currently, partitioned store can only point to the primary storage account associated with the Synapse workspace. Selecting custom storage accounts is not supported at this point.
85
126
86
-
* Although the API for MongoDB supports analytical store and Synapse Link, it currently doesn't support custom partitioning.
127
+
* Custom partitioning is only available for SQL API in Cosmos DB. The APIs for MongoDB, Gremlin, and Cassandra are not supported at this time.
87
128
88
129
## Pricing
89
130
@@ -116,13 +157,12 @@ Yes, the partition key for the given container can be changed and the new partit
116
157
117
158
### Can different partition keys point to the same BasePath?
118
159
119
-
Yes, since the partition key definition is part of the partitioned store path, different partition keys will have different paths branching from the same BasePath.
120
-
121
-
Base path format could be specified as: /mnt/partitionedstorename/\<Cosmos_DB_account_name\>/\<Cosmos_DB_database_rid\>/\<Cosmos_DB_container_rid\>/partition=partitionkey/
160
+
Yes, you can specify multiple partition keys on the same partitioned store as below: