Commit bc65b8b

Merge pull request #186760 from deborahc/dech-cosmos-partitioning-docs
How to create an alert for 20 GB logical partition limits
2 parents 666305b + cd00d06 commit bc65b8b

7 files changed

+184
-10
lines changed

articles/cosmos-db/TOC.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1587,7 +1587,11 @@
15871587
- name: Audit control plane logs
15881588
href: audit-control-plane-logs.md
15891589
- name: Configure alerts
1590-
href: create-alerts.md
1590+
items:
1591+
- name: Create alert on Metrics
1592+
href: create-alerts.md
1593+
- name: Create alert on logical partition key size
1594+
href: how-to-alert-on-logical-partition-key-storage-size.md
15911595
- name: Monitoring data reference
15921596
href: monitor-cosmos-db-reference.md
15931597
- name: Application logging with Logic Apps

articles/cosmos-db/concepts-limits.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ You can provision throughput at a container-level or a database-level in terms o
3535
| Minimum RU/s required per 1 GB | 10 RU/s<br>**Note:** this minimum can be lowered if your account is eligible to our ["high storage / low throughput" program](set-throughput.md#high-storage-low-throughput-program) |
3636

3737
> [!NOTE]
38-
> To learn about best practices for managing workloads that have partition keys requiring higher limits for storage or throughput, see [Create a synthetic partition key](synthetic-partition-keys.md). If your workload has already reached the logical partition limit of 20GB in production, it is recommended to re-architect your application with a different partition key as a long-term solution. To help give time for this, you can request a temporary increase in the logical partition key limit for your existing application. [File an Azure support ticket](create-support-request-quota-increase.md) and select quota type **Temporary increase in container's logical partition key size**. Note this is intended as a temporary mitigation and not recommendeded as a long-term solution, as SLA guarantees are not honored when the limit is increased. To remove the configuration, file a support ticket and select quota type **Restore container’s logical partition key size to default (20 GB)**. This can be done after you have either deleted data to fit the 20 GB logical partition limit or have re-architected your application with a different partition key.
38+
> To learn about best practices for managing workloads that have partition keys requiring higher limits for storage or throughput, see [Create a synthetic partition key](synthetic-partition-keys.md). If your workload has already reached the logical partition limit of 20GB in production, it is recommended to re-architect your application with a different partition key as a long-term solution. To help give time for this, you can request a temporary increase in the logical partition key limit for your existing application. [File an Azure support ticket](create-support-request-quota-increase.md) and select quota type **Temporary increase in container's logical partition key size**. Note this is intended as a temporary mitigation and not recommended as a long-term solution, as SLA guarantees are not honored when the limit is increased. To remove the configuration, file a support ticket and select quota type **Restore container’s logical partition key size to default (20 GB)**. This can be done after you have either deleted data to fit the 20 GB logical partition limit or have re-architected your application with a different partition key.
3939
4040
### Minimum throughput limits
4141

articles/cosmos-db/create-alerts.md

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,26 +5,31 @@ author: StefArroyo
55
ms.author: esarroyo
66
ms.service: cosmos-db
77
ms.topic: how-to
8-
ms.date: 07/16/2020
8+
ms.date: 02/08/2022
99
---
1010

1111
# Create alerts for Azure Cosmos DB using Azure Monitor
1212
[!INCLUDE[appliesto-all-apis](includes/appliesto-all-apis.md)]
1313

1414
Alerts are used to set up recurring tests to monitor the availability and responsiveness of your Azure Cosmos DB resources. Alerts can send you a notification in the form of an email, or execute an Azure Function when one of your metrics reaches the threshold or if a specific event is logged in the activity log.
1515

16-
You can receive an alert based on the metrics, or the activity log events on your Azure Cosmos account:
16+
You can receive an alert based on the metrics, activity log events, or Log Analytics logs on your Azure Cosmos account:
1717

1818
* **Metrics** - The alert triggers when the value of a specified metric crosses a threshold you assign. For example, when the total request units consumed exceed 1000 RU/s. This alert is triggered both when the condition is first met and then afterwards when that condition is no longer being met. See the [monitoring data reference](monitor-cosmos-db-reference.md#metrics) article for different metrics available in Azure Cosmos DB.
1919

2020
* **Activity log events** – This alert triggers when a certain event occurs. For example, when the keys of your Azure Cosmos account are accessed or refreshed.
2121

22+
* **Log Analytics** – This alert triggers when the value of a specified property in the results of a Log Analytics query crosses a threshold you assign. For example, you can write a Log Analytics query to [monitor if the storage for a logical partition key is reaching the 20 GB logical partition key storage limit](how-to-alert-on-logical-partition-key-storage-size.md) in Azure Cosmos DB.
23+
2224
You can set up alerts from the Azure Cosmos DB pane or the Azure Monitor service in the Azure portal. Both the interfaces offer the same options. This article shows you how to set up alerts for Azure Cosmos DB using Azure Monitor.
2325

2426
## Create an alert rule
2527

2628
This section shows how to create an alert when you receive an HTTP status code 429, which is received when the requests are rate limited. For examples, you may want to receive an alert when there are 100 or more rate limited requests. This article shows you how to configure an alert for such scenario by using the HTTP status code. You can use the similar steps to configure other types of alerts as well, you just need to choose a different condition based on your requirement.
2729

30+
> [!TIP]
31+
> The scenario of alerting based on number of 429s exceeding a threshold is used here for illustration purposes. It does not mean that there is anything inherently wrong with seeing 429s on your database or container. In general, if you see 1-5% of requests with 429s in a production workload and your overall application latency is within your requirements, this is a normal and healthy sign that you are fully using the throughput (RU/s) you've provisioned. [Learn more about how to interpret and debug 429 exceptions](sql/troubleshoot-request-rate-too-large.md).
32+
2833
1. Sign into the [Azure portal.](https://portal.azure.com/)
2934

3035
1. Select **Monitor** from the left-hand navigation bar and select **Alerts**.
@@ -43,17 +48,17 @@ This section shows how to create an alert when you receive an HTTP status code 4
4348

4449
* After filling in the details, a list of Azure Cosmos accounts in the selected scope is displayed. Choose the one for which you want to configure alerts and select **Done**.
4550

46-
1. Fill out the **Condition** section:
51+
1. Fill out the **Condition** section:
4752

48-
* Open the **Select condition** pane to open the **Configure signal logic** page and configure the following:
53+
* Open the **Select condition** pane to open the **Select a signal** page and configure the following:
4954

50-
* Select a signal. The **signal type** can be a **Metric** or an **Activity Log**. Choose **Metrics** for this scenario. Because you want to get an alert when there are rate limiting issues on the total request units metric.
55+
* Select a signal. The **signal type** can be a **Metric**, an **Activity Log** or a **Log** (Log Analytics). Choose **Metrics** for this scenario, as you want to get an alert when rate limiting occurs on the total request units metric.
5156

5257
* Select **All** for the **Monitor service**
5358

5459
* Choose a **Signal name**. To get an alert for HTTP status codes, choose the **Total Request Units** signal.
5560

56-
* In the next tab, you can define the logic for triggering an alert and use the chart to view trends of your Azure Cosmos account. The **Total Request Units** metric supports dimensions. These dimensions allow you to filter on the metric. If you dont select any dimension, this value is ignored.
61+
* Now, you can define the logic for triggering an alert and use the chart to view trends of your Azure Cosmos account. The **Total Request Units** metric supports dimensions. These dimensions allow you to filter on the metric. For example, you can use dimensions to filter to a specific database or container you want to monitor. If you don't select any dimension, this value is ignored.
5762

5863
* Choose **StatusCode** as the **Dimension name**. Select **Add custom value** and set the status code to 429.
5964

@@ -91,6 +96,7 @@ The following are some scenarios where you can use alerts:
9196

9297
* When the keys of an Azure Cosmos account are updated.
9398
* When the data or index usage of a container, database, or a region exceeds a certain number of bytes.
99+
* [When the storage for a logical partition key is reaching the Azure Cosmos DB 20 GB logical partition storage limit.](how-to-alert-on-logical-partition-key-storage-size.md)
94100
* When the normalized RU/s consumption is greater than certain percentage. The normalized RU consumption metric gives the maximum throughput utilization within a replica set. To learn, see the [How to monitor normalized RU/s](monitor-normalized-request-units.md) article.
95101
* When a region is added, removed, or if it goes offline.
96102
* When a database or a container is created, deleted, or updated.
Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
---
2+
title: Create alerts to monitor if storage for a logical partition key is approaching 20 GB
3+
description: Learn how to set up alerts for Azure Cosmos DB using Log Analytics
4+
author: deborahc
5+
ms.author: dech
6+
ms.service: cosmos-db
7+
ms.topic: how-to
8+
ms.date: 02/08/2022
9+
---
10+
11+
# Create alerts to monitor if storage for a logical partition key is approaching 20 GB
12+
[!INCLUDE[appliesto-all-apis](includes/appliesto-all-apis.md)]
13+
14+
Azure Cosmos DB enforces a maximum logical partition key size of 20 GB. For example, if you have a container/collection partitioned by **UserId**, the "Alice" logical partition can store up to 20 GB of data.
15+
16+
You can use alerts to monitor if you have any logical partition keys that are approaching the 20 GB logical partition limit. Alerts can send you a notification in the form of an email or execute an action, such as an Azure Function or Logic App, when the condition is triggered.
17+
18+
In this article, we’ll create an alert that will trigger if the storage for a logical partition key exceeds 70% of the 20 GB limit (has more than 14 GB of storage). You can set up alerts from the **Alerts** pane in a specific Azure Cosmos DB account or the **Azure Monitor** service in the Azure portal. Both the interfaces offer the same options. This article shows you how to set up the alert from Azure Monitor.
19+
20+
## Prerequisites
21+
22+
We’ll be using data from the **PartitionKeyStatistics** log category in Diagnostic Logs to create our alert. Diagnostic Logs is an opt-in feature, so you’ll need to enable it before proceeding. In our example, we’ll use the recommended Resource Specific Logs option.
23+
24+
Follow the instructions in [Monitor Azure Cosmos DB data by using diagnostic settings in Azure](cosmosdb-monitor-resource-logs.md) to ensure:
25+
- Diagnostic Logs is enabled on the Azure Cosmos DB account(s) you want to monitor
26+
- You have configured collection of the **PartitionKeyStatistics** log category
27+
- The Diagnostic logs are being sent to a Log Analytics workspace
28+
29+
30+
## Create the alert
31+
32+
1. Sign into the [Azure portal.](https://portal.azure.com/)
33+
34+
1. Select **Monitor** from the left-hand navigation bar and select **Alerts**.
35+
36+
1. Select the **New alert rule** button to open the **Create alert rule** pane.
37+
38+
1. Fill out the **Scope** section:
39+
40+
* Open the **Select resource** pane and configure the following:
41+
42+
* Choose your **subscription** name.
43+
44+
* Select **Azure Cosmos DB accounts** for the **resource type**.
45+
46+
* Choose the **location** of your Azure Cosmos account.
47+
48+
* After filling in the details, a list of Azure Cosmos accounts in the selected scope is displayed. Choose the one for which you want to configure alerts and select **Done**.
49+
50+
1. Fill out the **Condition** section:
51+
52+
* Open the **Select condition** pane to open the **Select a signal** page and configure the following:
53+
54+
* Select **Log** for the **Signal type**.
55+
56+
* Select **Log analytics** for the **Monitor service**.
57+
58+
* Select **Custom log search** for the **Signal name**.
59+
60+
* In the query editor, add the below query. You can run the query to preview the result.
61+
> [!NOTE]
62+
> It's perfectly ok if the query currently returns no results. The **PartitionKeyStatistics** logs only show data if there are logical partition keys with significant storage size, so if there are no results returned, it means that there are no such keys. If and when such keys do appear in the future, the alert will be triggered then.
63+
64+
```kusto
65+
CDBPartitionKeyStatistics
66+
// Get the latest storage size for each logical partition key value
67+
| summarize arg_max(TimeGenerated, *) by AccountName, DatabaseName, CollectionName, _ResourceId, PartitionKey
68+
| extend utilizationOf20GBLogicalPartition = SizeKb / (20.0 * 1024.0 * 1024.0) // Current storage / 20GB
69+
| project TimeGenerated, AccountName, DatabaseName, CollectionName, _ResourceId, PartitionKey, SizeKb, utilizationOf20GBLogicalPartition
70+
```
71+
* Select **Continue Editing Alert**.
72+
73+
* In the Measurement section:
74+
75+
* Select **utilizationOf20GBLogicalPartition** for **Measure**.
76+
77+
* Select **Maximum** for **Aggregation type**.
78+
79+
* Select your desired **Aggregation granularity** based on your requirements. In our example, we’ll select **1 hour**. This means that the alert will calculate the storage size of the logical partition using the highest storage value in the hour.
80+
81+
* In the Split by dimensions section:
82+
83+
* Add the following six dimensions: **AccountName**, **DatabaseName**, **CollectionName**, **_ResourceId**, **PartitionKey**, **SizeKb**. This ensures that when the alert is triggered, you’ll be able to identify the specific Azure Cosmos DB account, database, collection, and partition key that triggered the alert.
84+
85+
* For the **SizeKb** dimension, select **Select all current and future values** as the **Dimension values**.
86+
87+
* For all other dimensions:
88+
* If you want to monitor only a specific Azure Cosmos DB account, database, collection, or partition key, select the specific value or **Add custom value** if the value doesn’t currently appear in the dropdown.
89+
90+
* Otherwise, select **Select all current and future values**. For example, if your Cosmos account currently has two databases and five collections, selecting all current and future values for the **DatabaseName** and **CollectionName** dimensions ensures that the alert applies to all existing databases and collections, as well as any you create in the future.
91+
92+
* In the Alert logic section:
93+
94+
* Select **Greater than** for **Operator**.
95+
96+
* Select your desired threshold value. Based on how we’ve written the query, a valid threshold will be a number between 0 and 1 (inclusive). In our example, we want to trigger the alert if a logical partition key reaches 70% of the allowed storage, so we enter **0.7**. You can tune this number based on your requirements.
97+
98+
* Select your desired **Frequency of evaluation** based on your requirements. In our example, we’ll select **1 hour**. Note this value must be less than or equal to the alert evaluation period.
99+
100+
After completing Step 5, the **Condition** section will look like the example below.
101+
102+
:::image type="content" source="media/how-to-alert-on-logical-partition-key-storage-size/alert-signal-logic.png" alt-text="Screenshot of an example configuration for signal logic":::
103+
104+
1. Fill out the **Actions** section:
105+
106+
* Select an existing action group or create a new action group. An action group enables you to define the action(s) to be taken when the alert is triggered. For this example, create a new action group to receive an email notification when the alert is triggered. Open the **Create action group** pane.
107+
108+
* In the **Basics** section:
109+
110+
* Choose the subscription and the resource group in which this action group will be created.
111+
112+
* **Action group name** - The action group name must be unique within a resource group.
113+
114+
* **Display name** - This value is included in email and SMS notifications to identify which action group was the source of the notification.
115+
116+
* In the **Notifications** section:
117+
118+
* Provide a name for your notification.
119+
120+
* Select **Email/SMS message/Push/Voice** as the **Notification Type** and enter your email, SMS, Push Notification, or Voice information.
121+
* Optional: In the **Actions** section, you can select an **Action** to run, such as an Azure Function or Logic App.
122+
* Select **Review + create** to create the Action Group.
123+
124+
1. Fill out the **Details** section:
125+
126+
* Define a name for the alert, provide an optional description, set the severity level of the alert, and choose whether to enable the rule upon rule creation.
127+
* Select **Review + create** and select **Create** to finish creating the alert.
128+
129+
After you create the alert, it becomes active within 10 minutes.
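
As a rough sanity check on the alert logic, the following Python sketch mirrors what the query and the **Greater than** condition compute: keep the latest record per logical partition key, divide `SizeKb` by 20 GB expressed in KB, and flag values above the threshold. The sample rows and the helper names (`utilization`, `latest_per_key`, `keys_over_threshold`) are hypothetical and exist only for illustration. They aren't part of Azure Cosmos DB or Azure Monitor.

```python
from datetime import datetime

LIMIT_KB = 20.0 * 1024.0 * 1024.0  # 20 GB expressed in KB, as in the query
THRESHOLD = 0.7                    # example threshold used in this article


def utilization(size_kb: float) -> float:
    """Fraction of the 20 GB logical partition limit that is in use."""
    return size_kb / LIMIT_KB


def latest_per_key(rows):
    """Keep the most recent row per logical partition key (like arg_max on TimeGenerated)."""
    latest = {}
    for row in rows:
        key = (row["AccountName"], row["DatabaseName"],
               row["CollectionName"], row["PartitionKey"])
        if key not in latest or row["TimeGenerated"] > latest[key]["TimeGenerated"]:
            latest[key] = row
    return list(latest.values())


def keys_over_threshold(rows, threshold=THRESHOLD):
    """Return (PartitionKey, utilization) pairs strictly greater than the threshold."""
    return [
        (row["PartitionKey"], round(utilization(row["SizeKb"]), 2))
        for row in latest_per_key(rows)
        if utilization(row["SizeKb"]) > threshold
    ]


# Hypothetical sample data, not real log output.
GB_IN_KB = 1024 * 1024
sample_rows = [
    {"AccountName": "acct", "DatabaseName": "db", "CollectionName": "users",
     "PartitionKey": "Alice", "SizeKb": 10 * GB_IN_KB,
     "TimeGenerated": datetime(2022, 2, 8, 9, 0)},
    {"AccountName": "acct", "DatabaseName": "db", "CollectionName": "users",
     "PartitionKey": "Alice", "SizeKb": 16 * GB_IN_KB,
     "TimeGenerated": datetime(2022, 2, 8, 10, 0)},
    {"AccountName": "acct", "DatabaseName": "db", "CollectionName": "users",
     "PartitionKey": "Bob", "SizeKb": 5 * GB_IN_KB,
     "TimeGenerated": datetime(2022, 2, 8, 10, 0)},
]

print(keys_over_threshold(sample_rows))  # [('Alice', 0.8)]
```

Because the operator is **Greater than**, a logical partition at exactly 14 GB (utilization 0.7) doesn't trigger the alert in this sketch. Only values above the threshold do.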
130+
131+
## Example alert
132+
To see your alerts in the Azure portal:
133+
134+
1. Sign into the [Azure portal.](https://portal.azure.com/)
135+
136+
1. Select **Monitor** from the left-hand navigation bar and select **Alerts**.
137+
138+
When the alert is fired, it will include:
139+
- Database account name
140+
- Database name
141+
- Collection name
142+
- Logical partition key
143+
- Storage in KB of the logical partition key
144+
- Utilization of the 20 GB limit
145+
146+
For example, in the alert that was fired below, we see the logical partition of "ContosoTenant" has reached 0.78 of the 20 GB logical partition storage limit, with 16 GB of data in a particular database and collection.
147+
148+
:::image type="content" source="media/how-to-alert-on-logical-partition-key-storage-size/alert-when-logical-partition-key-exceeds-threshold.png" alt-text="Screenshot of an alert fired when logical partition key size exceeds threshold":::
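
To translate a utilization value from a fired alert back into storage, multiply it by the 20 GB limit. The following is a minimal sketch with hypothetical helper names (`used_gb`, `remaining_gb`). It assumes the utilization is the 0-to-1 fraction produced by the query in this article.

```python
LIMIT_GB = 20.0  # logical partition storage limit


def used_gb(utilization: float) -> float:
    """Storage consumed, in GB, for a utilization fraction between 0 and 1."""
    return utilization * LIMIT_GB


def remaining_gb(utilization: float) -> float:
    """Storage left, in GB, before the logical partition reaches the limit."""
    return LIMIT_GB - used_gb(utilization)


print(round(used_gb(0.78), 1))       # 15.6
print(round(remaining_gb(0.78), 1))  # 4.4
```

A utilization of 0.78 corresponds to about 15.6 GB, which the example above reports as 16 GB after rounding.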
149+
150+
## Remediation steps
151+
When the 20 GB logical partition size limit is reached, you won't be able to write any more data to that logical partition. As a result, it's recommended to rearchitect your application with a different partition key as a long-term solution.
152+
153+
To help give time for this, you can request a temporary increase in the logical partition key limit for your existing application. [File an Azure support ticket](create-support-request-quota-increase.md) and select quota type **Temporary increase in container's logical partition key size**. Note that this is intended as a temporary mitigation and is not recommended as a long-term solution, as SLA guarantees are not honored when the limit is increased. To remove the configuration, file a support ticket and select quota type **Restore container’s logical partition key size to default (20 GB)**. This can be done after you have either deleted data to fit the 20 GB logical partition limit or have rearchitected your application with a different partition key.
154+
155+
To learn about best practices for managing workloads that have partition keys requiring higher limits for storage or throughput, see [Create a synthetic partition key](synthetic-partition-keys.md).
156+
157+
## Next steps
158+
* How to [create alerts for Azure Cosmos DB using Azure Monitor](create-alerts.md).
159+
* How to [monitor normalized RU/s metric](monitor-normalized-request-units.md) in Azure Cosmos container.
160+
* How to [monitor throughput or request unit usage](monitor-request-unit-usage.md) of an operation in Azure Cosmos DB.
161+
* How to [interpret and debug 429 exceptions](sql/troubleshoot-request-rate-too-large.md) in Azure Cosmos container.
162+