---
title: Azure Stream Analytics output to Azure Cosmos DB
description: This article describes how to use Azure Stream Analytics to save output to Azure Cosmos DB for JSON output.
author: ajetasin
ms.author: ajetasi
ms.service: stream-analytics
ms.topic: conceptual
ms.date: 03/29/2024
---
# Azure Stream Analytics output to Azure Cosmos DB
Azure Stream Analytics can output data in JSON format to [Azure Cosmos DB](https://azure.microsoft.com/services/documentdb/), enabling data archiving and low-latency queries on unstructured JSON data. This article covers some best practices for implementing this configuration. If you're unfamiliar with Azure Cosmos DB, see the [Azure Cosmos DB documentation](../cosmos-db/index.yml) to get started.
> [!NOTE]
> - At this time, Stream Analytics supports connection to Azure Cosmos DB only through the *SQL API*. Other Azure Cosmos DB APIs are not yet supported. If you point Stream Analytics to Azure Cosmos DB accounts created with other APIs, the data might not be properly stored.
> - We recommend that you set your job to compatibility level 1.2 when using Azure Cosmos DB as output.
## Basics of Azure Cosmos DB as an output target
The Azure Cosmos DB output in Stream Analytics enables writing your stream processing results as JSON output into your Azure Cosmos DB containers. Stream Analytics doesn't create containers in your database. Instead, it requires you to create them beforehand. You can then control the billing costs of Azure Cosmos DB containers. You can also tune the performance, consistency, and capacity of your containers directly by using the [Azure Cosmos DB APIs](/rest/api/cosmos-db/). The following sections detail some of the container options for Azure Cosmos DB.
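Because Stream Analytics doesn't create containers for you, one way to provision them up front is the Azure CLI. A minimal sketch follows; the resource group, account, database, and container names are placeholders, not values from this article:

```shell
# Hypothetical names; substitute your own resource group, account, database, and container.
az cosmosdb sql database create \
  --resource-group my-rg \
  --account-name my-cosmos-account \
  --name StreamDb

# Create the container ahead of time, with an explicit partition key path and
# provisioned throughput, so the Stream Analytics job can write into it.
az cosmosdb sql container create \
  --resource-group my-rg \
  --account-name my-cosmos-account \
  --database-name StreamDb \
  --name MyContainer \
  --partition-key-path "/region" \
  --throughput 400
```

Creating the container yourself, rather than letting a service do it, is what gives you direct control over its billing, consistency, and capacity settings.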
## Tuning consistency, availability, and latency
To match your application requirements, Azure Cosmos DB allows you to fine-tune the database and containers and make trade-offs between consistency, availability, latency, and throughput.
Depending on what levels of read consistency your scenario needs against read and write latency, you can choose a consistency level on your database account. You can improve throughput by scaling up Request Units (RUs) on the container. Also by default, Azure Cosmos DB enables synchronous indexing on each CRUD operation to your container. This setting is another useful way to control write/read performance in Azure Cosmos DB. For more information, review the [Change your database and query consistency levels](../cosmos-db/consistency-levels.md) article.
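As an illustration (the account, database, and container names here are hypothetical), both knobs can be adjusted from the Azure CLI:

```shell
# Relax the account's default read consistency to trade consistency for latency.
az cosmosdb update \
  --resource-group my-rg \
  --name my-cosmos-account \
  --default-consistency-level Session

# Scale up the container's provisioned throughput (RUs) to absorb a heavier write load.
az cosmosdb sql container throughput update \
  --resource-group my-rg \
  --account-name my-cosmos-account \
  --database-name StreamDb \
  --name MyContainer \
  --throughput 10000
```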
## Upserts from Stream Analytics
Stream Analytics integration with Azure Cosmos DB allows you to insert or update records in your container based on a given **Document ID** column. This operation is also called an *upsert*. Stream Analytics uses an optimistic upsert approach. Updates happen only when an insert fails with a document ID conflict.
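The optimistic approach can be sketched with an in-memory stand-in for the container (the class and function names below are illustrative, not part of any Azure SDK):

```python
class ConflictError(Exception):
    """Raised when a document with the same ID already exists."""


class Container:
    """In-memory stand-in for an Azure Cosmos DB container (illustrative only)."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        if doc["id"] in self.docs:
            raise ConflictError(doc["id"])
        self.docs[doc["id"]] = doc

    def replace(self, doc):
        self.docs[doc["id"]] = doc


def optimistic_upsert(container, doc):
    """Try the insert first; fall back to an update only on an ID conflict."""
    try:
        container.insert(doc)
        return "inserted"
    except ConflictError:
        container.replace(doc)
        return "updated"
```

The point of attempting the insert first is that the common case (a new document ID) costs a single write, and the extra round trip is paid only when a conflict actually occurs.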
With compatibility level 1.0, Stream Analytics performs this update as a PATCH operation, so it enables partial updates to the document. Stream Analytics adds new properties or replaces an existing property incrementally. However, changes in the values of array properties in your JSON document result in overwriting the entire array. That is, the array isn't merged.
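A rough model of the two behaviors follows. It assumes, for illustration only, that the 1.0-style merge descends into nested objects; what the source states directly is that properties are added or replaced incrementally while arrays are always overwritten whole, and that 1.2 replaces the document:

```python
def upsert_level_1_0(existing, update):
    """Sketch of the level-1.0 partial update: properties are added or replaced
    one at a time. Recursing into nested objects is an assumption; array values
    overwrite the stored array entirely (arrays are never merged)."""
    merged = dict(existing)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = upsert_level_1_0(merged[key], value)
        else:
            merged[key] = value  # scalars and arrays alike are replaced whole
    return merged


def upsert_level_1_2(existing, update):
    """Sketch of the level-1.2 upsert: the new document replaces the stored one."""
    return dict(update)
```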
If you want to save *all* documents, including the ones that have a duplicate ID, rename the ID field in your query (by using the **AS** keyword). Let Azure Cosmos DB create the ID field or replace the ID with another column's value (by using the **AS** keyword or by using the **Document ID** setting).
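For example, a query along these lines (the column, input, and output names are hypothetical) renames the incoming `id` field so that it no longer acts as the document ID and Azure Cosmos DB assigns a fresh one to every event:

```sql
SELECT
    id AS originalId,  -- renamed so duplicates are no longer collapsed by ID
    deviceId,
    eventTime
INTO cosmosOutput
FROM eventHubInput
```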
## Data partitioning in Azure Cosmos DB
Azure Cosmos DB automatically scales partitions based on your workload. So we recommend that you use [unlimited](../cosmos-db/partitioning-overview.md) containers for partitioning your data. When Stream Analytics writes to unlimited containers, it uses as many parallel writers as the previous query step or input partitioning scheme.
> [!NOTE]
> Azure Stream Analytics supports only unlimited containers with partition keys at the top level. For example, `/region` is supported. Nested partition keys (for example, `/region/name`) are not supported.
Depending on your choice of partition key, you might receive this _warning_:
`CosmosDB Output contains multiple rows and just one row per partition key. If the output latency is higher than expected, consider choosing a partition key that contains at least several hundred records per partition key.`
It's important to choose a partition key property that has many distinct values, and that lets you distribute your workload evenly across these values. As a natural artifact of partitioning, requests that involve the same partition key are limited by the maximum throughput of a single partition.
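One informal way to sanity-check a candidate partition key against these two requirements on a sample of events is to measure its cardinality and the share of traffic landing on its hottest value (plain Python, not part of any Azure SDK):

```python
from collections import Counter


def partition_key_stats(events, key):
    """Return (distinct_values, hottest_share) for a candidate partition key,
    where hottest_share is the fraction of events on the busiest key value."""
    counts = Counter(event[key] for event in events)
    hottest = max(counts.values())
    return len(counts), hottest / len(events)
```

Few distinct values, or a hottest share close to 1.0, suggests the workload would be capped by the throughput of a single partition.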
The storage size for documents that belong to the same partition key value is limited to 20 GB (the [physical partition size limit](../cosmos-db/partitioning-overview.md) is 50 GB). An [ideal partition key](../cosmos-db/partitioning-overview.md#choose-partitionkey) is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure that your solution is scalable.
Partition keys used for Stream Analytics queries and Azure Cosmos DB don't need to be identical. Fully parallel topologies recommend using the *input partition key*, `PartitionId`, as the Stream Analytics query's partition key, but that might not be the recommended choice for an Azure Cosmos DB container's partition key.
A partition key is also the boundary for transactions in stored procedures and triggers for Azure Cosmos DB. You should choose the partition key so that documents that occur together in transactions share the same partition key value. The article [Partitioning in Azure Cosmos DB](../cosmos-db/partitioning-overview.md) gives more details on choosing a partition key.
With compatibility level 1.2, Stream Analytics supports native integration to bulk write into Azure Cosmos DB.
The improved writing mechanism is available under a new compatibility level because of a difference in upsert behavior. With levels before 1.2, the upsert behavior is to insert or merge the document. With 1.2, upsert behavior is modified to insert or replace the document.
With levels before 1.2, Stream Analytics uses a custom stored procedure to bulk upsert documents per partition key into Azure Cosmos DB. There, a batch is written as a transaction. Even when a single record hits a transient error (throttling), the whole batch has to be retried. This behavior makes scenarios with even reasonable throttling relatively slow.
The following example shows two identical Stream Analytics jobs reading from the same Azure Event Hubs input. Both Stream Analytics jobs are [fully partitioned](./stream-analytics-parallelization.md#embarrassingly-parallel-jobs) with a passthrough query and write to identical Azure Cosmos DB containers. Metrics on the left are from the job configured with compatibility level 1.0. Metrics on the right are configured with 1.2. An Azure Cosmos DB container's partition key is a unique GUID that comes from the input event.
:::image type="content" source="media/stream-analytics-documentdb-output/stream-analytics-documentdb-output-3.png" alt-text="Screenshot that shows the comparison of Stream Analytics metrics.":::
The incoming event rate in Event Hubs is two times higher than Azure Cosmos DB containers (20,000 RUs) are configured to take in, so throttling is expected in Azure Cosmos DB. However, the job with 1.2 is consistently writing at a higher throughput (output events per minute) and with a lower average SU% utilization. In your environment, this difference depends on a few more factors. These factors include choice of event format, input event/message size, partition keys, and query.
:::image type="content" source="media/stream-analytics-documentdb-output/stream-analytics-documentdb-output-2.png" alt-text="Screenshot that shows the comparison of Azure Cosmos DB metrics.":::
With 1.2, Stream Analytics is more intelligent in utilizing 100 percent of the available throughput in Azure Cosmos DB with few resubmissions from throttling or rate limiting. This behavior provides a better experience for other workloads like queries running on the container at the same time. If you want to see how Stream Analytics scales out with Azure Cosmos DB as a sink for 1,000 to 10,000 messages per second, try [this Azure sample project](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-cosmosdb).
Throughput of Azure Cosmos DB output is identical with 1.0 and 1.1. We *strongly recommend* that you use compatibility level 1.2 in Stream Analytics with Azure Cosmos DB.
## Azure Cosmos DB settings for JSON output
Using Azure Cosmos DB as an output in Stream Analytics generates the following prompt for information.
:::image type="content" source="media/stream-analytics-documentdb-output/stream-analytics-documentdb-output-1.png" alt-text="Screenshot that shows the information fields for an Azure Cosmos DB output stream.":::
|Field | Description|
|------------- | -------------|
|Account key | The shared access key for the Azure Cosmos DB account.|
|Database | The Azure Cosmos DB database name.|
|Container name | The container name, such as `MyContainer`. One container named `MyContainer` must exist. |
|Document ID | Optional. The column name in output events used as the unique key on which insert or update operations must be based. If you leave it empty, all events are inserted, with no update option.|
After you configure the Azure Cosmos DB output, you can use it in the query as the target of an [INTO statement](/stream-analytics-query/into-azure-stream-analytics). When you're using an Azure Cosmos DB output that way, [a partition key needs to be set explicitly](./stream-analytics-parallelization.md#partitions-in-inputs-and-outputs).
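A minimal passthrough query of that shape might look like this (the input and output aliases are placeholders for the names you configured on the job):

```sql
-- Hypothetical aliases: "cosmosOutput" is the Azure Cosmos DB output,
-- "eventHubInput" is the streaming input.
SELECT *
INTO cosmosOutput
FROM eventHubInput
```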
If a transient failure, service unavailability, or throttling happens while Stream Analytics is sending events to Azure Cosmos DB, Stream Analytics retries indefinitely to finish the operation successfully. But it doesn't attempt retries for the following failures:
1. A unique index constraint is added to the collection and the output data from Stream Analytics violates this constraint. Ensure that the output data from Stream Analytics doesn't violate unique constraints, or remove the constraints. For more information, see [Unique key constraints in Azure Cosmos DB](../cosmos-db/unique-keys.md).
0 commit comments