articles/stream-analytics/stream-analytics-parallelization.md

Partitioning a step requires the following conditions:

When a query is partitioned, the input events are processed and aggregated in separate partition groups, and output events are generated for each of the groups. If you want a combined aggregate, you must create a second non-partitioned step to aggregate.
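
The two-step pattern above (aggregate per partition, then combine) can be sketched outside of Stream Analytics. This is an illustrative Python analogy of the data flow, not the query language itself; the event shape and names are hypothetical:

```python
from collections import defaultdict

# Hypothetical events: (partition_id, tollbooth_id) pairs.
events = [(0, "A"), (0, "A"), (1, "A"), (1, "B"), (2, "B")]

# Step 1 (partitioned): each partition aggregates only its own events,
# so each (partition, tollbooth) group emits its own output event.
per_partition = defaultdict(int)
for partition_id, booth in events:
    per_partition[(partition_id, booth)] += 1

# Step 2 (non-partitioned): combine the per-partition results
# into one global aggregate per tollbooth.
combined = defaultdict(int)
for (partition_id, booth), count in per_partition.items():
    combined[booth] += count

print(dict(combined))  # -> {'A': 3, 'B': 2}
```

Without the second step, each partition would report only its own counts; the combine step is what produces the single, global aggregate.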
### Calculate the max streaming units for a job

All non-partitioned steps together can scale up to one streaming unit (SU V2) for a Stream Analytics job. In addition, you can add 1 SU V2 for each partition in a partitioned step.

| Query | Max SUs for the job |
|---------|---------|
| <ul><li>The query contains one step.</li><li>The step isn't partitioned.</li></ul> | 1 SU V2 |
| <ul><li>The input data stream is partitioned by 16.</li><li>The query contains one step.</li><li>The step is partitioned.</li></ul> | 16 SU V2s (1 SU V2 * 16 partitions) |
| <ul><li>The query contains two steps.</li><li>Neither of the steps is partitioned.</li></ul> | 1 SU V2 |
| <ul><li>The input data stream is partitioned by 3.</li><li>The query contains two steps. The input step is partitioned and the second step isn't.</li><li>The <strong>SELECT</strong> statement reads from the partitioned input.</li></ul> | 4 SU V2s (3 for the partitioned step + 1 for the non-partitioned step) |

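
The rule behind these numbers is simple arithmetic; a minimal sketch, assuming the rule as stated above (the function name is illustrative, not an Azure API):

```python
def max_su_v2(partition_counts, nonpartitioned_steps):
    """Max SU V2s for a job: 1 SU V2 per partition of each partitioned
    step, plus a single shared SU V2 for all non-partitioned steps."""
    total = sum(partition_counts)  # one SU V2 per partition, per step
    if nonpartitioned_steps > 0:
        total += 1  # all non-partitioned steps together get 1 SU V2
    return total

# The four rows of the table above:
assert max_su_v2([], 1) == 1       # one non-partitioned step
assert max_su_v2([16], 0) == 16    # one step, partitioned by 16
assert max_su_v2([], 2) == 1       # two non-partitioned steps
assert max_su_v2([3], 1) == 4      # partitioned (3) + non-partitioned
```
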
### Examples of scaling
The following query calculates the number of cars within a three-minute window going through a toll station that has three tollbooths. This query can be scaled up to one SU V2.
```SQL
SELECT COUNT(*) AS Count, TollBoothId
FROM Input1
GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
```
To use more SUs for the query, both the input data stream and the query must be partitioned. Since the data stream partition is set to 3, the following modified query can be scaled up to 3 SU V2s:
```SQL
SELECT COUNT(*) AS Count, TollBoothId
FROM Input1 PARTITION BY PartitionId
GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
```

The following observations use a Stream Analytics job with a stateless (passthrough) query.

#### Event Hubs

|Ingestion Rate (events per second) | Streaming Units | Output Resources |
|--------|---------|---------|
| 1 K | 1/3 | 2 TU |
| 5 K | 1 | 6 TU |
| 10 K | 2 | 10 TU |

The [Event Hubs](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-eventhubs) solution scales linearly in terms of streaming units (SU) and throughput, making it the most efficient and performant way to analyze and stream data out of Stream Analytics. Jobs can be scaled up to 66 SU V2s, which roughly translates to processing up to 400 MB/s, or 38 trillion events per day.
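
Because scaling is roughly linear, the table above implies a steady-state rate of about 5,000 events per second per SU V2 (5 K -> 1 SU V2, 10 K -> 2 SU V2). A hypothetical sizing helper built on that assumption (real sizing still needs load testing, and small jobs can also run at 1/3 or 2/3 SU V2):

```python
import math

# Assumed throughput per SU V2, read off the Event Hubs table above.
EVENTS_PER_SU_V2 = 5_000

def estimate_su_v2(events_per_second):
    """Rough whole-SU estimate under the linear-scaling assumption."""
    return max(1, math.ceil(events_per_second / EVENTS_PER_SU_V2))

estimate_su_v2(10_000)  # -> 2
estimate_su_v2(50_000)  # -> 10
```
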
#### Azure SQL
|Ingestion Rate (events per second) | Streaming Units | Output Resources |
|---------|------|-------|
| 1 K | 2/3 | S3 |
| 5 K | 3 | P4 |
| 10 K | 6 | P6 |

[Azure SQL](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-azuresql) supports writing in parallel, called Inherit Partitioning, but it isn't enabled by default. However, enabling Inherit Partitioning, along with a fully parallel query, may not be sufficient to achieve higher throughputs. SQL write throughput depends significantly on your database configuration and table schema. The [SQL Output Performance](./stream-analytics-sql-output-perf.md) article has more detail about the parameters that can maximize your write throughput. As noted in the [Azure Stream Analytics output to Azure SQL Database](./stream-analytics-sql-output-perf.md#azure-stream-analytics) article, this solution doesn't scale linearly as a fully parallel pipeline beyond 8 partitions and may need repartitioning before SQL output (see [INTO](/stream-analytics-query/into-azure-stream-analytics#into-shard-count)). Premium SKUs are needed to sustain high I/O rates as well as the overhead from log backups that happen every few minutes.
#### Azure Cosmos DB
|Ingestion Rate (events per second) | Streaming Units | Output Resources |
|-------|-------|---------|
| 1 K | 2/3 | 20K RU |
| 5 K | 4 | 60K RU |
| 10 K | 8 | 120K RU |

[Azure Cosmos DB](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-cosmosdb) output from Stream Analytics has been updated to use native integration under [compatibility level 1.2](./stream-analytics-documentdb-output.md#improved-throughput-with-compatibility-level-12). Compatibility level 1.2 enables significantly higher throughput and reduces RU consumption compared to 1.1, which is the default compatibility level for new jobs. The solution uses Azure Cosmos DB containers partitioned on /deviceId, and the rest of the solution is configured identically.