Commit 9b89d2e

Merge pull request #243515 from anboisve/patch-6
Update stream-analytics-parallelization.md
2 parents 7fdc0a2 + 6c1aee5 commit 9b89d2e

1 file changed

+17
-17
lines changed

articles/stream-analytics/stream-analytics-parallelization.md

Lines changed: 17 additions & 17 deletions
@@ -210,27 +210,27 @@ Partitioning a step requires the following conditions:
210210
When a query is partitioned, the input events are processed and aggregated in separate partition groups, and output events are generated for each of the groups. If you want a combined aggregate, you must create a second non-partitioned step to aggregate.
211211

212212
### Calculate the max streaming units for a job
213-
All non-partitioned steps together can scale up to six streaming units (SUs) for a Stream Analytics job. In addition, you can add 6 SUs for each partition in a partitioned step.
213+
All non-partitioned steps together can scale up to one streaming unit (SU V2) for a Stream Analytics job. In addition, you can add 1 SU V2 for each partition in a partitioned step.
214214
You can see some **examples** in the table below.
215215

216216
| Query | Max SUs for the job |
217217
| --------------------------------------------------- | ------------------- |
218-
| <ul><li>The query contains one step.</li><li>The step isn't partitioned.</li></ul> | 6 |
219-
| <ul><li>The input data stream is partitioned by 16.</li><li>The query contains one step.</li><li>The step is partitioned.</li></ul> | 96 (6 * 16 partitions) |
220-
| <ul><li>The query contains two steps.</li><li>Neither of the steps is partitioned.</li></ul> | 6 |
221-
| <ul><li>The input data stream is partitioned by 3.</li><li>The query contains two steps. The input step is partitioned and the second step isn't.</li><li>The <strong>SELECT</strong> statement reads from the partitioned input.</li></ul> | 24 (18 for partitioned steps + 6 for non-partitioned steps) |
218+
| <ul><li>The query contains one step.</li><li>The step isn't partitioned.</li></ul> | 1 SU V2 |
219+
| <ul><li>The input data stream is partitioned by 16.</li><li>The query contains one step.</li><li>The step is partitioned.</li></ul> | 16 SU V2 (1 * 16 partitions) |
220+
| <ul><li>The query contains two steps.</li><li>Neither of the steps is partitioned.</li></ul> | 1 SU V2 |
221+
| <ul><li>The input data stream is partitioned by 3.</li><li>The query contains two steps. The input step is partitioned and the second step isn't.</li><li>The <strong>SELECT</strong> statement reads from the partitioned input.</li></ul> | 4 SU V2s (3 for partitioned steps + 1 for non-partitioned steps) |
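
The arithmetic behind the new table values can be sketched in a few lines of Python. This is only an illustration of the rule stated in the changed text (1 SU V2 shared by all non-partitioned steps, plus 1 SU V2 per partition of a partitioned step). The function name `max_su_v2` and its parameters are hypothetical and not part of any Stream Analytics API, and the sketch assumes at most one partitioned step.

```python
def max_su_v2(partition_count: int, has_non_partitioned_step: bool) -> int:
    """Hypothetical helper: upper bound on SU V2s for a job.

    partition_count: partitions of the single partitioned step
        (0 if no step is partitioned).
    has_non_partitioned_step: True if at least one step isn't partitioned.
    """
    if partition_count < 0:
        raise ValueError("partition_count must be non-negative")
    # 1 SU V2 for each partition in the partitioned step.
    partitioned = partition_count
    # All non-partitioned steps together share a single SU V2.
    non_partitioned = 1 if has_non_partitioned_step else 0
    return partitioned + non_partitioned


# The four rows of the table, in order.
rows = [
    max_su_v2(0, True),    # one step, not partitioned
    max_su_v2(16, False),  # one step, partitioned by 16
    max_su_v2(0, True),    # two steps, neither partitioned
    max_su_v2(3, True),    # partitioned input step + non-partitioned step
]
print(rows)
```

Under these assumptions the four results match the "Max SUs for the job" column on the added (`+`) side of the diff.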
222222

223223
### Examples of scaling
224224

225-
The following query calculates the number of cars within a three-minute window going through a toll station that has three tollbooths. This query can be scaled up to six SUs.
225+
The following query calculates the number of cars within a three-minute window going through a toll station that has three tollbooths. This query can be scaled up to one SU V2.
226226

227227
```SQL
228228
SELECT COUNT(*) AS Count, TollBoothId
229229
FROM Input1
230230
GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
231231
```
232232

233-
To use more SUs for the query, both the input data stream and the query must be partitioned. Since the data stream partition is set to 3, the following modified query can be scaled up to 18 SUs:
233+
To use more SUs for the query, both the input data stream and the query must be partitioned. Since the data stream partition is set to 3, the following modified query can be scaled up to 3 SU V2s:
234234

235235
```SQL
236236
SELECT COUNT(*) AS Count, TollBoothId
@@ -271,27 +271,27 @@ The following observations use a Stream Analytics job with stateless (passthroug
271271

272272
|Ingestion Rate (events per second) | Streaming Units | Output Resources |
273273
|--------|---------|---------|
274-
| 1 K | 1 | 2 TU |
275-
| 5 K | 6 | 6 TU |
276-
| 10 K | 12 | 10 TU |
274+
| 1 K | 1/3 | 2 TU |
275+
| 5 K | 1 | 6 TU |
276+
| 10 K | 2 | 10 TU |
277277

278-
The [Event Hubs](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-eventhubs) solution scales linearly in terms of streaming units (SU) and throughput, making it the most efficient and performant way to analyze and stream data out of Stream Analytics. Jobs can be scaled up to 396 SU, which roughly translates to processing up to 400 MB/s, or 38 trillion events per day.
278+
The [Event Hubs](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-eventhubs) solution scales linearly in terms of streaming units (SU) and throughput, making it the most efficient and performant way to analyze and stream data out of Stream Analytics. Jobs can be scaled up to 66 SU V2s, which roughly translates to processing up to 400 MB/s, or 38 trillion events per day.
279279

280280
#### Azure SQL
281281
|Ingestion Rate (events per second) | Streaming Units | Output Resources |
282282
|---------|------|-------|
283-
| 1 K | 3 | S3 |
284-
| 5 K | 18 | P4 |
285-
| 10 K | 36 | P6 |
283+
| 1 K | 2/3 | S3 |
284+
| 5 K | 3 | P4 |
285+
| 10 K | 6 | P6 |
286286

287287
[Azure SQL](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-azuresql) supports writing in parallel, called Inherit Partitioning, but it's not enabled by default. However, enabling Inherit Partitioning, along with a fully parallel query, may not be sufficient to achieve higher throughputs. SQL write throughputs depend significantly on your database configuration and table schema. The [SQL Output Performance](./stream-analytics-sql-output-perf.md) article has more detail about the parameters that can maximize your write throughput. As noted in the [Azure Stream Analytics output to Azure SQL Database](./stream-analytics-sql-output-perf.md#azure-stream-analytics) article, this solution doesn't scale linearly as a fully parallel pipeline beyond 8 partitions and may need repartitioning before SQL output (see [INTO](/stream-analytics-query/into-azure-stream-analytics#into-shard-count)). Premium SKUs are needed to sustain high IO rates along with overhead from log backups happening every few minutes.
288288

289289
#### Azure Cosmos DB
290290
|Ingestion Rate (events per second) | Streaming Units | Output Resources |
291291
|-------|-------|---------|
292-
| 1 K | 3 | 20K RU |
293-
| 5 K | 24 | 60K RU |
294-
| 10 K | 48 | 120K RU |
292+
| 1 K | 2/3 | 20K RU |
293+
| 5 K | 4 | 60K RU |
294+
| 10 K | 8 | 120K RU |
295295

296296
[Azure Cosmos DB](https://github.com/Azure-Samples/streaming-at-scale/tree/main/eventhubs-streamanalytics-cosmosdb) output from Stream Analytics has been updated to use native integration under [compatibility level 1.2](./stream-analytics-documentdb-output.md#improved-throughput-with-compatibility-level-12). Compatibility level 1.2 enables significantly higher throughput and reduces RU consumption compared to 1.1, which is the default compatibility level for new jobs. The solution uses Azure Cosmos DB containers partitioned on /deviceId and the rest of the solution is identically configured.
297297
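
Every numeric change in this commit follows the same conversion from the old streaming unit values to SU V2 values. The Python sketch below reproduces that mapping as inferred from the removed and added lines of this diff only: 1 becomes 1/3, 3 becomes 2/3, and multiples of 6 are divided by 6. The function name `su_v1_to_v2` is hypothetical, and values not covered by this diff are rejected rather than guessed.

```python
from fractions import Fraction

# Sub-unit sizes seen in this diff (old value -> SU V2 value).
SMALL_SIZES = {1: Fraction(1, 3), 3: Fraction(2, 3)}


def su_v1_to_v2(su_v1: int) -> Fraction:
    """Hypothetical helper: convert an old SU value to its SU V2 value."""
    if su_v1 in SMALL_SIZES:
        return SMALL_SIZES[su_v1]
    if su_v1 > 0 and su_v1 % 6 == 0:
        # Six old streaming units correspond to one SU V2.
        return Fraction(su_v1, 6)
    raise ValueError(f"no mapping inferred from this diff for {su_v1} SUs")


# Old values that appear on the removed (-) side of the diff.
for old in (1, 3, 6, 12, 18, 24, 36, 48, 96, 396):
    print(old, "->", su_v1_to_v2(old))
```

`Fraction` keeps the 1/3 and 2/3 sizes exact, which avoids floating-point rounding when comparing against the table values.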
