ms.service: stream-analytics
author: ahartoon
ms.author: anboisve
ms.topic: conceptual
ms.date: 02/27/2024
---
# Scale an Azure Stream Analytics job to increase throughput

This article shows you how to tune a Stream Analytics query to increase throughput for Streaming Analytics jobs. You can use the following guide to scale your job to handle higher load and take advantage of more system resources (such as more bandwidth, more CPU resources, and more memory).

As a prerequisite, you might need to read the following article:

- [Understand and adjust Streaming Units](stream-analytics-streaming-unit-consumption.md)

## Case 1 – Your query is inherently fully parallelizable across input partitions
If your query is inherently fully parallelizable across input partitions, follow these steps:

- Author your query to be embarrassingly parallel by using the **PARTITION BY** keyword. For more information, see [Use query parallelization in Azure Stream Analytics](stream-analytics-parallelization.md).
- Depending on the output types used in your query, some outputs either aren't parallelizable or need further configuration to be embarrassingly parallel. For example, Power BI output isn't parallelizable. Outputs are always merged before sending to the output sink. Blobs, Tables, Azure Data Lake Storage, Service Bus, and Azure Functions are automatically parallelized. SQL and Azure Synapse Analytics outputs have an option for parallelization. An event hub needs to have the PartitionKey configuration set to match the **PARTITION BY** field (usually `PartitionId`). For Event Hubs, also pay extra attention to match the number of partitions for all inputs and all outputs to avoid cross-over between partitions.
- Run your query with **1 streaming unit (SU) V2** (which is the full capacity of a single computing node) to measure the maximum achievable throughput and, if you're using **GROUP BY**, measure how many groups (cardinality) the job can handle. General symptoms of the job hitting system resource limits are the following:
  - The SU % utilization metric is over 80%, which indicates that memory usage is high. The factors contributing to the increase of this metric are described in [Understand and adjust Stream Analytics streaming units](stream-analytics-streaming-unit-consumption.md).
  - The output timestamp is falling behind with respect to wall clock time. Depending on your query logic, the output timestamp can have a logic offset from the wall clock time. However, they should progress at roughly the same rate. If the output timestamp is falling further and further behind, it's an indicator that the system is overworking. It can be a result of downstream output sink throttling, or high CPU utilization. We don't provide a CPU utilization metric at this time, so it can be difficult to differentiate the two.
    - If the issue is due to sink throttling, you need to increase the number of output partitions (and also input partitions to keep the job fully parallelizable), or increase the amount of resources of the sink (for example, the number of Request Units for Azure Cosmos DB).
  - In the job diagram, there's a per-partition backlog event metric for each input. If the backlog event metric keeps increasing, it's also an indicator that the system resource is constrained (either because of output sink throttling or high CPU).
- Once you determine the limits of what a one SU V2 job can reach, you can extrapolate linearly the processing capacity of the job as you add more SUs, assuming you don't have any data skew that makes certain partitions "hot."
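For illustration, a minimal embarrassingly parallel query might look like the following sketch. The input and output names (`Input1`, `Output1`) are hypothetical, and `PartitionId` is the partition key column that Stream Analytics exposes for partitioned inputs:

```SQL
-- Pass events through, processing each input partition independently
SELECT *
INTO Output1
FROM Input1
PARTITION BY PartitionId
```

Because each input partition is processed independently, the job can scale out across nodes, as long as the output is also partitioned to match.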
> [!NOTE]
> Choose the right number of streaming units:
> Because Stream Analytics creates a processing node for each 1 SU V2 added, it’s best to make the number of nodes a divisor of the number of input partitions, so the partitions can be evenly distributed across the nodes.
For example, if you measured that your 1 SU V2 job can achieve a 4 MB/s processing rate, and your input partition count is 4, you can choose to run your job with 2 SU V2s to achieve a roughly 8 MB/s processing rate, or with 4 SU V2s to achieve 16 MB/s. You can then decide when to increase the number of SUs for the job, and to what value, as a function of your input rate.
## Case 2 - Your query isn't embarrassingly parallel
If your query isn't embarrassingly parallel, follow these steps:

- Start with a query with no **PARTITION BY** to avoid partitioning complexity, and run your query with 1 SU V2 to measure the maximum load, as in [Case 1](#case-1--your-query-is-inherently-fully-parallelizable-across-input-partitions).
- If you can achieve your anticipated load in terms of throughput, you're done. Alternatively, you can choose to measure the same job running with fractional nodes at 2/3 SU V2 and 1/3 SU V2, to find out the minimum number of streaming units that works for your scenario.
- If you can't achieve the desired throughput, try to break your query into multiple steps if it doesn't have multiple steps already, and allocate up to one SU V2 for each step in the query. For example, if you have three steps, allocate three SU V2s in the **Scale** option.
- To run such a job, Stream Analytics puts each step on its own node with a dedicated one SU V2 resource.
- If you still haven't achieved your load target, you can attempt to use **PARTITION BY** starting from steps closer to the input. For a **GROUP BY** operator that isn't naturally partitionable, you can use the local/global aggregate pattern to perform a partitioned **GROUP BY** followed by a nonpartitioned **GROUP BY**. For example, suppose you want to count how many cars go through each toll booth every 3 minutes, and the volume of the data is beyond what one SU V2 can handle.
Query:
```SQL
WITH Step1 AS (
    SELECT COUNT(*) AS Count, TollBoothId, PartitionId
    FROM Input1 Partition By PartitionId
    GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
)
SELECT SUM(Count) AS Count, TollBoothId
FROM Step1
GROUP BY TumblingWindow(minute, 3), TollBoothId
```
In the query, you're counting cars per toll booth per partition, and then adding the count from all partitions together.
Once partitioned, for each partition of the step, allocate one SU V2 so each partition can be placed on its own processing node.
> [!NOTE]
> If your query can't be partitioned, adding additional SUs in a multi-step query doesn't always improve throughput. One way to gain performance is to reduce the volume on the initial steps by using the local/global aggregate pattern, as described earlier.
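To sketch the idea of breaking a query into steps, the following hypothetical query uses a `WITH` clause to split the work into a volume-reducing step followed by an aggregation step. The input, output, and column names (`Input1`, `Output1`, `EntryTime`) are assumptions for illustration:

```SQL
-- Step 1: reduce volume early by filtering and projecting only needed columns
WITH ReducedInput AS (
    SELECT TollBoothId, EntryTime
    FROM Input1 TIMESTAMP BY EntryTime
    WHERE TollBoothId IS NOT NULL
)
-- Step 2: aggregate the reduced stream
SELECT TollBoothId, COUNT(*) AS Count
INTO Output1
FROM ReducedInput
GROUP BY TumblingWindow(minute, 3), TollBoothId
```

With two steps defined this way, you could allocate two SU V2s in the **Scale** option so that each step runs on its own node.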
## Case 3 - You're running lots of independent queries in a job
For certain ISV use cases, where it's more cost-efficient to process data from multiple tenants in a single job, using separate inputs and outputs for each tenant, you might end up running quite a few (for example, 20) independent queries in a single job. The assumption is that each such subquery's load is relatively small.

In this case, you can follow these steps:

- Don't use **PARTITION BY** in the query.
- Reduce the input partition count to the lowest possible value of 2 if you're using Event Hubs.
- Run the query with one SU V2. With the expected load for each subquery, add as many such subqueries as possible until the job hits system resource limits. Refer to [Case 1](#case-1--your-query-is-inherently-fully-parallelizable-across-input-partitions) for the symptoms when it happens.
- Once you're hitting the subquery limit measured, start adding the subqueries to a new job. The number of jobs to run as a function of the number of independent queries should be fairly linear, assuming you don't have any load skew. You can then forecast how many one SU V2 jobs you need to run as a function of the number of tenants you would like to serve.
- When using a reference data join with such queries, union the inputs together before joining with the same reference data. Then, split out the events if necessary. Otherwise, each reference data join keeps a copy of the reference data in memory, likely blowing up the memory usage unnecessarily.
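As a sketch of the reference data guidance, the following hypothetical query (all input, output, and column names are assumptions) unions two tenant streams, joins the reference data once, and then splits the enriched events back out per tenant:

```SQL
-- Union tenant streams so the reference data is joined only once
WITH Combined AS (
    SELECT 'Tenant1' AS Tenant, CustomerId, Amount FROM TenantInput1
    UNION
    SELECT 'Tenant2' AS Tenant, CustomerId, Amount FROM TenantInput2
),
Enriched AS (
    SELECT c.Tenant, c.CustomerId, c.Amount, r.CustomerName
    FROM Combined c
    JOIN CustomerRefData r ON c.CustomerId = r.CustomerId
)
-- Split the enriched stream back out to per-tenant outputs
SELECT CustomerId, Amount, CustomerName INTO TenantOutput1 FROM Enriched WHERE Tenant = 'Tenant1'
SELECT CustomerId, Amount, CustomerName INTO TenantOutput2 FROM Enriched WHERE Tenant = 'Tenant2'
```

With this shape, only one copy of the reference data is kept in memory, instead of one copy per tenant subquery.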