articles/stream-analytics/stream-analytics-parallelization.md
4 additions & 4 deletions
@@ -15,7 +15,7 @@ As a prerequisite, you may want to be familiar with the notion of Streaming Unit
## What are the parts of a Stream Analytics job?
A Stream Analytics job definition includes at least one streaming input, a query, and an output. Inputs are where the job reads the data stream from, the query transforms the input stream, and the output is where the job sends the results.
-## Partitions in sources and sinks
+## Partitions in inputs and outputs
Partitioning lets you divide data into subsets based on a [partition key](https://docs.microsoft.com/azure/event-hubs/event-hubs-scalability#partitions). If your input (for example, Event Hubs) is partitioned by a key, we highly recommend specifying this partition key when adding the input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output: the job can consume and write different partitions in parallel, which increases throughput.
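Conceptually, a partition key maps every event carrying the same key value to the same partition, which is what lets downstream consumers process partitions independently. This is a minimal sketch of the idea in Python; the hash function and field names here are illustrative only, not the hashing Event Hubs actually uses:

```python
# Illustrative only: Event Hubs uses its own internal hashing, but the idea
# is the same -- every event with the same partition key lands in the same
# partition, so per-key ordering and locality are preserved.
import hashlib

def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a stable partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

events = [{"TollBoothId": f"booth-{i % 3}"} for i in range(6)]
for e in events:
    e["partition"] = partition_for(e["TollBoothId"], partition_count=8)

# Events sharing a TollBoothId always map to the same partition.
assert len({e["partition"] for e in events if e["TollBoothId"] == "booth-0"}) == 1
```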
### Inputs
@@ -122,7 +122,7 @@ If the input partition count doesn't match the output partition count, the topol
Power BI output doesn't currently support partitioning. Therefore, this scenario is not embarrassingly parallel.
-### Multi-step query with different PARTITION BY values
+### Multi-step query with different PARTITION BY values - Compatibility level 1.0 or 1.1
* Input: Event hub with 8 partitions
* Output: Event hub with 8 partitions
@@ -142,7 +142,7 @@ Query:
As you can see, the second step uses **TollBoothId** as the partitioning key. Because this key differs from the first step's key, the engine must repartition the data between the two steps, which requires a shuffle.
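The shuffle can be pictured as a repartitioning pass between the two steps: rows produced per input partition are redistributed so that all rows sharing a **TollBoothId** end up together. This is a hypothetical sketch of that idea, not the engine's implementation, and the sample rows are made up:

```python
# Hypothetical sketch of a shuffle between two query steps. Step 1 is
# partitioned by PartitionId; step 2 groups by TollBoothId, so rows must be
# redistributed (shuffled) so all rows for a TollBoothId meet in one place.
from collections import defaultdict

step1_output = {  # keyed by input PartitionId
    0: [{"TollBoothId": "A", "Count": 2}, {"TollBoothId": "B", "Count": 1}],
    1: [{"TollBoothId": "A", "Count": 3}],
}

shuffled = defaultdict(list)  # re-keyed by the new partition key
for rows in step1_output.values():
    for row in rows:
        shuffled[row["TollBoothId"]].append(row)

# Step 2 can now aggregate each TollBoothId locally.
totals = {k: sum(r["Count"] for r in rows) for k, rows in shuffled.items()}
```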
-### Compatibility level 1.2 - Multi-step query with different PARTITION BY values
+### Multi-step query with different PARTITION BY values - Compatibility level 1.2 or above
* Input: Event hub with 8 partitions
* Output: Event hub with 8 partitions ("Partition key column" must be set to "TollBoothId")
@@ -160,7 +160,7 @@ Query:
GROUP BY TumblingWindow(minute, 3), TollBoothId
```
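The `GROUP BY TumblingWindow(minute, 3), TollBoothId` clause assigns each event to exactly one non-overlapping 3-minute window and aggregates per toll booth within that window. Its semantics can be sketched in plain Python; the timestamps and booth IDs below are made-up sample data:

```python
# Minimal sketch of TumblingWindow(minute, 3) semantics: each event belongs to
# exactly one non-overlapping 3-minute bucket, and results are grouped per
# (window, TollBoothId). Field names mirror the query; the data is invented.
from collections import Counter

WINDOW_SECONDS = 3 * 60

def window_start(epoch_seconds: int) -> int:
    """Start of the tumbling window containing this timestamp."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)

events = [
    {"ts": 0,   "TollBoothId": "A"},
    {"ts": 100, "TollBoothId": "A"},
    {"ts": 50,  "TollBoothId": "B"},
    {"ts": 185, "TollBoothId": "A"},  # 185 >= 180, so this falls in the next window
]

counts = Counter((window_start(e["ts"]), e["TollBoothId"]) for e in events)
# counts == {(0, "A"): 2, (0, "B"): 1, (180, "A"): 1}
```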
-Compatibility level 1.2 enables parallel query execution by default. For example, query from the previous section will be partitioned as long as "TollBoothId" column is set as input Partition Key. PARTITION BY PartitionId clause is not required.
+Compatibility level 1.2 or above enables parallel query execution by default. For example, the query from the previous section is partitioned as long as the "TollBoothId" column is set as the input partition key; the PARTITION BY PartitionId clause is not required.
## Calculate the maximum streaming units of a job
The total number of streaming units that can be used by a Stream Analytics job depends on the number of steps in the query defined for the job and the number of partitions for each step.
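As a worked illustration of that rule, the arithmetic below assumes the commonly documented per-step limit of 6 streaming units for a non-partitioned step and 6 SUs per partition for a partitioned step; the 6-SU figure and the helper function are assumptions for illustration, so check the current product limits before sizing a real job:

```python
# Hypothetical sizing sketch: a non-partitioned step can use up to 6 SUs, and
# a partitioned step up to 6 SUs per partition. The 6-SU figure is an assumed
# value taken from commonly documented limits, not guaranteed to be current.
SU_PER_STEP = 6

def max_streaming_units(steps: list) -> int:
    """steps: partition count per step, or None for a non-partitioned step."""
    return sum(SU_PER_STEP * (p if p is not None else 1) for p in steps)

# One step partitioned over 8 partitions, plus one non-partitioned step:
print(max_streaming_units([8, None]))  # 6*8 + 6 = 54
```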