Commit bac7c38

Update stream-analytics-parallelization.md
1 parent 3a4a3e7 commit bac7c38

File tree: 1 file changed (+4, −4 lines)


articles/stream-analytics/stream-analytics-parallelization.md

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@ As a prerequisite, you may want to be familiar with the notion of Streaming Unit
 ## What are the parts of a Stream Analytics job?
 A Stream Analytics job definition includes at least one streaming input, a query, and an output. Inputs are where the job reads the data stream from, the query transforms the input stream, and the output is where the job sends the results.
 
-## Partitions in sources and sinks
+## Partitions in inputs and outputs
 Partitioning lets you divide data into subsets based on a [partition key](https://docs.microsoft.com/azure/event-hubs/event-hubs-scalability#partitions). If your input (for example, Event Hubs) is partitioned by a key, we highly recommend specifying this partition key when adding the input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput.
 
 ### Inputs
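As a hedged illustration of the guidance in this hunk (the alias names `Input1` and `Output` are hypothetical, not from the article), a query step can be aligned with a partitioned input by partitioning on the built-in `PartitionId` column:

```sql
-- Sketch only: assumes a partitioned input alias named Input1 and an
-- output alias named Output. PartitionId is the built-in column that
-- carries the input partition number, so each input partition is
-- processed independently and in parallel.
SELECT *
INTO Output
FROM Input1
PARTITION BY PartitionId
```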
@@ -122,7 +122,7 @@ If the input partition count doesn't match the output partition count, the topol
 
 Power BI output doesn't currently support partitioning. Therefore, this scenario is not embarrassingly parallel.
 
-### Multi-step query with different PARTITION BY values
+### Multi-step query with different PARTITION BY values - Compatibility level 1.0 or 1.1
 * Input: Event hub with 8 partitions
 * Output: Event hub with 8 partitions
 
@@ -142,7 +142,7 @@ Query:
 
 As you can see, the second step uses **TollBoothId** as the partitioning key. This step is not the same as the first step, and it therefore requires a shuffle.
 
-### Compatibility level 1.2 - Multi-step query with different PARTITION BY values
+### Multi-step query with different PARTITION BY values - Compatibility level 1.2 or above
 * Input: Event hub with 8 partitions
 * Output: Event hub with 8 partitions ("Partition key column" must be set to use "TollBoothId")
 
@@ -160,7 +160,7 @@ Query:
     GROUP BY TumblingWindow(minute, 3), TollBoothId
 ```
 
-Compatibility level 1.2 enables parallel query execution by default. For example, the query from the previous section will be partitioned as long as the "TollBoothId" column is set as the input partition key. The PARTITION BY PartitionId clause is not required.
+Compatibility level 1.2 or above enables parallel query execution by default. For example, the query from the previous section will be partitioned as long as the "TollBoothId" column is set as the input partition key. The PARTITION BY PartitionId clause is not required.
 
 ## Calculate the maximum streaming units of a job
 The total number of streaming units that a Stream Analytics job can use depends on the number of steps in the query defined for the job and the number of partitions for each step.
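As a hedged worked example of this calculation (assuming the classic limit of 6 streaming units per partition for a fully parallel step, an assumption that should be checked against the current scaling documentation): a one-step, embarrassingly parallel query over the 8-partition event hub used above could scale to at most 8 × 6 = 48 streaming units.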
