Commit 45c2c9b

Merge pull request #42384 from xindzhan/patch-1
Output event hub partitionKey column
2 parents: 647118b + 38ee64d

File tree

1 file changed: +3 −3 lines changed

articles/stream-analytics/stream-analytics-parallelization.md

Lines changed: 3 additions & 3 deletions
@@ -54,7 +54,7 @@ An *embarrassingly parallel* job is the most scalable scenario we have in Azure

 1. If your query logic depends on the same key being processed by the same query instance, you must make sure that the events go to the same partition of your input. For Event Hubs or IoT Hub, this means that the event data must have the **PartitionKey** value set. Alternatively, you can use partitioned senders. For blob storage, this means that the events are sent to the same partition folder. If your query logic does not require the same key to be processed by the same query instance, you can ignore this requirement. An example of this logic would be a simple select-project-filter query.

-2. Once the data is laid out on the input side, you must make sure that your query is partitioned. This requires you to use **PARTITION BY** in all the steps. Multiple steps are allowed, but they all must be partitioned by the same key. Under compatibility level 1.0 and 1.1, the partitioning key must be set to **PartitionId** in order for the job to be fully parallel. For jobs with compatility level 1.2 and higher, custom column can be specified as Partition Key in the input settings and the job will be paralellized automoatically even without PARTITION BY clause.
+2. Once the data is laid out on the input side, you must make sure that your query is partitioned. This requires you to use **PARTITION BY** in all the steps. Multiple steps are allowed, but they all must be partitioned by the same key. Under compatibility levels 1.0 and 1.1, the partitioning key must be set to **PartitionId** in order for the job to be fully parallel. For jobs with compatibility level 1.2 and higher, a custom column can be specified as the partition key in the input settings, and the job will be parallelized automatically even without a PARTITION BY clause. For Event Hub output, the property "Partition key column" must be set to use "PartitionId".

 3. Most of our output can take advantage of partitioning, however if you use an output type that doesn't support partitioning your job won't be fully parallel. Refer to the [output section](#outputs) for more details.

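The partitioning requirement in step 2 can be sketched as a short Stream Analytics query. This is an illustration, not part of the diff: the input/output names (`Input1`, `Output`) and the `TollBoothId` field are assumed for the example.

```sql
-- Under compatibility levels 1.0/1.1, every step must use PARTITION BY PartitionId
-- for the job to be fully parallel (the names below are illustrative).
SELECT TollBoothId, COUNT(*) AS Count
INTO Output
FROM Input1 PARTITION BY PartitionId
GROUP BY TollBoothId, PartitionId, TumblingWindow(minute, 3)
```

Because the grouping key includes `PartitionId`, each input partition can be processed by an independent query instance, which is what makes the job embarrassingly parallel.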
@@ -71,7 +71,7 @@ The following sections discuss some example scenarios that are embarrassingly pa

 ### Simple query

 * Input: Event hub with 8 partitions
-* Output: Event hub with 8 partitions
+* Output: Event hub with 8 partitions ("Partition key column" must be set to use "PartitionId")

 Query:

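The diff hunk does not include the query body for this scenario; a minimal sketch of a fully parallel select-project-filter query, with an assumed `TollBoothId` field and illustrative input/output names, might look like:

```sql
-- A simple select-project-filter query is embarrassingly parallel:
-- each of the 8 input partitions maps directly to an output partition.
SELECT TollBoothId
INTO Output
FROM Input1 PARTITION BY PartitionId
WHERE TollBoothId > 100
```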
@@ -138,7 +138,7 @@ The preceding examples show some Stream Analytics jobs that conform to (or don't

 ### Compatibility level 1.2 - Multi-step query with different PARTITION BY values

 * Input: Event hub with 8 partitions
-* Output: Event hub with 8 partitions
+* Output: Event hub with 8 partitions ("Partition key column" must be set to use "TollBoothId")

 Query:

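The query body for this scenario is likewise outside the diff hunk. Under compatibility level 1.2, the partition key can come from the input's "Partition key column" setting rather than an explicit PARTITION BY clause; a hedged sketch (all names assumed) of such a multi-step query:

```sql
-- Compatibility level 1.2: no PARTITION BY clause is required; the custom
-- partition key (here TollBoothId) is taken from the input settings, and
-- both steps group by it, so the job stays fully parallel end to end.
WITH Step1 AS (
    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1
    GROUP BY TumblingWindow(minute, 3), TollBoothId
)
SELECT SUM(Count) AS Count, TollBoothId
INTO Output
FROM Step1
GROUP BY TumblingWindow(minute, 3), TollBoothId
```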