
Commit 2d8cad4

updating broadcast info
1 parent 5e7deee commit 2d8cad4

File tree

5 files changed

+19
-9
lines changed


articles/data-factory/data-flow-exists.md

Lines changed: 10 additions & 2 deletions
@@ -37,6 +37,14 @@ To create a free-form expression that contains operators other than "and" and "e

 ![Exists custom settings](media/data-flow/exists1.png "exists custom")

+## Broadcast optimization
+
+![Broadcast Join](media/data-flow/broadcast.png "Broadcast Join")
+
+In joins, lookups and exists transformation, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the spark engine will automatically decide whether or not to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.
+
 ## Data flow script

 ### Syntax
@@ -46,7 +54,7 @@ To create a free-form expression that contains operators other than "and" and "e
 exists(
 <conditionalExpression>,
 negate: { true | false },
-broadcast: {'none' | 'left' | 'right' | 'both'}
+broadcast: { 'auto' | 'left' | 'right' | 'both' | 'off' }
 ) ~> <existsTransformationName>
 ```

@@ -65,7 +73,7 @@ NameNorm2, TypeConversions
 exists(
 NameNorm2@EmpID == TypeConversions@EmpID && NameNorm2@Region == DimEmployees@Region,
 negate:false,
-broadcast: 'none'
+broadcast: 'auto'
 ) ~> checkForChanges
 ```
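
Note on the text added above: the paragraph describes broadcasting without showing what it does. The following is a minimal Python sketch, not the Spark or Data Factory implementation, of an exists check where the small right stream is "broadcast" as an in-memory set and each partition of the left stream probes it locally. The function name `broadcast_exists`, the `Row` shape, and the sample data are hypothetical and chosen only to mirror the `EmpID` and `Region` columns in the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (EmpID, Region), mirroring the example columns.
Row = Tuple[int, str]


def broadcast_exists(
    left_partitions: List[List[Row]],
    right_rows: Iterable[Row],
    negate: bool = False,
) -> List[Row]:
    """Keep left rows that have (or, with negate, do not have) a match on the right."""
    # "Broadcast" step: materialize the small right stream once as a set
    # so that every partition can probe it without shuffling the left stream.
    right_keys = set(right_rows)

    result: List[Row] = []
    for partition in left_partitions:  # each inner list stands in for one worker's partition
        for row in partition:
            if (row in right_keys) != negate:
                result.append(row)
    return result


# Illustrative data only.
left = [[(1, "EU"), (2, "US")], [(3, "EU"), (4, "APAC")]]
right = [(2, "US"), (3, "EU"), (9, "US")]

matches = broadcast_exists(left, right)               # rows that exist on the right
missing = broadcast_exists(left, right, negate=True)  # rows that do not
```

The sketch only works if `right_rows` fits in memory, which is the same condition the added paragraph places on the broadcast side.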

articles/data-factory/data-flow-join.md

Lines changed: 4 additions & 2 deletions
@@ -64,7 +64,9 @@ Unlike merge join in tools like SSIS, the join transformation isn't a mandatory

 ![Join Transformation optimize](media/data-flow/joinoptimize.png "Join Optimization")

-If one or both of the data streams fit into worker node memory, further optimize your performance by enabling **Broadcast** in the optimize tab. You can also repartition your data on the join operation so that it fits better into memory per worker.
+In joins, lookups and exists transformation, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the spark engine will automatically decide whether or not to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.

 ## Self-Join

@@ -85,7 +87,7 @@ When testing the join transformations with data preview in debug mode, use a sma
 join(
 <conditionalExpression>,
 joinType: { 'inner'> | 'outer' | 'left_outer' | 'right_outer' | 'cross' }
-broadcast: { 'none' | 'left' | 'right' | 'both' }
+broadcast: { 'auto' | 'left' | 'right' | 'both' | 'off' }
 ) ~> <joinTransformationName>
 ```
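
Note on the new `broadcast` values: the commit does not say how `'auto'` picks a side. The Python sketch below shows one plausible reading of the five values, assuming a size-based rule for `'auto'`. The function `sides_to_broadcast` and the constant `ASSUMED_THRESHOLD_BYTES` are hypothetical. The 10 MB figure is borrowed from the default of Spark's `spark.sql.autoBroadcastJoinThreshold` setting, and nothing in this commit confirms that data flows use that value.

```python
from typing import Set

# Assumption: borrowed from Spark's default spark.sql.autoBroadcastJoinThreshold (10 MB).
# The commit does not state which threshold, if any, data flows apply.
ASSUMED_THRESHOLD_BYTES = 10 * 1024 * 1024


def sides_to_broadcast(
    option: str,
    left_bytes: int,
    right_bytes: int,
    threshold_bytes: int = ASSUMED_THRESHOLD_BYTES,
) -> Set[str]:
    """Return which stream(s) would be broadcast for a given option value."""
    if option == "off":
        return set()                # never broadcast
    if option in ("left", "right"):
        return {option}             # fixed: the caller chose the side
    if option == "both":
        return {"left", "right"}
    if option == "auto":
        # Hypothetical rule: broadcast the smaller side only if it is small enough.
        smaller = "left" if left_bytes <= right_bytes else "right"
        if min(left_bytes, right_bytes) <= threshold_bytes:
            return {smaller}
        return set()
    raise ValueError(f"unknown broadcast option: {option!r}")
```

Under this reading, `'left'`, `'right'`, and `'both'` correspond to the **Fixed** choice in the UI, and `'off'` to the **Off** option.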

articles/data-factory/data-flow-lookup.md

Lines changed: 5 additions & 5 deletions
@@ -50,11 +50,11 @@ When testing the lookup transformation with data preview in debug mode, use a sm

 ## Broadcast optimization

-In Azure Data Factory mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasting.
-
 ![Broadcast Join](media/data-flow/broadcast.png "Broadcast Join")

-Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, broadcasting can greatly improve your lookup performance. For large datasets, this option can lead to an out of memory exception.
+In joins, lookups and exists transformation, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the spark engine will automatically decide whether or not to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.

 ## Data flow script

@@ -67,7 +67,7 @@ Enabling broadcasting pushes the entire dataset into memory. For smaller dataset
 multiple: { true | false },
 pickup: { 'first' | 'last' | 'any' }, ## Only required if false is selected for multiple
 { desc | asc }( <sortColumn>, { true | false }), ## Only required if 'first' or 'last' is selected. true/false determines whether to put nulls first
-broadcast: { 'none' | 'left' | 'right' | 'both' }
+broadcast: { 'auto' | 'left' | 'right' | 'both' | 'off' }
 ) ~> <lookupTransformationName>
 ```
 ### Example
@@ -81,7 +81,7 @@ SQLProducts, DimProd lookup(ProductID == ProductKey,
 multiple: false,
 pickup: 'first',
 asc(ProductKey, true),
-broadcast: 'none')~> LookupKeys
+broadcast: 'auto')~> LookupKeys
 ```
 ##
 Next steps