+In join, lookup, and exists transformations, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the Spark engine automatically decides whether to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.
+

## Data flow script

### Syntax
@@ -46,7 +54,7 @@ To create a free-form expression that contains operators other than "and" and "e
If one or both of the data streams fit into worker node memory, further optimize your performance by enabling **Broadcast** in the optimize tab. You can also repartition your data on the join operation so that it fits better into memory per worker.
+In join, lookup, and exists transformations, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the Spark engine automatically decides whether to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.

## Self-Join

@@ -85,7 +87,7 @@ When testing the join transformations with data preview in debug mode, use a sma
articles/data-factory/data-flow-lookup.md (5 additions & 5 deletions)
@@ -50,11 +50,11 @@ When testing the lookup transformation with data preview in debug mode, use a sm

## Broadcast optimization

-In Azure Data Factory mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasting.
Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, broadcasting can greatly improve your lookup performance. For large datasets, this option can lead to an out of memory exception.
+In join, lookup, and exists transformations, if one or both data streams fit into worker node memory, you can optimize performance by enabling **Broadcasting**. By default, the Spark engine automatically decides whether to broadcast one side. To manually choose which side to broadcast, select **Fixed**.
+
+It's not recommended to disable broadcasting via the **Off** option unless your joins are running into timeout errors.

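As a rough illustration of why the broadcast side must fit into worker memory, the idea can be sketched in plain Python. This is a conceptual sketch only, not the Spark engine's or Azure Data Factory's implementation; the function name `broadcast_hash_join` and the sample `orders` and `customers` rows are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts by indexing the small side in memory."""
    # "Broadcast" step: the entire small side is held in memory as a hash index.
    # This is why the broadcast side has to fit into memory.
    index = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)

    # Probe step: stream the large side and look up matches in the index.
    joined = []
    for row in large_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: a large stream and a small lookup stream.
orders = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 45},
    {"customer_id": 3, "amount": 12},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]

result = broadcast_hash_join(orders, customers, "customer_id")
```

Because the small side is indexed once and the large side is only streamed, the large side never needs to be reorganized by key. If the indexed side is too large, the index itself no longer fits into memory.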
## Data flow script
@@ -67,7 +67,7 @@ Enabling broadcasting pushes the entire dataset into memory. For smaller dataset
multiple: { true | false },
pickup: { 'first' | 'last' | 'any' }, ## Only required if false is selected for multiple
{ desc | asc }( <sortColumn>, { true | false }), ## Only required if 'first' or 'last' is selected. true/false determines whether to put nulls first