File: articles/data-factory/concepts-integration-runtime-performance.md
A best practice is to start small and scale up to meet your performance needs.
## Custom shuffle partition
Dataflow divides the data into partitions and transforms it using different processes. If the data size in a partition is more than the process can hold in memory, the process fails with OOM (out of memory) errors. If a dataflow contains huge amounts of data with joins/aggregations, you may want to try changing the shuffle partitions incrementally. You can set a value from 50 up to 2000 to avoid OOM errors. **Compute Custom properties** in the dataflow runtime is a way to control your compute requirements. The property name is **Shuffle partitions** and it's an integer type. This customization should only be used in known scenarios; otherwise it can cause unnecessary dataflow failures.
While increasing the shuffle partitions, make sure data is spread well across them. A rough guideline is to have approximately 1.5 GB of data per partition. If data is skewed, increasing the "Shuffle partitions" value won't be helpful. For example, if you have 500 GB of data, a value between 400 and 500 should work. The default limit for shuffle partitions is 200, which works well for ~300 GB of data.
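The sizing guidance above can be sketched as a small helper. This is an illustrative heuristic only (the function name and defaults are assumptions, not part of ADF): it divides the total data size by ~1.5 GB per partition and clamps the result to the supported 50–2000 range.

```python
import math


def recommended_shuffle_partitions(data_gb: float,
                                   gb_per_partition: float = 1.5,
                                   minimum: int = 50,
                                   maximum: int = 2000) -> int:
    """Estimate a Shuffle partitions value from total data size.

    Hypothetical helper based on the rough guidance above:
    ~1.5 GB of data per partition, with the configurable value
    clamped to the supported range of 50 to 2000.
    """
    partitions = math.ceil(data_gb / gb_per_partition)
    return max(minimum, min(maximum, partitions))


# ~300 GB of data is covered by the default of 200 partitions.
print(recommended_shuffle_partitions(300))  # 200
```

Treat the result as a starting point: if the data is skewed, raising the partition count won't help, and the value should still be tuned incrementally against actual dataflow runs.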
Here are the steps to set it in a custom integration runtime. You can't set it for the autoresolve integration runtime.
1. From the ADF portal, under **Manage**, select a custom integration runtime and you'll go to edit mode.
2. Under the dataflow runtime tab, go to the **Compute Custom Properties** section.
3. Select **Shuffle Partitions** under Property name and input a value of your choice, like 250, 500, etc.
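Outside the portal, custom data flow properties on a managed integration runtime surface as name/value pairs in the factory's resource JSON. The fragment below is a hedged sketch, not an official template: the `customProperties` name/value shape follows the ARM schema for `dataFlowProperties`, but the integration runtime name and the exact property name (`ShufflePartitionsCount`) are assumptions, so confirm them by setting the value in the UI and exporting your factory's template.

```json
{
  "name": "MyCustomIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "customProperties": [
            {
              "name": "ShufflePartitionsCount",
              "value": "250"
            }
          ]
        }
      }
    }
  }
}
```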