
Commit 406f24f

Update concepts-data-flow-performance.md
1 parent d26c3ba

File tree

1 file changed: +4 −1 lines changed


articles/data-factory/concepts-data-flow-performance.md

Lines changed: 4 additions & 1 deletion
@@ -54,6 +54,9 @@ By default, turning on debug will use the default Azure Integration runtime that

![Source Part](media/data-flow/sourcepart3.png "Source Part")

+> [!NOTE]
+> A good guideline for choosing the number of partitions for your source is to take the number of cores set for your Azure Integration Runtime and multiply it by five. For example, if you are transforming a series of files in your ADLS folders and will use a 32-core Azure IR, you would target 32 x 5 = 160 partitions.
+

### Source batch size, input, and isolation level

Under **Source Options** in the source transformation, the following settings can affect performance:
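The cores-times-five guideline in the note above is simple arithmetic. As a minimal sketch (the function name and `factor` default are illustrative, not part of ADF):

```python
def target_partitions(cores: int, factor: int = 5) -> int:
    """Guideline from the note: target partitions ≈ Azure IR cores × 5."""
    return cores * factor

# The note's worked example: a 32-core Azure IR.
print(target_partitions(32))  # 160
```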
@@ -95,7 +98,7 @@ To avoid row-by-row inserts into your DW, check **Enable staging** in your Sink

At each transformation, you can set the partitioning scheme you wish data factory to use in the Optimize tab. It is a good practice to first test file-based sinks keeping the default partitioning and optimizations.

-* For smaller files, you may find selecting *Single Partition* can sometimes work better and faster than asking Spark to partition your small files.
+* For smaller files, you may find that choosing fewer partitions sometimes works better and faster than asking Spark to partition your small files.
* If you don't have enough information about your source data, choose *Round Robin* partitioning and set the number of partitions.
* If your data has columns that can be good hash keys, choose *Hash partitioning*.
