Commit 1a3aee1

Merge pull request #106361 from kromerm/dataflow-1
Added data flow details
2 parents 3c6d5d5 + 10fc450 commit 1a3aee1

2 files changed: 11 additions, 3 deletions

articles/data-factory/concepts-data-flow-performance.md

Lines changed: 4 additions & 1 deletion
@@ -54,6 +54,9 @@ By default, turning on debug will use the default Azure Integration runtime that
 
 ![Source Part](media/data-flow/sourcepart3.png "Source Part")
 
+> [!NOTE]
+> A good guide for choosing the number of partitions for your source is to take the number of cores that you have set for your Azure Integration Runtime and multiply it by five. For example, if you are transforming a series of files in your ADLS folders and you are going to use a 32-core Azure IR, the number of partitions to target is 32 x 5 = 160 partitions.
+
 ### Source batch size, input, and isolation level
 
 Under **Source Options** in the source transformation, the following settings can affect performance:
@@ -95,7 +98,7 @@ To avoid row-by-row inserts into your DW, check **Enable staging** in your Sink
 
 At each transformation, you can set the partitioning scheme you wish data factory to use in the Optimize tab. It is a good practice to first test file-based sinks keeping the default partitioning and optimizations.
 
-* For smaller files, you may find selecting *Single Partition* can sometimes work better and faster than asking Spark to partition your small files.
+* For smaller files, you may find that choosing fewer partitions can sometimes work better and faster than asking Spark to partition your small files.
 * If you don't have enough information about your source data, choose *Round Robin* partitioning and set the number of partitions.
 * If your data has columns that can be good hash keys, choose *Hash partitioning*.
 
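To make the core-count rule in the new note concrete, here is a minimal sketch in Python (not part of the docs change; the core counts below are illustrative, use the size configured on your own Azure IR):

```python
# Rule of thumb from the note above: target partitions = Azure IR cores x 5.

def target_partitions(azure_ir_cores: int, factor: int = 5) -> int:
    """Return a suggested source partition count for a mapping data flow."""
    return azure_ir_cores * factor

if __name__ == "__main__":
    for cores in (8, 16, 32):
        print(f"{cores}-core Azure IR -> target ~{target_partitions(cores)} partitions")
    # A 32-core Azure IR gives 32 x 5 = 160 partitions, matching the example in the note.
```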
articles/data-factory/format-avro.md

Lines changed: 7 additions & 2 deletions
@@ -8,7 +8,8 @@ ms.reviewer: craigg
 ms.service: data-factory
 ms.workload: data-services
 ms.topic: conceptual
-ms.date: 02/13/2020
+ms.date: 03/03/2020
+
 ms.author: jingwang
 
 ---
@@ -80,7 +81,11 @@ The following properties are supported in the copy activity ***\*sink\**** section
 
 ## Data type support
 
-Avro [complex data types](https://avro.apache.org/docs/current/spec.html#schema_complex) are not supported (records, enums, arrays, maps, unions, and fixed).
+### Copy activity
+Avro [complex data types](https://avro.apache.org/docs/current/spec.html#schema_complex) (records, enums, arrays, maps, unions, and fixed) are not supported in the Copy activity.
+
+### Data flows
+When working with Avro files in data flows, you can read and write complex data types, but be sure to clear the physical schema from the dataset first. In data flows, you can set your logical projection and derive columns that are complex structures, then auto-map those fields to an Avro file.
 
 ## Next steps
 
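For reference, the "complex data types" mentioned above look like the following in an Avro schema. This is an illustrative sketch only (Python standard library; the record and field names are made up), showing a record whose fields include an array, a map, and a union with null:

```python
import json

# Example Avro schema using several complex types (record, array, map, union).
# Per the docs change above: the Copy activity cannot handle fields like these,
# while data flows can, provided the dataset's physical schema is cleared first.
avro_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "attributes", "type": {"type": "map", "values": "long"}},
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}

print(json.dumps(avro_schema, indent=2))
```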