Commit ff49972

Merge pull request #109438 from kromerm/markadf001
Markadf001
2 parents 28bab53 + e687b76 commit ff49972

7 files changed: +15 -4 lines changed

articles/data-factory/concepts-data-flow-performance.md

Lines changed: 2 additions & 2 deletions
@@ -64,15 +64,15 @@ By default, turning on debug will use the default Azure Integration runtime that
 
 Under **Source Options** in the source transformation, the following settings can affect performance:
 
-* Batch size instructs ADF to store data in sets in memory instead of row-by-row. Batch size is an optional setting and you may run out of resources on the compute nodes if they aren't sized properly.
+* Batch size instructs ADF to store data in sets in Spark memory instead of row-by-row. Batch size is an optional setting and you may run out of resources on the compute nodes if they aren't sized properly. Not setting this property will utilize Spark caching batch defaults.
 * Setting a query can allow you to filter rows at the source before they arrive in Data Flow for processing. This can make the initial data acquisition faster. If you use a query, you can add optional query hints for your Azure SQL DB such as READ UNCOMMITTED.
 * Read uncommitted will provide faster query results on Source transformation
 
 ![Source](media/data-flow/source4.png "Source")
 
 ### Sink batch size
 
-To avoid row-by-row processing of your data flows, set **Batch size** in the Settings tab for Azure SQL DB and Azure SQL DW sinks. If batch size is set, ADF processes database writes in batches based on the size provided.
+To avoid row-by-row processing of your data flows, set **Batch size** in the Settings tab for Azure SQL DB and Azure SQL DW sinks. If batch size is set, ADF processes database writes in batches based on the size provided. Not setting this property will utilize Spark caching batch defaults.
 
 ![Sink](media/data-flow/sink4.png "Sink")
 
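For illustration, a source query carrying the READ UNCOMMITTED hint this hunk mentions might look like the following minimal sketch; the table and column names (`dbo.SalesOrders`, `CustomerId`, `OrderDate`, `Amount`) are hypothetical and not part of the commit:

```sql
-- Hypothetical Azure SQL DB source query for a data flow source transformation.
-- The READUNCOMMITTED table hint permits dirty reads, avoiding shared locks
-- so results come back faster at the cost of read consistency.
SELECT CustomerId, OrderDate, Amount
FROM dbo.SalesOrders WITH (READUNCOMMITTED)
WHERE OrderDate >= '2020-01-01';
```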

articles/data-factory/connector-azure-sql-data-warehouse.md

Lines changed: 2 additions & 2 deletions
@@ -608,7 +608,7 @@ Using COPY statement supports the following configuration:
 2. Format settings are as follows:
 
    1. For **Parquet**: `compression` can be **no compression**, **Snappy**, or **GZip**.
-   2. For **ORC**: `compression` can be **no compression**, **zlib**, or **Snappy**.
+   2. For **ORC**: `compression` can be **no compression**, **```zlib```**, or **Snappy**.
    3. For **Delimited text**:
      1. `rowDelimiter` is explicitly set as **single character** or "**\r\n**"; the default value is not supported.
      2. `nullValue` is left as default or set to **empty string** ("").
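To make these format settings concrete, here is a minimal sketch of a Synapse COPY statement loading Snappy-compressed Parquet. The storage path and target table are hypothetical; in ADF the COPY statement is generated for you from the sink configuration, so this is only for orientation:

```sql
-- Hypothetical COPY statement of the kind described above: load
-- Snappy-compressed Parquet files into a Synapse table.
COPY INTO dbo.StagingOrders
FROM 'https://myaccount.blob.core.windows.net/staging/orders/*.parquet'
WITH (
    FILE_TYPE   = 'PARQUET',
    COMPRESSION = 'Snappy'
);
```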
@@ -700,7 +700,7 @@ Settings specific to Azure Synapse Analytics are available in the **Source Options**
 
 * SQL Example: ```Select * from MyTable where customerId > 1000 and customerId < 2000```
 
-**Batch size**: Enter a batch size to chunk large data into reads.
+**Batch size**: Enter a batch size to chunk large data into reads. In data flows, ADF will use this setting to set Spark columnar caching. This is an optional field which will use Spark defaults if it is left blank.
 
 **Isolation Level**: The default for SQL sources in mapping data flow is read uncommitted. You can change the isolation level here to one of these values:
 * Read Committed
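Conceptually, the isolation level setting determines the transaction isolation under which the source query runs. A sketch of the equivalent T-SQL, reusing the example query above (an illustration, not the literal statement ADF issues):

```sql
-- Sketch: run the example source query under READ UNCOMMITTED,
-- the default isolation level for SQL sources in mapping data flows.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT *
FROM MyTable
WHERE customerId > 1000 AND customerId < 2000;
```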

articles/data-factory/data-flow-aggregate.md

Lines changed: 11 additions & 0 deletions
@@ -91,6 +91,17 @@ MoviesYear aggregate(
 ) ~> AvgComedyRatingByYear
 ```
 
+![Aggregate data flow script](media/data-flow/aggdfs1.png "Aggregate data flow script")
+
+```MoviesYear```: Derived Column defining year and title columns
+```AvgComedyRatingByYear```: Aggregate transformation for average rating of comedies grouped by year
+```avgrating```: Name of new column being created to hold the aggregated value
+
+```
+MoviesYear aggregate(groupBy(year),
+    avgrating = avg(toInteger(Rating))) ~> AvgComedyRatingByYear
+```
+
 ## Next steps
 
 * Define window-based aggregation using the [Window transformation](data-flow-window.md)
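For readers who think in SQL, the aggregate defined by the script added above corresponds roughly to the following query; this is an illustrative analogue that treats the `MoviesYear` stream as a table:

```sql
-- Rough SQL analogue of the AvgComedyRatingByYear aggregate:
-- group rows by year and average the rating.
-- (Cast to FLOAT so AVG returns a fractional value, as avg() does.)
SELECT year,
       AVG(CAST(Rating AS FLOAT)) AS avgrating
FROM MoviesYear
GROUP BY year;
```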
4 binary image files changed: 28.7 KB, 11.6 KB, 14.1 KB, 46.1 KB
