Commit ff49972

Merge pull request #109438 from kromerm/markadf001
Markadf001
2 parents 28bab53 + e687b76 commit ff49972

7 files changed: +15 -4 lines changed

articles/data-factory/concepts-data-flow-performance.md

Lines changed: 2 additions & 2 deletions
@@ -64,15 +64,15 @@ By default, turning on debug will use the default Azure Integration runtime that
 
 Under **Source Options** in the source transformation, the following settings can affect performance:
 
-* Batch size instructs ADF to store data in sets in memory instead of row-by-row. Batch size is an optional setting and you may run out of resources on the compute nodes if they aren't sized properly.
+* Batch size instructs ADF to store data in sets in Spark memory instead of row-by-row. Batch size is an optional setting and you may run out of resources on the compute nodes if they aren't sized properly. Not setting this property will utilize Spark caching batch defaults.
 * Setting a query can allow you to filter rows at the source before they arrive in Data Flow for processing. This can make the initial data acquisition faster. If you use a query, you can add optional query hints for your Azure SQL DB such as READ UNCOMMITTED.
 * Read uncommitted will provide faster query results on Source transformation
 
 ![Source](media/data-flow/source4.png "Source")
 
 ### Sink batch size
 
-To avoid row-by-row processing of your data flows, set **Batch size** in the Settings tab for Azure SQL DB and Azure SQL DW sinks. If batch size is set, ADF processes database writes in batches based on the size provided.
+To avoid row-by-row processing of your data flows, set **Batch size** in the Settings tab for Azure SQL DB and Azure SQL DW sinks. If batch size is set, ADF processes database writes in batches based on the size provided. Not setting this property will utilize Spark caching batch defaults.
 
 ![Sink](media/data-flow/sink4.png "Sink")
 
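For illustration, a source query carrying the READ UNCOMMITTED hint this hunk mentions might look like the following minimal sketch; the table and column names (`dbo.SalesOrders`, `CustomerId`, `OrderDate`, `Amount`) are hypothetical and not part of the commit:

```sql
-- Hypothetical Azure SQL DB source query for a data flow source transformation.
-- The READUNCOMMITTED table hint permits dirty reads, avoiding shared locks
-- so results come back faster at the cost of read consistency.
SELECT CustomerId, OrderDate, Amount
FROM dbo.SalesOrders WITH (READUNCOMMITTED)
WHERE OrderDate >= '2020-01-01';
```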

articles/data-factory/connector-azure-sql-data-warehouse.md

Lines changed: 2 additions & 2 deletions
@@ -608,7 +608,7 @@ Using COPY statement supports the following configuration:
 2. Format settings are as follows:
 
    1. For **Parquet**: `compression` can be **no compression**, **Snappy**, or **GZip**.
-   2. For **ORC**: `compression` can be **no compression**, **zlib**, or **Snappy**.
+   2. For **ORC**: `compression` can be **no compression**, **```zlib```**, or **Snappy**.
    3. For **Delimited text**:
      1. `rowDelimiter` is explicitly set as **single character** or "**\r\n**"; the default value is not supported.
      2. `nullValue` is left as default or set to **empty string** ("").
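To make these format settings concrete, here is a minimal sketch of a Synapse COPY statement loading Snappy-compressed Parquet. The storage path and target table are hypothetical; in ADF the COPY statement is generated for you from the sink configuration, so this is only for orientation:

```sql
-- Hypothetical COPY statement of the kind described above: load
-- Snappy-compressed Parquet files into a Synapse table.
COPY INTO dbo.StagingOrders
FROM 'https://myaccount.blob.core.windows.net/staging/orders/*.parquet'
WITH (
    FILE_TYPE   = 'PARQUET',
    COMPRESSION = 'Snappy'
);
```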
@@ -700,7 +700,7 @@ Settings specific to Azure Synapse Analytics are available in the **Source Options**
 
 * SQL Example: ```Select * from MyTable where customerId > 1000 and customerId < 2000```
 
-**Batch size**: Enter a batch size to chunk large data into reads.
+**Batch size**: Enter a batch size to chunk large data into reads. In data flows, ADF will use this setting to set Spark columnar caching. This is an optional field which will use Spark defaults if it is left blank.
 
 **Isolation Level**: The default for SQL sources in mapping data flow is read uncommitted. You can change the isolation level here to one of these values:
 * Read Committed
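Conceptually, the isolation level setting determines the transaction isolation under which the source query runs. A sketch of the equivalent T-SQL, reusing the example query above (an illustration, not the literal statement ADF issues):

```sql
-- Sketch: run the example source query under READ UNCOMMITTED,
-- the default isolation level for SQL sources in mapping data flows.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT *
FROM MyTable
WHERE customerId > 1000 AND customerId < 2000;
```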

articles/data-factory/data-flow-aggregate.md

Lines changed: 11 additions & 0 deletions
@@ -91,6 +91,17 @@ MoviesYear aggregate(
 ) ~> AvgComedyRatingByYear
 ```
 
+![Aggregate data flow script](media/data-flow/aggdfs1.png "Aggregate data flow script")
+
+```MoviesYear```: Derived Column defining year and title columns
+```AvgComedyRatingByYear```: Aggregate transformation for average rating of comedies grouped by year
+```avgrating```: Name of new column being created to hold the aggregated value
+
+```
+MoviesYear aggregate(groupBy(year),
+    avgrating = avg(toInteger(Rating))) ~> AvgComedyRatingByYear
+```
+
 ## Next steps
 
 * Define window-based aggregation using the [Window transformation](data-flow-window.md)
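For readers who think in SQL, the aggregate defined by the script added above corresponds roughly to the following query; this is an illustrative analogue that treats the `MoviesYear` stream as a table:

```sql
-- Rough SQL analogue of the AvgComedyRatingByYear aggregate:
-- group rows by year and average the rating.
-- (Cast to FLOAT so AVG returns a fractional value, as avg() does.)
SELECT year,
       AVG(CAST(Rating AS FLOAT)) AS avgrating
FROM MoviesYear
GROUP BY year;
```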
4 binary image files changed: 28.7 KB, 11.6 KB, 14.1 KB, 46.1 KB
