
Commit 23cdc23

Merge pull request #203766 from kromerm/deltaupdates
Deltaupdates
2 parents: 0145938 + cb53239

File tree: 5 files changed, +12 -6 lines changed


articles/data-factory/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -730,7 +730,7 @@ items:
     href: data-flow-alter-row.md
     displayName: upsert, update, insert, delete
   - name: Assert
-    href: data-flow-assert.md
+    href: data-flow-assert.md
   - name: Conditional split
     href: data-flow-conditional-split.md
     displayName: split

articles/data-factory/concepts-data-flow-performance-transformations.md

Lines changed: 4 additions & 0 deletions
@@ -52,6 +52,10 @@ If your data is not evenly partitioned after a transformation, you can use the [
 > [!TIP]
 > If you repartition your data, but have downstream transformations that reshuffle your data, use hash partitioning on a column used as a join key.
 
+> [!NOTE]
+> Transformations inside your data flow (with the exception of the Sink transformation) do not modify the file and folder partitioning of data at rest. Partitioning in each transformation repartitions data inside the data frames of the temporary serverless Spark cluster that ADF manages for each of your data flow executions.
+
+
 ## Next steps
 
 - [Data flow performance overview](concepts-data-flow-performance.md)
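
To make the new note concrete: repartitioning in a data flow changes only the in-memory Spark partitioning, much like `DataFrame.repartition` in Spark itself. Below is a minimal PySpark sketch of the hash-partition-on-a-join-key tip; the `customer_id` column, the sample rows, and the partition count are illustrative assumptions, not part of this change:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, 100.0), (2, 250.0), (1, 75.0)], ["customer_id", "amount"])
customers = spark.createDataFrame(
    [(1, "Ada"), (2, "Grace")], ["customer_id", "name"])

# Hash-partition both sides on the join key so the downstream join can
# reuse the shuffle. This only repartitions the in-memory data frames;
# no files or folders at rest are rewritten.
orders = orders.repartition(8, "customer_id")
customers = customers.repartition(8, "customer_id")

orders.join(customers, on="customer_id").show()
```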

articles/data-factory/data-flow-assert.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: makromer
 ms.service: data-factory
 ms.subservice: data-flows
 ms.topic: conceptual
-ms.date: 06/09/2022
+ms.date: 06/23/2022
 ---
 
 # Assert transformation in mapping data flow
@@ -61,7 +61,7 @@ By default, the assert transformation will include NULLs in row assertion evalua
 
 ## Direct assert row failures
 
-When an assertion fails, you can optionally direct those error rows to a file in Azure by using the "Errors" tab on the sink transformation.
+When an assertion fails, you can optionally direct those error rows to a file in Azure by using the "Errors" tab on the sink transformation. You will also have an option on the sink transformation to not output rows with assertion failures at all by ignoring error rows.
 
 ## Examples
 
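The two behaviors the new sentence describes (redirect error rows to a file, or drop them entirely) follow a familiar split-and-route pattern. This is not ADF's sink API, just a PySpark analogue of the same idea; the predicate, column names, and output paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("assert-errors-sketch").getOrCreate()

df = spark.createDataFrame([(1, 42), (2, -5), (3, 17)], ["id", "value"])

# The "assertion": value must be non-negative.
passed = df.filter(F.col("value") >= 0)
failed = df.filter(F.col("value") < 0)

# Option 1: redirect failed rows to an error file
# (analogous to the sink transformation's "Errors" tab).
failed.write.mode("overwrite").json("/tmp/assert-errors")

# Option 2: ignore error rows entirely; only passing rows reach the sink.
passed.write.mode("overwrite").parquet("/tmp/sink-output")
```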
articles/data-factory/data-flow-sink.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.service: data-factory
 ms.subservice: data-flows
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 03/25/2022
+ms.date: 06/23/2022
 ---
 
 # Sink transformation in mapping data flow
@@ -137,7 +137,7 @@ Below is a video tutorial on how to use database error row handling automaticall
 
 > [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE4IWne]
 
-For assert failure rows, you can use the Assert transformation upstream in your data flow and then redirect failed assertions to an output file here in the sink errors tab.
+For assert failure rows, you can use the Assert transformation upstream in your data flow and then redirect failed assertions to an output file here in the sink errors tab. You also have an option here to ignore rows with assertion failures and not output those rows at all to the sink destination data store.
 
 :::image type="content" source="media/data-flow/assert-errors.png" alt-text="Assert failure rows":::
 
articles/data-factory/format-delta.md

Lines changed: 3 additions & 1 deletion
@@ -141,8 +141,10 @@ In Settings tab, you will find three more options to optimize delta sink transfo
 
 * When **Auto compact** is enabled, after an individual write, transformation checks if files can further be compacted, and runs a quick OPTIMIZE job (with 128 MB file sizes instead of 1GB) to further compact files for partitions that have the most number of small files. Auto compaction helps in coalescing a large number of small files into a smaller number of large files. Auto compaction only kicks in when there are at least 50 files. Once a compaction operation is performed, it creates a new version of the table, and writes a new file containing the data of several previous files in a compact compressed form.
 
-* When **Optimize write** is enabled, sink transformation dynamically optimizes partition sizes based on the actual data by attempting to write out 128 MB files for each table partition. This is an approximate size and can vary depending on dataset characteristics. Optimized writes improve the overall efficiency of the *writes and subsequent reads*. It organizes partitions such that the performance of subsequent reads will improve.
+* When **Optimize write** is enabled, sink transformation dynamically optimizes partition sizes based on the actual data by attempting to write out 128 MB files for each table partition. This is an approximate size and can vary depending on dataset characteristics. Optimized writes improve the overall efficiency of the *writes and subsequent reads*. It organizes partitions such that the performance of subsequent reads will improve.
 
+> [!TIP]
+> The optimized write process will slow down your overall ETL job because the Sink will issue the Spark Delta Lake Optimize command after your data is processed. It is recommended to use Optimized Write sparingly. For example, if you have an hourly data pipeline, execute a data flow with Optimized Write daily.
 
 ### Known limitations
 
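For readers who want the equivalent behavior in their own Spark jobs, a minimal PySpark sketch follows. The `spark.databricks.delta.optimizeWrite.enabled` and `spark.databricks.delta.autoCompact.enabled` keys are the Databricks-style Delta Lake settings for these two options; whether your Delta runtime honors them, and the delta-spark package being on the classpath, are assumptions to verify, and the table path is illustrative:

```python
from pyspark.sql import SparkSession

# Databricks-style Delta settings for optimize write and auto compact.
# Availability depends on your Delta Lake runtime; verify before relying
# on them (assumption, not confirmed by this commit).
spark = (
    SparkSession.builder.appName("delta-write-sketch")
    .config("spark.databricks.delta.optimizeWrite.enabled", "true")
    .config("spark.databricks.delta.autoCompact.enabled", "true")
    .getOrCreate()
)

df = spark.range(1_000_000).withColumnRenamed("id", "event_id")

# With optimize write on, the writer coalesces output toward ~128 MB files
# per table partition; auto compact may then merge leftover small files.
df.write.format("delta").mode("append").save("/tmp/delta/events")
```

As the new TIP warns, the extra OPTIMIZE-style pass costs time, so reserve it for periodic rather than per-run writes.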