
Commit 23cdc23

Merge pull request #203766 from kromerm/deltaupdates
Deltaupdates
2 parents: 0145938 + cb53239

File tree: 5 files changed, +12 -6 lines changed


articles/data-factory/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -730,7 +730,7 @@ items:
     href: data-flow-alter-row.md
     displayName: upsert, update, insert, delete
   - name: Assert
-    href: data-flow-assert.md
+    href: data-flow-assert.md
   - name: Conditional split
     href: data-flow-conditional-split.md
     displayName: split

articles/data-factory/concepts-data-flow-performance-transformations.md

Lines changed: 4 additions & 0 deletions
@@ -52,6 +52,10 @@ If your data is not evenly partitioned after a transformation, you can use the [
 > [!TIP]
 > If you repartition your data, but have downstream transformations that reshuffle your data, use hash partitioning on a column used as a join key.
 
+> [!NOTE]
+> Transformations inside your data flow (with the exception of the Sink transformation) do not modify the file and folder partitioning of data at rest. Partitioning in each transformation repartitions data inside the data frames of the temporary serverless Spark cluster that ADF manages for each of your data flow executions.
+
+
 ## Next steps
 
 - [Data flow performance overview](concepts-data-flow-performance.md)
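
To make the new note concrete: repartitioning in a data flow changes only the in-memory Spark partitioning, much like `DataFrame.repartition` in Spark itself. Below is a minimal PySpark sketch of the hash-partition-on-a-join-key tip; the `customer_id` column, the sample rows, and the partition count are illustrative assumptions, not part of this change:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, 100.0), (2, 250.0), (1, 75.0)], ["customer_id", "amount"])
customers = spark.createDataFrame(
    [(1, "Ada"), (2, "Grace")], ["customer_id", "name"])

# Hash-partition both sides on the join key so the downstream join can
# reuse the shuffle. This only repartitions the in-memory data frames;
# no files or folders at rest are rewritten.
orders = orders.repartition(8, "customer_id")
customers = customers.repartition(8, "customer_id")

orders.join(customers, on="customer_id").show()
```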

articles/data-factory/data-flow-assert.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: makromer
 ms.service: data-factory
 ms.subservice: data-flows
 ms.topic: conceptual
-ms.date: 06/09/2022
+ms.date: 06/23/2022
 ---
 
 # Assert transformation in mapping data flow
@@ -61,7 +61,7 @@ By default, the assert transformation will include NULLs in row assertion evalua
 
 ## Direct assert row failures
 
-When an assertion fails, you can optionally direct those error rows to a file in Azure by using the "Errors" tab on the sink transformation.
+When an assertion fails, you can optionally direct those error rows to a file in Azure by using the "Errors" tab on the sink transformation. You will also have an option on the sink transformation to not output rows with assertion failures at all by ignoring error rows.
 
 ## Examples
 
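The two behaviors the new sentence describes (redirect error rows to a file, or drop them entirely) follow a familiar split-and-route pattern. This is not ADF's sink API, just a PySpark analogue of the same idea; the predicate, column names, and output paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("assert-errors-sketch").getOrCreate()

df = spark.createDataFrame([(1, 42), (2, -5), (3, 17)], ["id", "value"])

# The "assertion": value must be non-negative.
passed = df.filter(F.col("value") >= 0)
failed = df.filter(F.col("value") < 0)

# Option 1: redirect failed rows to an error file
# (analogous to the sink transformation's "Errors" tab).
failed.write.mode("overwrite").json("/tmp/assert-errors")

# Option 2: ignore error rows entirely; only passing rows reach the sink.
passed.write.mode("overwrite").parquet("/tmp/sink-output")
```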
articles/data-factory/data-flow-sink.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.service: data-factory
 ms.subservice: data-flows
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 03/25/2022
+ms.date: 06/23/2022
 ---
 
 # Sink transformation in mapping data flow
@@ -137,7 +137,7 @@ Below is a video tutorial on how to use database error row handling automaticall
 
 > [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE4IWne]
 
-For assert failure rows, you can use the Assert transformation upstream in your data flow and then redirect failed assertions to an output file here in the sink errors tab.
+For assert failure rows, you can use the Assert transformation upstream in your data flow and then redirect failed assertions to an output file here in the sink errors tab. You also have an option here to ignore rows with assertion failures and not output those rows at all to the sink destination data store.
 
 :::image type="content" source="media/data-flow/assert-errors.png" alt-text="Assert failure rows":::
 
articles/data-factory/format-delta.md

Lines changed: 3 additions & 1 deletion
@@ -141,8 +141,10 @@ In Settings tab, you will find three more options to optimize delta sink transfo
 
 * When **Auto compact** is enabled, after an individual write, transformation checks if files can further be compacted, and runs a quick OPTIMIZE job (with 128 MB file sizes instead of 1GB) to further compact files for partitions that have the most number of small files. Auto compaction helps in coalescing a large number of small files into a smaller number of large files. Auto compaction only kicks in when there are at least 50 files. Once a compaction operation is performed, it creates a new version of the table, and writes a new file containing the data of several previous files in a compact compressed form.
 
-* When **Optimize write** is enabled, sink transformation dynamically optimizes partition sizes based on the actual data by attempting to write out 128 MB files for each table partition. This is an approximate size and can vary depending on dataset characteristics. Optimized writes improve the overall efficiency of the *writes and subsequent reads*. It organizes partitions such that the performance of subsequent reads will improve.
+* When **Optimize write** is enabled, sink transformation dynamically optimizes partition sizes based on the actual data by attempting to write out 128 MB files for each table partition. This is an approximate size and can vary depending on dataset characteristics. Optimized writes improve the overall efficiency of the *writes and subsequent reads*. It organizes partitions such that the performance of subsequent reads will improve.
 
+> [!TIP]
+> The optimized write process will slow down your overall ETL job because the Sink will issue the Spark Delta Lake Optimize command after your data is processed. It is recommended to use Optimized Write sparingly. For example, if you have an hourly data pipeline, execute a data flow with Optimized Write daily.
 
 ### Known limitations
 
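For readers who want the equivalent behavior in their own Spark jobs, a minimal PySpark sketch follows. The `spark.databricks.delta.optimizeWrite.enabled` and `spark.databricks.delta.autoCompact.enabled` keys are the Databricks-style Delta Lake settings for these two options; whether your Delta runtime honors them, and the delta-spark package being on the classpath, are assumptions to verify, and the table path is illustrative:

```python
from pyspark.sql import SparkSession

# Databricks-style Delta settings for optimize write and auto compact.
# Availability depends on your Delta Lake runtime; verify before relying
# on them (assumption, not confirmed by this commit).
spark = (
    SparkSession.builder.appName("delta-write-sketch")
    .config("spark.databricks.delta.optimizeWrite.enabled", "true")
    .config("spark.databricks.delta.autoCompact.enabled", "true")
    .getOrCreate()
)

df = spark.range(1_000_000).withColumnRenamed("id", "event_id")

# With optimize write on, the writer coalesces output toward ~128 MB files
# per table partition; auto compact may then merge leftover small files.
df.write.format("delta").mode("append").save("/tmp/delta/events")
```

As the new TIP warns, the extra OPTIMIZE-style pass costs time, so reserve it for periodic rather than per-run writes.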