Commit 14087d0

Add AddFileAction Size Limitation
1 parent 12549f7 commit 14087d0

1 file changed: +7 -1 lines changed

articles/stream-analytics/write-to-delta-lake.md

Lines changed: 7 additions & 1 deletion
@@ -84,7 +84,6 @@ At the failure of schema conversion, the job behavior will follow the [output da
 
 ### Delta Log checkpoints
 
-
 The Stream Analytics job will create Delta Log checkpoints periodically.
 
 ## Limitations
@@ -100,6 +99,13 @@ The Stream Analytics job will create Delta Log checkpoints periodically.
 - Writing to existing tables of Writer Version 7 or above with writer features will fail.
     - Example: Writing to existing tables with [Deletion Vectors](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vectors) enabled will fail.
     - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
+- When a Stream Analytics job writes a batch of data to Delta Lake, it can generate multiple [Add File Actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When too many Add File Actions are generated for a single batch, the Stream Analytics job can get stuck.
+    - The number of Add File Actions generated is determined by several factors:
+        - Size of the batch. This is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration).
+        - Cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
+    - To reduce the number of Add File Actions generated for a batch, the following steps can be taken:
+        - Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration).
+        - Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column.
 
 ## Next steps
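
Note on the added limitation: the relationship between batch size, partition cardinality, and the number of Add File Actions can be illustrated with a rough sketch. It assumes only that each data file belongs to exactly one partition value, so a batch needs at least one Add File Action per distinct partition value it contains. The function name, the sample rows, and the column names `deviceId` and `region` are hypothetical, and this is not how Stream Analytics itself computes anything.

```python
from typing import Any, Mapping, Sequence


def min_add_file_actions(batch: Sequence[Mapping[str, Any]], partition_column: str) -> int:
    """Lower bound on Add File Actions for one batch.

    Assumption: every data file holds rows for a single partition value,
    so the batch needs at least one file per distinct value.
    """
    return len({row[partition_column] for row in batch})


# Hypothetical batch of six events.
batch = [
    {"deviceId": f"dev-{i}", "region": "eu" if i % 2 == 0 else "us", "value": i}
    for i in range(6)
]

# High-cardinality partition column: one distinct value per row.
by_device = min_add_file_actions(batch, "deviceId")

# Low-cardinality partition column: only two distinct values.
by_region = min_add_file_actions(batch, "region")

# A smaller batch (for example, lower Minimum Rows / Maximum Time)
# contains fewer distinct values and so needs fewer files.
smaller_batch = min_add_file_actions(batch[:3], "deviceId")

print(by_device, by_region, smaller_batch)
```

Under these assumptions, both mitigations in the diff lower the bound: a smaller batch sees fewer distinct partition values, and a lower-cardinality partition column has fewer values to begin with.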
