articles/stream-analytics/write-to-delta-lake.md
### Delta Log checkpoints
The Stream Analytics job periodically creates [Delta Log checkpoints](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints-1) in the V1 format. Delta Log checkpoints are snapshots of the Delta table and typically contain the names of the data files generated by the Stream Analytics job. When the number of data files is large, the checkpoints are also large, which can cause memory issues in the Stream Analytics job.
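To illustrate why many data files mean large checkpoints, here is a minimal sketch (with hypothetical file paths and simplified action records) of how a checkpoint snapshot is derived by replaying the add/remove actions in the Delta Log:

```python
import json

# Hypothetical sample of Delta Log commit actions (JSON lines), similar in
# shape to the entries a job appends under _delta_log/*.json.
commits = [
    '{"add": {"path": "date=2024-01-01/part-0000.parquet", "size": 1024, "dataChange": true}}',
    '{"add": {"path": "date=2024-01-02/part-0001.parquet", "size": 2048, "dataChange": true}}',
    '{"remove": {"path": "date=2024-01-01/part-0000.parquet", "dataChange": true}}',
]

def live_files(commit_lines):
    """Replay add/remove actions; the surviving files are what a checkpoint
    snapshot must list, so more live files mean a bigger checkpoint."""
    files = {}
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            files[action["add"]["path"]] = action["add"]
        elif "remove" in action:
            files.pop(action["remove"]["path"], None)
    return files

print(len(live_files(commits)))  # 1 live data file left to snapshot
```

The checkpoint grows roughly in proportion to the number of live files, which is why a job that emits many small files per batch eventually produces checkpoints large enough to strain memory.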
## Limitations
- Writing to existing tables of Writer Version 7 or above with writer features will fail.
  - Example: Writing to existing tables with [Deletion Vectors](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vectors) enabled will fail.
  - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
- When a Stream Analytics job writes a batch of data to a Delta Lake table, it can generate multiple [Add File Actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When too many Add File Actions are generated for a single batch, a Stream Analytics job can get stuck.
  - The number of Add File Actions generated is determined by several factors:
    - Size of the batch, which is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration).
    - Cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
  - To reduce the number of Add File Actions generated for a batch, take the following steps:
    - Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration).
    - Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column.
- Stream Analytics jobs can only read and write single-part V1 checkpoints. Multi-part checkpoints and the Checkpoint V2 format aren't supported.
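The partition-cardinality effect described above can be sketched as a lower bound: each distinct partition value present in a batch produces at least one data file, and therefore at least one Add File Action. The event fields and partition keys below are hypothetical:

```python
from datetime import datetime

# Hypothetical batch of events; the field names are illustrative only.
batch = [
    {"deviceId": "d1", "ts": datetime(2024, 1, 1, 10, 5)},
    {"deviceId": "d2", "ts": datetime(2024, 1, 1, 10, 15)},
    {"deviceId": "d3", "ts": datetime(2024, 1, 1, 10, 45)},
]

def min_add_file_actions(events, partition_key):
    """Each distinct partition value in a batch yields at least one data
    file, and hence at least one Add File Action in the Delta Log."""
    return len({partition_key(e) for e in events})

# High-cardinality key: one file (and Add File Action) per device.
by_device = min_add_file_actions(batch, lambda e: e["deviceId"])
# Coarser key: all three events fall in one hour, so one file suffices.
by_hour = min_add_file_actions(batch, lambda e: e["ts"].hour)
print(by_device, by_hour)  # 3 1
```

Choosing a coarser partition column (or shrinking the batch window) reduces the distinct partition values per batch, and with them the number of Add File Actions the job must commit.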