articles/stream-analytics/write-to-delta-lake.md
6 additions & 5 deletions
@@ -84,7 +84,7 @@ At the failure of schema conversion, the job behavior will follow the [output da
 
 ### Delta Log checkpoints
 
-The Stream Analytics job will create Delta Log checkpoints periodically.
+The Stream Analytics job will create [Delta Log checkpoints](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints-1) periodically in the V1 format. Delta Log checkpoints are snapshots of the Delta table and typically contain the names of the data files generated by the Stream Analytics job. If the number of data files is large, the checkpoints are also large, which can cause memory issues in the Stream Analytics job.
 
 ## Limitations
 
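The added paragraph refers to the V1 checkpoint format. As a rough illustration, the sketch below classifies checkpoint files in a `_delta_log` directory by their naming scheme, using the file-name patterns described in the linked Delta protocol. The function name `classify_checkpoint` and the category labels are hypothetical, and the naming scheme alone does not fully determine the checkpoint spec: as the protocol describes it, multi-part files follow V1 and UUID-named files follow V2, while a classic-named file can hold either.

```python
import re

# File-name patterns taken from the Delta protocol's checkpoint section.
# The labels returned below are illustrative, not official terminology.
_CLASSIC = re.compile(r"^\d{20}\.checkpoint\.parquet$")
_MULTI_PART = re.compile(r"^\d{20}\.checkpoint\.\d{10}\.\d{10}\.parquet$")
_UUID_NAMED = re.compile(r"^\d{20}\.checkpoint\.[0-9a-fA-F-]{36}\.(json|parquet)$")


def classify_checkpoint(file_name: str) -> str:
    """Classify a _delta_log file name by checkpoint naming scheme only."""
    if _CLASSIC.match(file_name):
        return "classic"
    if _MULTI_PART.match(file_name):
        return "multi-part"
    if _UUID_NAMED.match(file_name):
        return "uuid-named"
    return "not-a-checkpoint"


if __name__ == "__main__":
    for name in (
        "00000000000000000010.checkpoint.parquet",
        "00000000000000000010.checkpoint.0000000001.0000000003.parquet",
        "00000000000000000010.checkpoint.80a083e8-7026-4e79-81be-64bd76c43a11.json",
        "00000000000000000010.json",
    ):
        print(name, "->", classify_checkpoint(name))
```

A table whose log contains only `classic` entries is the case most likely to match what this change describes as single-part checkpoints. Confirming the spec version would still require inspecting the table's protocol and features.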
@@ -101,11 +101,12 @@ The Stream Analytics job will create Delta Log checkpoints periodically.
 - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
 - When a Stream Analytics job writes a batch of data to a Delta Lake, it can generate multiple [Add File Action](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When there are too many Add File Actions generated for a single batch, a Stream Analytics Job can be stuck.
 - The number of Add File Actions generated are determined by a number of factors:
-- Size of the batch. This is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
-- Cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
+- Size of the batch. This is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
+- Cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
 - To reduce the number of Add File Actions generated for a batch the following steps can be taken:
-- Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
-- Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column
+- Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
+- Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column
+- Stream Analytics jobs can only read and write single-part V1 checkpoints. Multi-part checkpoints and the Checkpoint V2 format are not supported.
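The limitation on Add File Actions depends on batch size and on partition column cardinality. As a minimal sketch of the cardinality factor, the snippet below counts the distinct partition values in a batch. It assumes that each data file belongs to exactly one partition value, so this count is a lower bound on the Add File Actions for the batch, and the real number may be higher. The function name `min_add_file_actions`, the `region` column, and the sample rows are hypothetical and are not part of Stream Analytics.

```python
from typing import Any, Iterable, Mapping


def min_add_file_actions(
    batch: Iterable[Mapping[str, Any]], partition_column: str
) -> int:
    """Lower bound on Add File Actions for one batch.

    Assumes each data file holds rows for a single partition value, so every
    distinct value in the batch needs at least one file.
    """
    return len({row[partition_column] for row in batch})


if __name__ == "__main__":
    # Hypothetical batch partitioned by a "region" column.
    batch = [
        {"region": "eu", "value": 1},
        {"region": "us", "value": 2},
        {"region": "eu", "value": 3},
        {"region": "asia", "value": 4},
    ]
    print(min_add_file_actions(batch, "region"))
```

Choosing a coarser partition column, for example a date instead of a per-device identifier, lowers this bound. This is the second mitigation in the list above.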