
Commit c1a3a52

Update write-to-delta-lake.md
1. Add more information about the checkpoints memory issue.
2. Add CheckpointV2 and multi-part checkpoint limitation.
3. Remove en-us from the links.
1 parent 14087d0 commit c1a3a52

1 file changed (+6, -5 lines)


articles/stream-analytics/write-to-delta-lake.md

@@ -84,7 +84,7 @@ At the failure of schema conversion, the job behavior will follow the [output da
 
 ### Delta Log checkpoints
 
-The Stream Analytics job will create Delta Log checkpoints periodically.
+The Stream Analytics job will create [Delta Log checkpoints](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints-1) periodically in the V1 format. Delta Log checkpoints are snapshots of the Delta table and typically contain the names of the data files generated by the Stream Analytics job. If the number of data files is large, this leads to large checkpoints, which can cause memory issues in the Stream Analytics job.
 
 ## Limitations
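As context for the V1 checkpoint format mentioned above: the Delta protocol distinguishes checkpoint variants by file name under `_delta_log/` (single-part V1: `n.checkpoint.parquet`; multi-part V1: `n.checkpoint.o.p.parquet`; V2: `n.checkpoint.<uuid>.parquet` or `.json`). A minimal sketch, assuming those naming conventions from PROTOCOL.md; the `classify_checkpoint` helper and its regexes are illustrative, not part of any Delta or Stream Analytics API:

```python
import re

# Checkpoint file-name patterns per the Delta protocol (PROTOCOL.md).
# n is the version zero-padded to 20 digits; o and p are zero-padded to 10 digits.
SINGLE_V1 = re.compile(r"^\d{20}\.checkpoint\.parquet$")
MULTI_V1 = re.compile(r"^\d{20}\.checkpoint\.\d{10}\.\d{10}\.parquet$")
V2 = re.compile(r"^\d{20}\.checkpoint\.[0-9a-f-]{36}\.(json|parquet)$")

def classify_checkpoint(name: str) -> str:
    """Classify a _delta_log file name by checkpoint variant."""
    if SINGLE_V1.match(name):
        return "single-part V1"  # the only variant Stream Analytics reads/writes
    if MULTI_V1.match(name):
        return "multi-part V1"   # not supported by Stream Analytics
    if V2.match(name):
        return "V2"              # not supported by Stream Analytics
    return "not a checkpoint"

print(classify_checkpoint("00000000000000000010.checkpoint.parquet"))  # single-part V1
```

Per the limitation this commit adds, only the first of these three variants is readable and writable by a Stream Analytics job.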

@@ -101,11 +101,12 @@ The Stream Analytics job will create Delta Log checkpoints periodically.
 - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
 - When a Stream Analytics job writes a batch of data to a Delta Lake, it can generate multiple [Add File Actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When too many Add File Actions are generated for a single batch, a Stream Analytics job can get stuck.
 - The number of Add File Actions generated is determined by several factors:
-  - Size of the batch. This is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
-  - Cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
+  - Size of the batch. This is determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
+  - Cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) of the batch.
 - To reduce the number of Add File Actions generated for a batch, the following steps can be taken:
-  - Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
-  - Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/en-us/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column
+  - Reduce the batching configurations [Minimum Rows and Maximum Time](https://learn.microsoft.com/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output#output-configuration)
+  - Reduce the cardinality of the [Partition Column values](https://learn.microsoft.com/azure/stream-analytics/write-to-delta-lake#delta-lake-configuration) by tweaking the input data or choosing a different partition column
+- Stream Analytics jobs can only read and write single-part V1 checkpoints. Multi-part checkpoints and the Checkpoint V2 format are not supported.
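The link between partition-column cardinality and Add File Actions can be sketched: each distinct partition value present in a batch needs at least one data file, hence at least one Add File Action. A minimal illustration in plain Python; the `estimate_add_file_actions` helper and the sample batch are hypothetical, not part of Stream Analytics:

```python
def estimate_add_file_actions(batch, partition_column):
    """Lower bound on Add File Actions for one batch: each distinct
    partition-column value requires at least one data file."""
    return len({row[partition_column] for row in batch})

# Three rows but only two distinct partition values
# -> at least two data files, so at least two Add File Actions.
batch = [
    {"device_id": "a", "temp": 20},
    {"device_id": "b", "temp": 21},
    {"device_id": "a", "temp": 22},
]
print(estimate_add_file_actions(batch, "device_id"))  # 2
```

Shrinking the batch (via Minimum Rows and Maximum Time) or lowering the cardinality of the partition column, per the guidance above, reduces this count.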
 
 ## Next steps
