articles/stream-analytics/write-to-delta-lake.md
# Azure Stream Analytics: Write to a Delta Lake table
Delta Lake is an open format that brings reliability, quality, and performance to data lakes. You can use Azure Stream Analytics to directly write streaming data to your Delta Lake tables without writing a single line of code.
A Stream Analytics job can be configured to write through a native Delta Lake output connector, either to a new or a precreated Delta table in an Azure Data Lake Storage Gen2 account. This connector is optimized for high-speed ingestion to Delta tables in Append mode. It also provides exactly-once semantics, which guarantees that no data is lost or duplicated. Ingesting real-time data streams from Azure Event Hubs into Delta tables allows you to perform ad hoc interactive or batch analytics.
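For the ad hoc analytics side, here's a minimal sketch that reads back a table the job has written. It assumes the open-source `deltalake` Python package and uses placeholder account, container, and path names; it illustrates consuming the output and isn't part of the Stream Analytics connector.

```python
# Minimal sketch: read a Delta table that a Stream Analytics job wrote to
# Azure Data Lake Storage Gen2. Requires: pip install deltalake pandas
# The account, container, path, and credential below are placeholders; an
# account key is only one of several supported authentication options.
from deltalake import DeltaTable

table_uri = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/streams/events-delta"
storage_options = {
    "account_name": "mystorageaccount",
    "account_key": "<storage-account-key>",
}

dt = DeltaTable(table_uri, storage_options=storage_options)
print(dt.version())   # latest committed table version
print(dt.schema())    # schema of the Parquet data the job wrote
df = dt.to_pandas()   # load into pandas for ad hoc exploration
print(df.head())
```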
## Delta Lake configuration
To write data in Delta Lake, you need to connect to a Data Lake Storage Gen2 account. The following table lists the properties related to Delta Lake configuration.
|Property name |Description |
|----------|-----------|
|Event serialization format|Serialization format for output data. JSON, CSV, Avro, and Parquet are supported. Delta Lake is listed as an option here. The data is in Parquet format if Delta Lake is selected. |
|Delta path name| The path that's used to write your Delta Lake table within the specified container. It includes the table name. More information is in the next section. |
|Partition column |Optional. The `{field}` name from your output data to partition. Only one partition column is supported. The column's value must be of `string` type. |
To see the full list of Data Lake Storage Gen2 configuration settings, see [Azure Data Lake Storage Gen2 overview](blob-storage-azure-data-lake-gen2-output.md).
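If a partition column is configured, downstream reads can prune data by partition value. The sketch below reuses the `deltalake` package from the earlier example; the column name `country` and the filter syntax are illustrative assumptions and may vary across `deltalake` versions.

```python
# Sketch: read only one partition of the table. Assumes "country" (a
# placeholder name) was configured as the partition column for the output.
from deltalake import DeltaTable

dt = DeltaTable(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/streams/events-delta",
    storage_options={
        "account_name": "mystorageaccount",
        "account_key": "<storage-account-key>",
    },
)

# Partition filters skip whole partition directories instead of scanning
# every file in the table.
us_events = dt.to_pandas(partitions=[("country", "=", "US")])
print(len(us_events))
```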
### Delta path name

The segment name is alphanumeric and can include spaces, hyphens, and underscores.
Restrictions on the Delta path name include:
- Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and `id`.
- No dynamic `{field}` name is allowed. For example, `{ID}` is treated as text `{ID}`.
- The number of path segments that comprise the name can't exceed 254.
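As a hedged illustration of these rules, the following hypothetical client-side check (not part of the service) validates a candidate Delta path name before you configure it:

```python
import re

MAX_SEGMENTS = 254
# A segment name is alphanumeric and can include spaces, hyphens, and underscores.
SEGMENT_PATTERN = re.compile(r"^[A-Za-z0-9 _-]+$")

def validate_delta_path(path: str) -> list[str]:
    """Return a list of problems found in a candidate Delta path name."""
    problems = []
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) > MAX_SEGMENTS:
        problems.append(f"{len(segments)} segments exceeds the {MAX_SEGMENTS}-segment limit")
    if "{" in path or "}" in path:
        problems.append("dynamic {field} tokens aren't expanded; they're written as literal text")
    for segment in segments:
        if not SEGMENT_PATTERN.match(segment):
            problems.append(f"segment '{segment}' contains unsupported characters")
    return problems

print(validate_delta_path("telemetry/events delta"))  # [] -> acceptable
print(validate_delta_path("telemetry/{ID}"))          # flags the literal {ID} token
```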
## Limitations

- Writing to existing tables of Writer Version 7 or above with writer features fails.
  - Example: Writing to existing tables with [Deletion Vectors](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vectors) enabled fails.
  - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
- When a Stream Analytics job writes a batch of data to a Delta Lake table, it can generate multiple [Add File actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When too many Add File actions are generated for a single batch, a Stream Analytics job can get stuck (see the log-inspection sketch after this list).
- The number of Add File actions generated is determined by many factors:
  - Size of the batch. It's determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](blob-storage-azure-data-lake-gen2-output.md#output-configuration).
  - Cardinality of the [partition column values](#delta-lake-configuration) of the batch.
- To reduce the number of Add File actions generated for a batch:
  - Reduce the batching configurations [Minimum Rows and Maximum Time](blob-storage-azure-data-lake-gen2-output.md#output-configuration).
  - Reduce the cardinality of the [partition column values](#delta-lake-configuration) by tweaking the input data or choosing a different partition column.
- Stream Analytics jobs can only read and write single-part V1 checkpoints. Multipart checkpoints and the checkpoint V2 format aren't supported.
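To check a concrete table against these limitations, here's a hedged Python sketch that inspects the table's transaction log directly. It assumes the `_delta_log` folder is reachable at a local or mounted path (the path is a placeholder); it reports the writer version and writer features, counts Add File actions per commit, and flags a multipart checkpoint.

```python
# Minimal sketch (not part of Stream Analytics): inspect a Delta table's
# transaction log to check it against the limitations above.
# Assumes the _delta_log folder is reachable at a local or mounted path.
import json
from pathlib import Path

delta_log = Path("/data/events-delta/_delta_log")  # placeholder path

# Commit files are JSON-lines files named like 00000000000000000042.json,
# roughly one commit per batch written by the job.
for commit in sorted(delta_log.glob("*.json")):
    actions = [json.loads(line) for line in commit.read_text().splitlines() if line.strip()]

    # Protocol action: reports the writer version and any writer features
    # (present in the earliest commits that haven't been cleaned up).
    for action in actions:
        if "protocol" in action:
            protocol = action["protocol"]
            print(f"{commit.name}: minWriterVersion={protocol.get('minWriterVersion')}, "
                  f"writerFeatures={protocol.get('writerFeatures', [])}")

    # Count the Add File actions produced by this commit.
    add_count = sum(1 for action in actions if "add" in action)
    print(f"{commit.name}: {add_count} Add File action(s)")

# The optional _last_checkpoint file describes the most recent checkpoint.
# A "parts" field means a multipart checkpoint, which Stream Analytics jobs
# can't read or write.
last_checkpoint = delta_log / "_last_checkpoint"
if last_checkpoint.exists():
    info = json.loads(last_checkpoint.read_text())
    if "parts" in info:
        print(f"Multipart checkpoint ({info['parts']} parts) detected")
    else:
        print(f"Single-part checkpoint at version {info['version']}")
```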