Commit 661b3df

Update write-to-delta-lake.md
1 parent f8888aa commit 661b3df

File tree

1 file changed: +8 -8 lines changed

articles/stream-analytics/write-to-delta-lake.md

Lines changed: 8 additions & 8 deletions
@@ -12,19 +12,19 @@ ms.custom: build-2023
 
 # Azure Stream Analytics: Write to a Delta Lake table
 
-Delta Lake is an open format that brings reliability, quality, and performance to data lakes. Azure Stream Analytics allows you to directly write streaming data to your Delta Lake tables without writing a single line of code.
+Delta Lake is an open format that brings reliability, quality, and performance to data lakes. You can use Azure Stream Analytics to directly write streaming data to your Delta Lake tables without writing a single line of code.
 
-A Stream Analytics job can be configured to write through a native Delta Lake output connector, either to a new or a precreated Delta table in an Azure Data Lake Storage Gen2 account. This connector is optimized for high-speed ingestion to Delta tables in Append mode. It also provides exactly-once semantics, which guarantees that no data is lost or duplicated. Ingesting real-time data streams from Azure Event Hubs into Delta tables allows you to perform ad-hoc interactive or batch analytics.
+A Stream Analytics job can be configured to write through a native Delta Lake output connector, either to a new or a precreated Delta table in an Azure Data Lake Storage Gen2 account. This connector is optimized for high-speed ingestion to Delta tables in Append mode. It also provides exactly-once semantics, which guarantees that no data is lost or duplicated. Ingesting real-time data streams from Azure Event Hubs into Delta tables allows you to perform ad hoc interactive or batch analytics.
 
 ## Delta Lake configuration
 
 To write data in Delta Lake, you need to connect to a Data Lake Storage Gen2 account. The following table lists the properties related to Delta Lake configuration.
 
 |Property name |Description |
 |----------|-----------|
-|Event serialization format|Serialization format for output data. JSON, CSV, AVRO, and Parquet are supported. Delta Lake is listed as an option here. The data is in Parquet format if Delta Lake is selected. |
+|Event serialization format|Serialization format for output data. JSON, CSV, Avro, and Parquet are supported. Delta Lake is listed as an option here. The data is in Parquet format if Delta Lake is selected. |
 |Delta path name| The path that's used to write your Delta Lake table within the specified container. It includes the table name. More information is in the next section. |
-|Partition column |Optional. The {field} name from your output data to partition. Only one partition column is supported. The column's value must be of `string` type. |
+|Partition column |Optional. The `{field}` name from your output data to partition. Only one partition column is supported. The column's value must be of `string` type. |
 
 To see the full list of Data Lake Storage Gen2 configuration, see [Azure Data Lake Storage Gen2 overview](blob-storage-azure-data-lake-gen2-output.md).
 
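For illustration, a minimal Stream Analytics query that routes an Event Hubs stream into a Delta Lake output configured with the properties above might look like the following sketch. The input and output alias names and the event fields here are assumptions, not taken from the article; the Delta path name and partition column are set on the output configuration rather than in the query.

```sql
-- Sketch only: pass events from an Event Hubs input through to a Delta Lake output.
-- "EventHubInput" and "DeltaOutput" are hypothetical aliases defined on the job;
-- "deviceId" and "temperature" are hypothetical event fields.
SELECT
    deviceId,                          -- a string field like this could serve as the partition column
    CAST(temperature AS float) AS temperature,
    EventEnqueuedUtcTime AS eventTime  -- enqueue time exposed by the Event Hubs input
INTO
    [DeltaOutput]
FROM
    [EventHubInput]
```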
@@ -39,7 +39,7 @@ The segment name is alphanumeric and can include spaces, hyphens, and underscore
 Restrictions on the Delta path name include:
 
 - Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and `id`.
-- No dynamic {field} name is allowed. For example, {ID} is treated as text {ID}.
+- No dynamic `{field}` name is allowed. For example, `{ID}` is treated as text {ID}.
 - The number of path segments that comprise the name can't exceed 254.
 
 ### Examples
@@ -100,11 +100,11 @@ The Stream Analytics job creates [Delta log checkpoints](https://github.com/delt
 - Writing to existing tables of Writer Version 7 or above with writer features fail.
   - Example: Writing to existing tables with [Deletion Vectors](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vectors) enabled fail.
   - The exceptions here are the [changeDataFeed and appendOnly Writer Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
-- When a Stream Analytics job writes a batch of data to a Delta Lake, it can generate multiple [Add File Actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When there are too many Add File Actions generated for a single batch, a Stream Analytics job can be stuck.
-  - The number of Add File Actions generated are determined by many factors:
+- When a Stream Analytics job writes a batch of data to a Delta Lake, it can generate multiple [Add File actions](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file). When there are too many Add File actions generated for a single batch, a Stream Analytics job can be stuck.
+  - The number of Add File actions generated are determined by many factors:
     - Size of the batch. It's determined by the data volume and the batching parameters [Minimum Rows and Maximum Time](blob-storage-azure-data-lake-gen2-output.md#output-configuration).
     - Cardinality of the [partition column values](#delta-lake-configuration) of the batch.
-  - To reduce the number of Add File Actions generated for a batch:
+  - To reduce the number of Add File actions generated for a batch:
     - Reduce the batching configurations [Minimum Rows and Maximum Time](blob-storage-azure-data-lake-gen2-output.md#output-configuration).
     - Reduce the cardinality of the [partition column values](#delta-lake-configuration) by tweaking the input data or choosing a different partition column.
 - Stream Analytics jobs can only read and write single part V1 checkpoints. Multipart checkpoints and the checkpoint V2 format aren't supported.
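As a sketch of the second mitigation (reducing partition column cardinality), a query could derive a coarser string key and have the output partition on it instead of a high-cardinality field such as a per-device ID. The alias and field names below are hypothetical, and the date-string derivation is only one way to build such a key.

```sql
-- Sketch only: derive a low-cardinality date string (YYYY-MM-DD) to use as the
-- Delta partition column, so each batch touches fewer partitions and therefore
-- generates fewer Add File actions. Field and alias names are illustrative.
SELECT
    deviceId,
    temperature,
    SUBSTRING(CAST(System.Timestamp() AS nvarchar(max)), 1, 10) AS partitionDate
INTO
    [DeltaOutput]
FROM
    [EventHubInput]
```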
