
Commit e7059d8

Update blob-storage-azure-data-lake-gen2-output.md
1 parent e22c5da commit e7059d8

1 file changed

articles/stream-analytics/blob-storage-azure-data-lake-gen2-output.md

Lines changed: 6 additions & 6 deletions
@@ -15,7 +15,7 @@ Azure Data Lake Storage Gen2 makes Azure Storage the foundation for building ent
Blob Storage offers a cost-effective and scalable solution for storing large amounts of unstructured data in the cloud. For an introduction to Blob Storage and its use, see [Upload, download, and list blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md).

>[!NOTE]
-> For information on the behaviors specific to the AVRO and Parquet formats, see the related sections in the [overview](stream-analytics-define-outputs.md).
+> For information on the behaviors specific to the Avro and Parquet formats, see the related sections in the [overview](stream-analytics-define-outputs.md).

## Output configuration

@@ -30,8 +30,8 @@ The following table lists the property names and their descriptions for creating
| Event serialization format | The serialization format for output data. JSON, CSV, Avro, and Parquet are supported. Delta Lake is listed as an option here. The data is in Parquet format if Delta Lake is selected. Learn more about [Delta Lake](write-to-delta-lake.md). |
| Delta path name | Required when the event serialization format is Delta Lake. The path that's used to write the Delta Lake table within the specified container. It includes the table name. For more information and examples, see [Write to a Delta Lake table](write-to-delta-lake.md). |
| Write mode | Write mode controls the way that Azure Stream Analytics writes to an output file. Exactly-once delivery only happens when Write mode is Once. For more information, see the next section. |
-| Partition column | Optional. The {field} name from your output data to partition. Only one partition column is supported. |
-| Path pattern | Required when the event serialization format is Delta Lake. The file path pattern that's used to write your blobs within the specified container. <br /><br /> In the path pattern, you can choose to use one or more instances of the date and time variables to specify the frequency at which blobs are written: {date}, {time}. <br /><br />If your Write mode is Once, you need to use both {date} and {time}. <br /><br />You can use custom blob partitioning to specify one custom {field} name from your event data to partition blobs. The field name is alphanumeric and can include spaces, hyphens, and underscores. Restrictions on custom fields include the following ones: <ul><li>No dynamic custom {field} name is allowed if your Write mode is Once. </li><li>Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and column `id`.</li><li>Nested fields aren't permitted. Instead, use an alias in the job query to "flatten" the field.</li><li>Expressions can't be used as a field name.</li></ul> <br />This feature enables the use of custom date/time format specifier configurations in the path. Custom date/time formats must be specified one at a time and enclosed by the {datetime:\<specifier>} keyword. Allowable inputs for `\<specifier>` are `yyyy`, `MM`, `M`, `dd`, `d`, `HH`, `H`, `mm`, `m`, `ss`, or `s`. The {datetime:\<specifier>} keyword can be used multiple times in the path to form custom date/time configurations. <br /><br />Examples: <ul><li>Example 1: `cluster1/logs/{date}/{time}`</li><li>Example 2: `cluster1/logs/{date}`</li><li>Example 3: `cluster1/{client_id}/{date}/{time}`</li><li>Example 4: `cluster1/{datetime:ss}/{myField}` where the query is `SELECT data.myField AS myField FROM Input;`</li><li>Example 5: `cluster1/year={datetime:yyyy}/month={datetime:MM}/day={datetime:dd}`</ul><br />The time stamp of the created folder structure follows UTC and not local time. [System.Timestamp](./stream-analytics-time-handling.md#choose-the-best-starting-time) is the time used for all time-based partitioning.<br /><br />File naming uses the following convention: <br /><br />`{Path Prefix Pattern}/schemaHashcode_Guid_Number.extension`<br /><br /> Here, `Guid` represents the unique identifier assigned to an internal writer that's created to write to a blob file. The number represents the index of the blob block. <br /><br /> Example output files:<ul><li>`Myoutput/20170901/00/45434_gguid_1.csv`</li> <li>`Myoutput/20170901/01/45434_gguid_1.csv`</li></ul> <br />For more information about this feature, see [Azure Stream Analytics custom blob output partitioning](stream-analytics-custom-path-patterns-blob-storage-output.md). |
+| Partition column | Optional. The `{field}` name from your output data to partition. Only one partition column is supported. |
+| Path pattern | Required when the event serialization format is Delta Lake. The file path pattern that's used to write your blobs within the specified container. <br /><br /> In the path pattern, you can choose to use one or more instances of the date and time variables to specify the frequency at which blobs are written: `{date}`, `{time}`. <br /><br />If your Write mode is Once, you need to use both `{date}` and `{time}`. <br /><br />You can use custom blob partitioning to specify one custom `{field}` name from your event data to partition blobs. The field name is alphanumeric and can include spaces, hyphens, and underscores. Restrictions on custom fields include the following ones: <ul><li>No dynamic custom `{field}` name is allowed if your Write mode is Once. </li><li>Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and column `id`.</li><li>Nested fields aren't permitted. Instead, use an alias in the job query to "flatten" the field.</li><li>Expressions can't be used as a field name.</li></ul> <br />This feature enables the use of custom date/time format specifier configurations in the path. Custom date/time formats must be specified one at a time and enclosed by the `{datetime:\<specifier>}` keyword. Allowable inputs for `\<specifier>` are `yyyy`, `MM`, `M`, `dd`, `d`, `HH`, `H`, `mm`, `m`, `ss`, or `s`. The `{datetime:\<specifier>}` keyword can be used multiple times in the path to form custom date/time configurations. <br /><br />Examples: <ul><li>Example 1: `cluster1/logs/{date}/{time}`</li><li>Example 2: `cluster1/logs/{date}`</li><li>Example 3: `cluster1/{client_id}/{date}/{time}`</li><li>Example 4: `cluster1/{datetime:ss}/{myField}` where the query is `SELECT data.myField AS myField FROM Input;`</li><li>Example 5: `cluster1/year={datetime:yyyy}/month={datetime:MM}/day={datetime:dd}`</ul><br />The time stamp of the created folder structure follows UTC and not local time. [System.Timestamp](./stream-analytics-time-handling.md#choose-the-best-starting-time) is the time used for all time-based partitioning.<br /><br />File naming uses the following convention: <br /><br />`{Path Prefix Pattern}/schemaHashcode_Guid_Number.extension`<br /><br /> Here, `Guid` represents the unique identifier assigned to an internal writer that's created to write to a blob file. The number represents the index of the blob block. <br /><br /> Example output files:<ul><li>`Myoutput/20170901/00/45434_gguid_1.csv`</li> <li>`Myoutput/20170901/01/45434_gguid_1.csv`</li></ul> <br />For more information about this feature, see [Azure Stream Analytics custom blob output partitioning](stream-analytics-custom-path-patterns-blob-storage-output.md). |
| Date format | Required when the event serialization format is Delta Lake. If the date token is used in the prefix path, you can select the date format in which your files are organized. An example is `YYYY/MM/DD`. |
| Time format | Required when the event serialization format is Delta Lake. If the time token is used in the prefix path, specify the time format in which your files are organized. |
| Minimum rows | The minimum number of rows per batch. For Parquet, every batch creates a new file. The current default value is 2,000 rows and the allowed maximum is 10,000 rows. |
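
To make the nested-field restriction in the **Path pattern** row concrete, here's a minimal sketch of a job query (the `Input` alias and the `data.myField` field come from Example 4 in that row; `BlobOutput` is a hypothetical output alias) that flattens a nested field with an alias so it can be referenced as the custom `{myField}` token in a path pattern such as `cluster1/{datetime:ss}/{myField}`:

```sql
-- Minimal sketch: expose the nested field data.myField as a top-level column
-- named myField so the {myField} token in the path pattern can resolve.
SELECT
    data.myField AS myField
INTO
    BlobOutput   -- hypothetical alias of this Blob Storage / Data Lake Storage Gen2 output
FROM
    Input        -- hypothetical alias of the streaming input
```
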
@@ -58,13 +58,13 @@ There's no Write mode option for Delta Lake. However, Delta Lake output also pro
To receive exactly-once delivery for your Blob Storage or Data Lake Storage Gen2 account, you need to configure the following settings:

* Select **Once after all results of time partition is available** for your **Write Mode**.
-* Provide **Path Pattern** with both {date} and {time} specified.
+* Provide **Path Pattern** with both `{date}` and `{time}` specified.
* Specify **date format** and **time format**.
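
As an illustration, output settings that satisfy these three requirements might look like the following sketch (the container path and example values are hypothetical; the folder shown assumes the `{date}` and `{time}` tokens expand according to the selected formats):

```
Write Mode   : Once after all results of time partition is available
Path Pattern : cluster1/logs/{date}/{time}
Date format  : YYYY/MM/DD
Time format  : HH
```

With these values, each hour's results would be written once to a folder such as `cluster1/logs/2024/01/15/08/`.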

### Limitations

* [Substream](/stream-analytics-query/timestamp-by-azure-stream-analytics) isn't supported.
-* Path Pattern becomes a required property and must contain both {date} and {time}. No dynamic custom {field} name is allowed. Learn more about [custom path pattern](stream-analytics-custom-path-patterns-blob-storage-output.md).
+* Path Pattern becomes a required property and must contain both `{date}` and `{time}`. No dynamic custom `{field}` name is allowed. Learn more about [custom path pattern](stream-analytics-custom-path-patterns-blob-storage-output.md).
* If the job is started at a custom time before or after the last output time, there's a risk of the file being overwritten. For example, when **time format** is set to **HH**, a file is generated every hour. If you stop the job at 8:15 AM and restart it at 8:30 AM, the file generated between 8 AM and 9 AM covers only the data from 8:30 AM to 9 AM. The data from 8 AM to 8:15 AM is lost because it's overwritten.

## Blob output files
@@ -82,7 +82,7 @@ When you're using Blob Storage as output, a new file is created in the blob in t

## Partitioning

-For partition key, use {date} and {time} tokens from your event fields in the path pattern. Choose the date format, such as `YYYY/MM/DD`, `DD/MM/YYYY`, or `MM-DD-YYYY`. `HH` is used for the time format. Blob output can be partitioned by a single custom event attribute {fieldname} or {datetime:\<specifier>}. The number of output writers follows the input partitioning for [fully parallelizable queries](stream-analytics-scale-jobs.md).
+For partition key, use `{date}` and `{time}` tokens from your event fields in the path pattern. Choose the date format, such as `YYYY/MM/DD`, `DD/MM/YYYY`, or `MM-DD-YYYY`. `HH` is used for the time format. Blob output can be partitioned by a single custom event attribute `{fieldname}` or `{datetime:\<specifier>}`. The number of output writers follows the input partitioning for [fully parallelizable queries](stream-analytics-scale-jobs.md).
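
For example, to partition blob output by a custom event attribute, a path pattern such as `cluster1/{client_id}/{date}/{time}` (Example 3 in the configuration table) can be paired with a job query that exposes `client_id` in its output. The following is a minimal sketch; the `Input` and `BlobOutput` aliases are hypothetical:

```sql
-- Minimal sketch: client_id must appear as a field in the output events so the
-- {client_id} token in the path pattern can be resolved for each event.
SELECT
    client_id,
    System.Timestamp() AS EventTime
INTO
    BlobOutput
FROM
    Input
```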

## Output batch size
