articles/stream-analytics/blob-storage-azure-data-lake-gen2-output.md
6 additions & 6 deletions
@@ -15,7 +15,7 @@ Azure Data Lake Storage Gen2 makes Azure Storage the foundation for building ent
Blob Storage offers a cost-effective and scalable solution for storing large amounts of unstructured data in the cloud. For an introduction on Blob Storage and its use, see [Upload, download, and list blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md).

>[!NOTE]
-> For information on the behaviors specific to the AVRO and Parquet formats, see the related sections in the [overview](stream-analytics-define-outputs.md).
+> For information on the behaviors specific to the Avro and Parquet formats, see the related sections in the [overview](stream-analytics-define-outputs.md).

## Output configuration
@@ -30,8 +30,8 @@ The following table lists the property names and their descriptions for creating
| Event serialization format | The serialization format for output data. JSON, CSV, Avro, and Parquet are supported. Delta Lake is listed as an option here. The data is in Parquet format if Delta Lake is selected. Learn more about [Delta Lake](write-to-delta-lake.md). |
| Delta path name | Required when the event serialization format is Delta Lake. The path that's used to write the Delta Lake table within the specified container. It includes the table name. For more information and examples, see [Write to a Delta Lake table](write-to-delta-lake.md). |
|Write mode | Write mode controls the way that Azure Stream Analytics writes to an output file. Exactly-once delivery only happens when Write mode is Once. For more information, see the next section. |
-| Partition column | Optional. The {field} name from your output data to partition. Only one partition column is supported. |
-| Path pattern | Required when the event serialization format is Delta Lake. The file path pattern that's used to write your blobs within the specified container. <br /><br /> In the path pattern, you can choose to use one or more instances of the date and time variables to specify the frequency at which blobs are written: {date}, {time}. <br /><br />If your Write mode is Once, you need to use both {date} and {time}. <br /><br />You can use custom blob partitioning to specify one custom {field} name from your event data to partition blobs. The field name is alphanumeric and can include spaces, hyphens, and underscores. Restrictions on custom fields include the following ones: <ul><li>No dynamic custom {field} name is allowed if your Write mode is Once. </li><li>Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and column `id`.</li><li>Nested fields aren't permitted. Instead, use an alias in the job query to "flatten" the field.</li><li>Expressions can't be used as a field name.</li></ul> <br />This feature enables the use of custom date/time format specifier configurations in the path. Custom date/time formats must be specified one at a time and enclosed by the {datetime:\<specifier>} keyword. Allowable inputs for `\<specifier>` are `yyyy`, `MM`, `M`, `dd`, `d`, `HH`, `H`, `mm`, `m`, `ss`, or `s`. The {datetime:\<specifier>} keyword can be used multiple times in the path to form custom date/time configurations. <br /><br />Examples: <ul><li>Example 1: `cluster1/logs/{date}/{time}`</li><li>Example 2: `cluster1/logs/{date}`</li><li>Example 3: `cluster1/{client_id}/{date}/{time}`</li><li>Example 4: `cluster1/{datetime:ss}/{myField}` where the query is `SELECT data.myField AS myField FROM Input;`</li><li>Example 5: `cluster1/year={datetime:yyyy}/month={datetime:MM}/day={datetime:dd}`</ul><br />The time stamp of the created folder structure follows UTC and not local time. [System.Timestamp](./stream-analytics-time-handling.md#choose-the-best-starting-time) is the time used for all time-based partitioning.<br /><br />File naming uses the following convention: <br /><br />`{Path Prefix Pattern}/schemaHashcode_Guid_Number.extension`<br /><br /> Here, `Guid` represents the unique identifier assigned to an internal writer that's created to write to a blob file. The number represents the index of the blob block. <br /><br /> Example output files:<ul><li>`Myoutput/20170901/00/45434_gguid_1.csv`</li> <li>`Myoutput/20170901/01/45434_gguid_1.csv`</li></ul> <br />For more information about this feature, see [Azure Stream Analytics custom blob output partitioning](stream-analytics-custom-path-patterns-blob-storage-output.md). |
+| Partition column | Optional. The `{field}` name from your output data to partition. Only one partition column is supported. |
+| Path pattern | Required when the event serialization format is Delta Lake. The file path pattern that's used to write your blobs within the specified container. <br /><br /> In the path pattern, you can choose to use one or more instances of the date and time variables to specify the frequency at which blobs are written: `{date}`, `{time}`. <br /><br />If your Write mode is Once, you need to use both `{date}` and `{time}`. <br /><br />You can use custom blob partitioning to specify one custom `{field}` name from your event data to partition blobs. The field name is alphanumeric and can include spaces, hyphens, and underscores. Restrictions on custom fields include the following ones: <ul><li>No dynamic custom `{field}` name is allowed if your Write mode is Once. </li><li>Field names aren't case sensitive. For example, the service can't differentiate between column `ID` and column `id`.</li><li>Nested fields aren't permitted. Instead, use an alias in the job query to "flatten" the field.</li><li>Expressions can't be used as a field name.</li></ul> <br />This feature enables the use of custom date/time format specifier configurations in the path. Custom date/time formats must be specified one at a time and enclosed by the `{datetime:\<specifier>}` keyword. Allowable inputs for `\<specifier>` are `yyyy`, `MM`, `M`, `dd`, `d`, `HH`, `H`, `mm`, `m`, `ss`, or `s`. The `{datetime:\<specifier>}` keyword can be used multiple times in the path to form custom date/time configurations. <br /><br />Examples: <ul><li>Example 1: `cluster1/logs/{date}/{time}`</li><li>Example 2: `cluster1/logs/{date}`</li><li>Example 3: `cluster1/{client_id}/{date}/{time}`</li><li>Example 4: `cluster1/{datetime:ss}/{myField}` where the query is `SELECT data.myField AS myField FROM Input;`</li><li>Example 5: `cluster1/year={datetime:yyyy}/month={datetime:MM}/day={datetime:dd}`</ul><br />The time stamp of the created folder structure follows UTC and not local time. [System.Timestamp](./stream-analytics-time-handling.md#choose-the-best-starting-time) is the time used for all time-based partitioning.<br /><br />File naming uses the following convention: <br /><br />`{Path Prefix Pattern}/schemaHashcode_Guid_Number.extension`<br /><br /> Here, `Guid` represents the unique identifier assigned to an internal writer that's created to write to a blob file. The number represents the index of the blob block. <br /><br /> Example output files:<ul><li>`Myoutput/20170901/00/45434_gguid_1.csv`</li> <li>`Myoutput/20170901/01/45434_gguid_1.csv`</li></ul> <br />For more information about this feature, see [Azure Stream Analytics custom blob output partitioning](stream-analytics-custom-path-patterns-blob-storage-output.md). |
| Date format | Required when the event serialization format is Delta Lake. If the date token is used in the prefix path, you can select the date format in which your files are organized. An example is `YYYY/MM/DD`. |
| Time format | Required when the event serialization format is Delta Lake. If the time token is used in the prefix path, specify the time format in which your files are organized.|
|Minimum rows |The number of minimum rows per batch. For Parquet, every batch creates a new file. The current default value is 2,000 rows and the allowed maximum is 10,000 rows.|
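As a minimal, illustrative sketch (not the service's implementation; the helper name, the date/time formats, and the sample values are assumptions), this is how the path pattern tokens described above might expand into a blob folder path:

```python
from datetime import datetime, timezone

def expand_path_pattern(pattern: str, event: dict, ts: datetime) -> str:
    """Illustrative expansion of {date}, {time}, {datetime:<specifier>}, and one
    custom {field} token, following the rules described in the table above."""
    # {date} follows the configured date format (YYYY/MM/DD here); {time} uses HH.
    path = pattern.replace("{date}", ts.strftime("%Y/%m/%d")).replace("{time}", ts.strftime("%H"))
    # Each {datetime:<specifier>} keyword carries exactly one specifier.
    for spec, fmt in {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}.items():
        path = path.replace("{datetime:" + spec + "}", ts.strftime(fmt))
    # A single custom {field} token is filled from the event payload.
    for field, value in event.items():
        path = path.replace("{" + field + "}", str(value))
    return path

# Example 3 from the table: cluster1/{client_id}/{date}/{time}
ts = datetime(2017, 9, 1, 0, 30, tzinfo=timezone.utc)  # System.Timestamp is UTC
print(expand_path_pattern("cluster1/{client_id}/{date}/{time}", {"client_id": "contoso"}, ts))
# cluster1/contoso/2017/09/01/00
```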
@@ -58,13 +58,13 @@ There's no Write mode option for Delta Lake. However, Delta Lake output also pro
To receive exactly-once delivery for your Blob Storage or Data Lake Storage Gen2 account, you need to configure the following settings:

* Select **Once after all results of time partition is available** for your **Write Mode**.
-* Provide **Path Pattern** with both {date} and {time} specified.
+* Provide **Path Pattern** with both `{date}` and `{time}` specified.
-* Path Pattern becomes a required property and must contain both {date} and {time}. No dynamic custom {field} name is allowed. Learn more about [custom path pattern](stream-analytics-custom-path-patterns-blob-storage-output.md).
+* Path Pattern becomes a required property and must contain both `{date}` and `{time}`. No dynamic custom `{field}` name is allowed. Learn more about [custom path pattern](stream-analytics-custom-path-patterns-blob-storage-output.md).
* If the job is started at a custom time before or after the last output time, there's a risk of the file being overwritten. For example, when **time format** is set to **HH**, the file is generated every hour. If you stop the job at 8:15 AM and restart the job at 8:30 AM, the file generated between 8 AM to 9 AM only covers data from 8:30 AM to 9 AM. The data from 8 AM to 8:15 AM gets lost as it's overwritten.
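To make the overwrite scenario concrete, here's a rough sketch under assumed values (date format `YYYY/MM/DD`, time format `HH`, made-up timestamps; simplified logic, not actual service behavior). Output written before the 8:15 AM stop and after the 8:30 AM restart resolves to the same `{date}/{time}` path, so the later file replaces the earlier one:

```python
from datetime import datetime, timezone

def time_partition_path(pattern: str, ts: datetime) -> str:
    # One path per hour-long time partition when the time format is HH.
    return pattern.replace("{date}", ts.strftime("%Y/%m/%d")).replace("{time}", ts.strftime("%H"))

pattern = "cluster1/logs/{date}/{time}"
written_before_stop = datetime(2024, 5, 1, 8, 10, tzinfo=timezone.utc)    # output before the 8:15 AM stop
written_after_restart = datetime(2024, 5, 1, 8, 45, tzinfo=timezone.utc)  # output after the 8:30 AM restart

# Both timestamps map to the same blob path, so the file written after the
# restart overwrites the one that held the 8:00-8:15 AM data.
print(time_partition_path(pattern, written_before_stop))    # cluster1/logs/2024/05/01/08
print(time_partition_path(pattern, written_after_restart))  # cluster1/logs/2024/05/01/08
```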
## Blob output files
@@ -82,7 +82,7 @@ When you're using Blob Storage as output, a new file is created in the blob in t
## Partitioning

-For partition key, use {date} and {time} tokens from your event fields in the path pattern. Choose the date format, such as `YYYY/MM/DD`, `DD/MM/YYYY`, or `MM-DD-YYYY`. `HH` is used for the time format. Blob output can be partitioned by a single custom event attribute {fieldname} or {datetime:\<specifier>}. The number of output writers follows the input partitioning for [fully parallelizable queries](stream-analytics-scale-jobs.md).
+For partition key, use `{date}` and `{time}` tokens from your event fields in the path pattern. Choose the date format, such as `YYYY/MM/DD`, `DD/MM/YYYY`, or `MM-DD-YYYY`. `HH` is used for the time format. Blob output can be partitioned by a single custom event attribute `{fieldname}` or `{datetime:\<specifier>}`. The number of output writers follows the input partitioning for [fully parallelizable queries](stream-analytics-scale-jobs.md).
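As a rough illustration of that last point (hypothetical events and partition values; not the service's writer logic), partitioning by one custom attribute routes each event to its own folder path, and for a fully parallelizable query there is one writer per input partition:

```python
from collections import defaultdict

# Hypothetical input events; client_id is the single custom partition attribute.
events = [
    {"partition_id": 0, "client_id": "contoso", "reading": 21.5},
    {"partition_id": 1, "client_id": "fabrikam", "reading": 19.0},
    {"partition_id": 0, "client_id": "contoso", "reading": 22.1},
]

# Path pattern cluster1/{client_id}/{date}/{time}; date and time fixed for brevity.
writers = defaultdict(list)
for e in events:
    path = f"cluster1/{e['client_id']}/2017/09/01/00"
    # Writers follow the input partitioning, so the (input partition, path)
    # pair identifies one writer and therefore one output file.
    writers[(e["partition_id"], path)].append(e)

for (partition, path), batch in sorted(writers.items()):
    # Each writer produces files named {path}/schemaHashcode_Guid_Number.extension.
    print(f"input partition {partition} -> {path}: {len(batch)} event(s)")
```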