Commit 0dbf805

Merge pull request #108291 from yossi-karp/ingest-doc-split
New files based on internal docs + updated TOC
2 parents 14b0042 + ac26217 commit 0dbf805

3 files changed

+103
-0
lines changed

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
---
2+
title: Data ingestion properties for Azure Data Explorer
3+
description: Learn about the various data ingestion properties supported by Azure Data Explorer.
4+
author: orspod
5+
ms.author: orspodek
6+
ms.reviewer: tzgitlin
7+
ms.service: data-explorer
8+
ms.topic: conceptual
9+
ms.date: 03/19/2020
10+
---
11+
12+
# Azure Data Explorer data ingestion properties
13+
14+
Data ingestion is the process by which data is added to a table and is made available for query in Azure Data Explorer. You add properties to the ingestion command after the `with` keyword.
15+
16+
## Ingestion properties
17+
18+
The following table lists the properties supported by Azure Data Explorer, describes them, and provides examples:
19+
20+
|Property |Description |Example |
21+
|----------------------|---------------------------------------------------------|----------------------------------------------------|
22+
|`ingestionMapping` |A string value that indicates how to map data from the source file to the actual columns in the table. Define the `format` value with the relevant mapping type. See [data mappings](/azure/kusto/management/mappings).|`with (format="json", ingestionMapping = "[{\"column\":\"rownumber\", \"Properties\":{\"Path\":\"$.RowNumber\"}}, {\"column\":\"rowguid\", \"Properties\":{\"Path\":\"$.RowGuid\"}}]")`<br>(deprecated: `avroMapping`, `csvMapping`, `jsonMapping`) |
23+
|`ingestionMappingReference`|A string value that indicates how to map data from the source file to the actual columns in the table using a named mapping policy object. Define the `format` value with the relevant mapping type. See [data mappings](/azure/kusto/management/mappings).|`with (format="csv", ingestionMappingReference = "Mapping1")`<br>(deprecated: `avroMappingReference`, `csvMappingReference`, `jsonMappingReference`)|
24+
|`creationTime` |The datetime value (formatted as an ISO8601 string) to use as the creation time of the ingested data extents. If unspecified, the current value (`now()`) is used. Overriding the default is useful when ingesting older data, so that the retention policy is applied correctly.|`with (creationTime="2017-02-13T11:09:36.7992775Z")`|
25+
|`extend_schema`|A Boolean value that, if specified, instructs the command to extend the schema of the table (defaults to `false`). This option applies only to `.append` and `.set-or-append` commands. The only allowed schema extensions are those that add columns at the end of the table.|If the original table schema is `(a:string, b:int)`, a valid schema extension would be `(a:string, b:int, c:datetime, d:string)`, but `(a:string, c:datetime)` wouldn't be valid.|
26+
|`folder` |For [ingest-from-query](/azure/kusto/management/data-ingestion/ingest-from-query) commands, the folder to assign to the table. If the table already exists, this property will override the table's folder.|`with (folder="Tables/Temporary")`|
27+
|`format` |The data format (see [supported data formats](ingestion-supported-formats.md)).|`with (format="csv")`|
28+
|`ingestIfNotExists`|A string value that, if specified, prevents ingestion from succeeding if the table already has data tagged with an `ingest-by:` tag with the same value. This ensures idempotent data ingestion. For more information, see [ingest-by: tags](/azure/kusto/management/extents-overview#ingest-by-extent-tags).|The properties `with (ingestIfNotExists='["Part0001"]', tags='["ingest-by:Part0001"]')` indicate that if data with the tag `ingest-by:Part0001` already exists, the current ingestion shouldn't be completed. If it doesn't already exist, this new ingestion should have this tag set (in case a future ingestion attempts to ingest the same data again).|
29+
|`ignoreFirstRecord` |A Boolean value that, if set to `true`, indicates that ingestion should ignore the first record of every file. This property is useful for files in `CSV` and similar formats, if the first record in the file contains the column names. By default, `false` is assumed.|`with (ignoreFirstRecord=false)`|
30+
|`persistDetails` |A Boolean value that, if specified, indicates that the command should persist the detailed results (even if successful) so that the [.show operation details](/azure/kusto/management/operations#show-operation-details) command can retrieve them. Defaults to `false`.|`with (persistDetails=true)`|
31+
|`policy_ingestiontime`|A Boolean value that, if specified, describes whether to enable the [Ingestion Time Policy](/azure/kusto/management/ingestiontimepolicy) on a table that is created by this command. The default is `true`.|`with (policy_ingestiontime=false)`|
32+
|`recreate_schema` |A Boolean value that, if specified, describes whether the command may recreate the schema of the table. This property applies only to the `.set-or-replace` command. This property takes precedence over the `extend_schema` property if both are set.|`with (recreate_schema=true)`|
33+
|`tags`|A list of [tags](/azure/kusto/management/extents-overview#extent-tagging) to associate with the ingested data, formatted as a JSON string |`with (tags="['Tag1', 'Tag2']")`|
34+
|`validationPolicy`|A JSON string that indicates which validations to run during ingestion. See [Data ingestion](/azure/kusto/management/data-ingestion/) for an explanation of the different options.| `with (validationPolicy='{"ValidationOptions":1, "ValidationImplications":1}')` (this is actually the default policy)|
35+
|`zipPattern`|Use this property when ingesting data from storage that has a ZIP archive. This is a string value indicating the regular expression to use when selecting which files in the ZIP archive to ingest. All other files in the archive will be ignored.|`with (zipPattern="*.csv")`|
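
The properties in the table above are passed as `name=value` pairs after the `with` keyword. As a minimal sketch of how a client script might assemble such a clause, the following Python helper renders a dictionary of properties. The helper names (`kusto_literal`, `with_clause`) are hypothetical and not part of any Azure Data Explorer SDK, and the quoting rules are an assumption based on the examples in the table (single-quoted strings, JSON-encoded lists).

```python
import json


def kusto_literal(value):
    """Render a Python value as a literal for the `with (...)` clause.

    Hypothetical helper: the quoting follows the examples in the table,
    not an official specification.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, dict)):
        # Properties such as `tags` and `ingestIfNotExists` take a JSON string.
        value = json.dumps(value)
    # Single-quoted string literal; escape backslashes first, then quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def with_clause(properties):
    """Join the rendered properties into a single `with (...)` clause."""
    parts = [f"{name}={kusto_literal(value)}" for name, value in properties.items()]
    return "with (" + ", ".join(parts) + ")"


clause = with_clause({
    "format": "csv",
    "ignoreFirstRecord": True,
    "ingestIfNotExists": ["Part0001"],
    "tags": ["ingest-by:Part0001"],
})
print(clause)
```

The last two properties reproduce the idempotent-ingestion pairing from the `ingestIfNotExists` row: the same value appears once as the check and once as the `ingest-by:` tag.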
36+
37+
## Next steps
38+
39+
* Learn more about [data ingestion](/azure/data-explorer/ingest-data-overview)
40+
* Learn more about [supported data formats](ingestion-supported-formats.md)
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
---
2+
title: Data formats supported by Azure Data Explorer for ingestion
3+
description: Learn about the various data and compression formats supported by Azure Data Explorer for ingestion.
4+
author: orspod
5+
ms.author: orspodek
6+
ms.reviewer: tzgitlin
7+
ms.service: data-explorer
8+
ms.topic: conceptual
9+
ms.date: 03/19/2020
10+
---
11+
12+
# Data formats supported by Azure Data Explorer for ingestion
13+
14+
Data ingestion is the process by which data is added to a table and is made available for query in Azure Data Explorer. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that Azure Data Explorer supports for data ingestion.
15+
16+
|Format |Extension |Description|
17+
|---------|------------|-----------|
18+
|avro |`.avro` |An [Avro container file](https://avro.apache.org/docs/current/). The following codecs are supported: `null`, `deflate` (`snappy` is currently not supported).|
19+
|CSV |`.csv` |A text file with comma-separated values (`,`). See [RFC 4180: _Common Format and MIME Type for Comma-Separated Values (CSV) Files_](https://www.ietf.org/rfc/rfc4180.txt).|
20+
|JSON |`.json` |A text file with JSON objects delimited by `\n` or `\r\n`. See [JSON Lines (JSONL)](http://jsonlines.org/).|
21+
|multijson|`.multijson`|A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, `\n` or `\r\n`. Each property bag can be spread over multiple lines. This format is preferred over `JSON`, unless the data doesn't consist of property bags.|
22+
|orc |`.orc` |An [Orc file](https://en.wikipedia.org/wiki/Apache_ORC).|
23+
|parquet |`.parquet` |A [Parquet file](https://en.wikipedia.org/wiki/Apache_Parquet).|
24+
|psv |`.psv` |A text file with pipe-separated values (<code>&#124;</code>).|
25+
|raw |`.raw` |A text file whose entire contents is a single string value.|
26+
|scsv |`.scsv` |A text file with semicolon-separated values (`;`).|
27+
|sohsv |`.sohsv` |A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.)|
28+
|tsv |`.tsv` |A text file with tab-separated values (`\t`).|
29+
|tsve |`.tsv` |A text file with tab-separated values (`\t`). A backslash character (`\`) is used for escaping.|
30+
|txt |`.txt` |A text file with lines delimited by `\n`. Empty lines are skipped.|
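
To illustrate the difference between the `JSON` and `multijson` rows in the table above, here is a minimal Python sketch of the two file layouts. It only demonstrates the layouts described in the table and is not how Azure Data Explorer parses data. The function names are hypothetical.

```python
import json


def parse_json_lines(text):
    """`JSON` layout: one JSON object per line, delimited by \\n or \\r\\n."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multijson(text):
    """`multijson` layout: a JSON array of property bags, or property bags
    separated by whitespace, each possibly spread over several lines."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # An array contributes one record per element.
        records.extend(value if isinstance(value, list) else [value])
    return records


json_lines = '{"a": 1}\n{"a": 2}\r\n'
spread_out = '{\n  "a": 1\n}\n{"a": 2}'
as_array = '[{"a": 1}, {"a": 2}]'

print(parse_json_lines(json_lines))
print(parse_multijson(spread_out))
print(parse_multijson(as_array))
```

All three inputs describe the same two records. The line-oriented parser would fail on `spread_out`, because a single record spans several lines there, which is why the more permissive layout is generally the safer choice for property bags.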
31+
32+
## Supported data compression formats
33+
34+
Blobs and files can be compressed through any of the following compression algorithms:
35+
36+
|Compression|Extension|
37+
|-----------|---------|
38+
|GZip |.gz |
39+
|Zip |.zip |
40+
41+
Indicate compression by appending the extension to the name of the blob or file.
42+
43+
For example:
44+
* `MyData.csv.zip` indicates a blob or a file formatted as CSV, compressed with ZIP (archive or a single file)
45+
* `MyData.csv.gz` indicates a blob or a file formatted as CSV, compressed with GZip
46+
47+
Blob or file names that include only the compression extension, without a format extension, are also supported. In this case, the file format
48+
must be specified as an ingestion property because it cannot be inferred.
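
The naming convention described above can be sketched as a small Python function that reads the compression and format from a blob or file name. This is an illustration of the rule only, assuming the extension mappings from the two tables. The function name and the sample name `MyData.zip` are hypothetical, and the service's actual inference logic may differ.

```python
import os

# Extension-to-format mapping taken from the formats table.
# `.tsv` is listed for both tsv and tsve, so it is mapped to tsv here.
FORMAT_EXTENSIONS = {
    ".avro": "avro", ".csv": "csv", ".json": "json",
    ".multijson": "multijson", ".orc": "orc", ".parquet": "parquet",
    ".psv": "psv", ".raw": "raw", ".scsv": "scsv", ".sohsv": "sohsv",
    ".tsv": "tsv", ".txt": "txt",
}
COMPRESSION_EXTENSIONS = {".gz": "GZip", ".zip": "Zip"}


def infer_format_and_compression(name):
    """Return (format, compression); either part is None if not inferable."""
    stem, ext = os.path.splitext(name)
    compression = COMPRESSION_EXTENSIONS.get(ext)
    if compression is not None:
        # The format extension, if any, precedes the compression extension.
        stem, ext = os.path.splitext(stem)
    return FORMAT_EXTENSIONS.get(ext), compression


for sample in ("MyData.csv.zip", "MyData.csv.gz", "MyData.zip"):
    print(sample, infer_format_and_compression(sample))
```

When the first element of the result is `None`, the format can't be inferred from the name and has to be supplied through the `format` ingestion property.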
49+
50+
> [!NOTE]
51+
> Some compression formats keep track of the original file extension as part
52+
> of the compressed stream. This extension is generally ignored for
53+
> determining the file format. If the file format can't be determined from the (compressed)
54+
> blob or file name, it must be specified through the `format` ingestion property.
55+
56+
## Next steps
57+
58+
* Learn more about [data ingestion](/azure/data-explorer/ingest-data-overview)
59+
* Learn more about [Azure Data Explorer data ingestion properties](ingestion-properties.md)

articles/data-explorer/toc.yml

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,10 @@
32 32
- name: Data ingestion overview
33 33
displayName: pipelines, connectors, plugins, Python, .NET, Java, Node, REST
34 34
href: ingest-data-overview.md
35+
- name: Data ingestion properties
36+
href: ingestion-properties.md
37+
- name: Formats for data ingestion
38+
href: ingestion-supported-formats.md
35 39
- name: Kusto Query Language
36 40
items:
37 41
- name: Quick reference guide
