Data ingestion adds data to a table and makes it available for query. Add properties to the ingestion command after the `with` keyword.
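
For example, here's a minimal sketch of how properties are attached to an ingestion command (the table name, storage URI, and token are hypothetical placeholders):

```kusto
// Ingest a CSV blob; properties follow the `with` keyword.
// `format` and `ignoreFirstRecord` are documented ingestion properties;
// MyTable and the storage URI are placeholders for this sketch.
.ingest into table MyTable (
    'https://mystorageaccount.blob.core.windows.net/container/MyData.csv;<storage-sas-token>'
) with (
    format='csv',
    ignoreFirstRecord=true
)
```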
## Data Ingestion - Supported Formats and Compression
Explore the various data formats, such as CSV, JSON, and Parquet, that are supported for ingestion, and understand compression options and best practices for data preparation.
Data ingestion adds data to a table and makes it available for query. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that are supported for data ingestion.
> [!NOTE]
> Before you ingest data, make sure that your data is properly formatted and includes the expected fields. We recommend using your preferred validator to confirm that the format is valid. For example, you might find the following validators useful for checking CSV or JSON files:
>
> * CSV: http://csvlint.io/
> * JSON: https://jsonlint.com/
To learn why ingestion might fail, see [Ingestion failures](management/ingestion-failures.md).
::: moniker range="azure-data-explorer"
See also [Ingestion error codes in Azure Data Explorer](/azure/data-explorer/error-codes).
::: moniker-end
|Format |Extension |Description|
|---------|------------|-----------|
|ApacheAvro|`.avro`|An [Avro](https://avro.apache.org/docs/current/) format that supports [logical types](https://avro.apache.org/docs/++version++/specification/#Logical+Types). Supported compression codecs: `null`, `deflate`, and `snappy`. The reader implementation of the `apacheavro` format is based on the official [Apache Avro library](https://github.com/apache/avro). For details on ingesting Event Hubs Capture Avro files, see [Ingesting Event Hubs Capture Avro files](/azure/data-explorer/ingest-data-event-hub-overview#schema-mapping-for-event-hub-capture-avro-files). |
|Avro |`.avro`|A legacy implementation of the [Avro](https://avro.apache.org/docs/current/) format based on the [.NET library](https://www.nuget.org/packages/Microsoft.Hadoop.Avro). Supported compression codecs: `null` and `deflate`. To use `snappy`, use the `ApacheAvro` data format. |
|CSV |`.csv`|A text file with comma-separated values (`,`). See [RFC 4180: _Common Format and MIME Type for Comma-Separated Values (CSV) Files_](https://www.ietf.org/rfc/rfc4180.txt).|
|JSON |`.json`|A text file with JSON objects delimited by `\n` or `\r\n`. See [JSON Lines (JSONL)](http://jsonlines.org/).|
|MultiJSON|`.multijson`|A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, `\n`, or `\r\n`. Each property bag can span multiple lines.|
|PSV |`.psv`|A text file with pipe-separated values (<code>|</code>). |
|RAW |`.raw`|A text file whose entire contents are a single string value. |
|SCsv |`.scsv`|A text file with semicolon-separated values (`;`).|
|SOHsv |`.sohsv`|A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.)|
|TSV |`.tsv`|A text file with tab-separated values (`\t`).|
> [!NOTE]
>
> * Ingestion from data storage systems that provide ACID functionality on top of regular Parquet format files (for example, Apache Iceberg, Apache Hudi, and Delta Lake) isn't supported.
> * Schemaless Avro isn't supported.
::: moniker range="azure-data-explorer"
For more information about ingesting data by using the `json` or `multijson` formats, see [Ingest JSON formats](/azure/data-explorer/ingest-json-formats).
::: moniker-end
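
For instance, here's a hedged sketch of ingesting a blob of line-delimited JSON records (the table name, storage URI, and mapping name are assumptions):

```kusto
// `format` and `ingestionMappingReference` are documented ingestion properties;
// the EventsJsonMapping mapping is assumed to already exist on the table.
.ingest into table Events (
    'https://mystorageaccount.blob.core.windows.net/container/events.json;<storage-sas-token>'
) with (
    format='json',
    ingestionMappingReference='EventsJsonMapping'
)
```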
## Supported data compression formats
Compress blobs and files with these algorithms:
|Compression|Extension|
|-----------|---------|
|gzip |.gz |
|zip |.zip |
Indicate compression by appending the extension to the blob or file name.
For example:
* `MyData.csv.zip` indicates a blob or file formatted as CSV, compressed with zip (archive or single file).
* `MyData.json.gz` indicates a blob or file formatted as JSON, compressed with gzip.
Blob or file names that include only the compression extension (for example, `MyData.zip`) are also supported. In this case, specify the file format
as an ingestion property because it can't be inferred.
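
As a minimal sketch (hypothetical table and storage URI), ingesting a blob named `MyData.zip` that contains CSV data would state the format explicitly:

```kusto
// The blob name carries only the compression extension, so the CSV
// format can't be inferred and is passed as an ingestion property.
.ingest into table MyTable (
    'https://mystorageaccount.blob.core.windows.net/container/MyData.zip;<storage-sas-token>'
) with (format='csv')
```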
> [!NOTE]
>
> * Some compression formats store the original file extension in the compressed stream. This extension is generally ignored when determining the file format. If the file format can't be determined from the compressed blob or file name, specify it with the `format` ingestion property.
> * Don't confuse these with the internal chunk-level compression codecs used by the `Parquet`, `AVRO`, and `ORC` formats. The internal compression name is usually added before the file format extension (for example, `file1.gz.parquet`, `file1.snappy.avro`).
> * The [Deflate64/Enhanced Deflate](https://en.wikipedia.org/wiki/Deflate#Deflate64/Enhanced_Deflate) zip compression method isn't supported. The Windows built-in zip compressor can choose this method for files larger than 2 GB.
## Related content
* [Supported data formats](ingestion-supported-formats.md)
* [Data ingestion properties](ingestion-properties.md)

## `.cancel queued ingestion operation` command

The `.cancel queued ingestion operation` command cancels an ingestion operation. Use this command to stop an ingestion operation that takes too long to finish.
The cancel operation command works on a best-effort basis. For example, ongoing ingestion processes or in-flight ingestion might not be canceled.
> [!NOTE]
>
> Queued ingestion commands run on the data ingestion URI endpoint `https://ingest-<YourClusterName><Region>.kusto.windows.net`.
## Permissions
You need at least [Table Ingestor](../../access-control/role-based-access-control.md) permissions to run this command.
## Syntax
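
A minimal sketch of the invocation, assuming the command takes the ingestion operation ID (see the parameters below) as its only argument:

```kusto
// Sketch only: the exact quoting of the operation ID may differ
// from the official reference.
.cancel queued ingestion operation '<IngestionOperationId>'
```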

## Parameters

| Name | Type | Required | Description |
|--|--|--|--|
|*IngestionOperationId*|`string`|:heavy_check_mark:| The unique ingestion operation ID returned by the running command.|

## Returns

|Name |Type |Description|
|--|--|--|
|StartedOn |`datetime`|Date and time, in UTC, when the `.ingest-from-storage-queued` operation was executed.|
|LastUpdatedOn |`datetime`|Date and time, in UTC, when the status was last updated.|
|State |`string`|The state of the operation.|
|Discovered |`long`|Count of the blobs that were listed from storage and queued for ingestion.|
|Pending |`long`|Count of the blobs to be ingested.|
|Canceled |`long`|Count of the blobs that were canceled by a call to the [.cancel queued ingestion operation](cancel-queued-ingestion-operation-command.md) command.|
|Ingested |`long`|Count of the blobs that have been ingested.|
|Failed |`long`|Count of the blobs that failed **permanently**.|
|SampleFailedReasons |`string`|A sample of reasons for blob ingestion failure.|
|Database |`string`|The database where the ingestion process is occurring.|
|Table |`string`| The table where the ingestion process is occurring.|
## Example
This example cancels the ingestion operation `00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444`.
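
A hedged sketch of the full command (the quoting of the operation ID is an assumption):

```kusto
// Cancel the queued ingestion operation by its operation ID.
.cancel queued ingestion operation '00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444'
```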