From f923e62fa0ee61a10f9384a2ad8e29e63c1ea175 Mon Sep 17 00:00:00 2001
From: Sreedhar Pelluru <6722422+spelluru@users.noreply.github.com>
Date: Tue, 2 Sep 2025 14:31:22 -0400
Subject: [PATCH 1/3] ADX Freshness - 09-02

---
 data-explorer/connect-odbc.md        | 30 ++++++++++++++--------------
 data-explorer/ingest-json-formats.md |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/data-explorer/connect-odbc.md b/data-explorer/connect-odbc.md
index f796af06d3..55f3cf6014 100644
--- a/data-explorer/connect-odbc.md
+++ b/data-explorer/connect-odbc.md
@@ -3,7 +3,7 @@ title: Connect to Azure Data Explorer with ODBC
description: In this article, you learn how to set up an Open Database Connectivity (ODBC) connection to Azure Data Explorer.
ms.reviewer: gabil
ms.topic: how-to
-ms.date: 05/26/2024
+ms.date: 09/02/2025
---

# Connect to Azure Data Explorer with ODBC
@@ -12,18 +12,18 @@ Open Database Connectivity ([ODBC](/sql/odbc/reference/odbc-overview)) is a wide

Consequently, you can establish a connection to Azure Data Explorer from any application that is equipped with support for the ODBC driver for SQL Server.

-Watch the following video to learn to create an ODBC connection.
+Watch the following video to learn how to create an ODBC connection.

> [!VIDEO https://www.youtube.com/embed/qA5wxhrOwog]

Alternatively, follow the steps to [connect to your cluster with ODBC](#connect-to-your-cluster-with-odbc).

> [!NOTE]
-> We recommend using dedicated connectors whenever possible. For a list of available connectors, see [Connectors overview](integrate-data-overview.md).
+> Use dedicated connectors when possible. For a list of available connectors, see [Connectors overview](integrate-data-overview.md).

## Prerequisites

-* [Microsoft ODBC Driver for SQL Server version 17.2.0.1 or later](/sql/connect/odbc/download-odbc-driver-for-sql-server) for your operating system.
+* [Microsoft ODBC Driver for SQL Server](/sql/connect/odbc/download-odbc-driver-for-sql-server) version 17.2.0.1 or later for your operating system.

## Connect to your cluster with ODBC
@@ -45,41 +45,41 @@ To configure an ODBC data source using the ODBC driver for SQL Server:

1. Select **Add**.

-:::image type="content" source="media/connect-odbc/add-data-source.png" alt-text="Add data source.":::
+:::image type="content" source="media/connect-odbc/add-data-source.png" alt-text="Screenshot of the ODBC Data Sources dialog showing the Add Data Source option and fields for creating a new DSN.":::

1. Select **ODBC Driver 17 for SQL Server** then **Finish**.

-    :::image type="content" source="media/connect-odbc/select-driver.png" alt-text="Select driver.":::
+    :::image type="content" source="media/connect-odbc/select-driver.png" alt-text="Screenshot of the ODBC driver selection dialog showing ODBC Driver 17 for SQL Server selected.":::

-1. Enter a name and description for the connection and the cluster you want to connect to, then select **Next**. The cluster URL should be in the form *\<ClusterName\>.\<Region\>.kusto.windows.net*.
+1. Enter a name and description for the connection and the cluster you want to connect to, then select **Next**. The cluster URL should be in the form `<ClusterName>.<Region>.kusto.windows.net`.

    >[!NOTE]
-    > When entering the cluster URL, do not include the prefix "https://".
+    > When entering the cluster URL, don't include the prefix `https://`.
-    :::image type="content" source="media/connect-odbc/select-server.png" alt-text="Select server.":::
+    :::image type="content" source="media/connect-odbc/select-server.png" alt-text="Screenshot of the Data Source Configuration window showing the Server field and an example cluster URL format.":::

1. Select **Active Directory Integrated** then **Next**.

-    :::image type="content" source="media/connect-odbc/active-directory-integrated.png" alt-text="Active directory integrated.":::
+    :::image type="content" source="media/connect-odbc/active-directory-integrated.png" alt-text="Screenshot of the authentication method dropdown showing Active Directory Integrated selected.":::

1. Select the database with the sample data then **Next**.

-    :::image type="content" source="media/connect-odbc/change-default-database.png" alt-text="Cahnge default database.":::
+    :::image type="content" source="media/connect-odbc/change-default-database.png" alt-text="Screenshot of the default database selection dialog showing the sample data database chosen.":::

1. On the next screen, leave all options as defaults then select **Finish**.

1. Select **Test Data Source**.

-    :::image type="content" source="media/connect-odbc/test-data-source.png" alt-text="Test data source.":::
+    :::image type="content" source="media/connect-odbc/test-data-source.png" alt-text="Screenshot of the Test Data Source dialog showing the Test Data Source button and connection status fields.":::

1. Verify that the test succeeded then select **OK**. If the test didn't succeed, check the values that you specified in previous steps, and ensure you have sufficient permissions to connect to the cluster.

-    :::image type="content" source="media/connect-odbc/test-succeeded.png" alt-text="Test succeeded.":::
+    :::image type="content" source="media/connect-odbc/test-succeeded.png" alt-text="Screenshot of the Test Data Source results showing a successful connection confirmation message.":::

---

> [!NOTE]
-> Azure Data Explorer considers string values as `NVARCHAR(MAX)`, which may not work well with some ODBC applications. Cast the data to `NVARCHAR(`*n*`)` using the `Language` parameter in the connection string. For example, `Language=any@MaxStringSize:5000` will encode strings as `NVARCHAR(5000)`. For more information, see [tuning options](sql-server-emulation-overview.md#tuning-options).
+> Azure Data Explorer treats string values as `NVARCHAR(MAX)`, which can cause issues with some ODBC applications. Cast strings to `NVARCHAR(n)` by using the `Language` parameter in the connection string. For example, `Language=any@MaxStringSize:5000` encodes strings as `NVARCHAR(5000)`. For more information, see [tuning options](sql-server-emulation-overview.md#tuning-options).

## Application authentication

@@ -120,4 +120,4 @@ Language = any@AadAuthority:

## Related content

* [SQL Server emulation in Azure Data Explorer](sql-server-emulation-overview.md)
-* [Run KQL queries and call stored functions](sql-kql-queries-and-stored-functions.md)
+* [Run Kusto Query Language (KQL) queries and call stored functions](sql-kql-queries-and-stored-functions.md)
diff --git a/data-explorer/ingest-json-formats.md b/data-explorer/ingest-json-formats.md
index 6e8af07f3c..97aa752f94 100644
--- a/data-explorer/ingest-json-formats.md
+++ b/data-explorer/ingest-json-formats.md
@@ -3,7 +3,7 @@ title: Ingest JSON formatted data into Azure Data Explorer
description: Learn about how to ingest JSON formatted data into Azure Data Explorer.
ms.reviewer: kerend
ms.topic: how-to
-ms.date: 09/14/2022
+ms.date: 09/02/2025
---

# Ingest JSON formatted sample data into Azure Data Explorer

From c2e14b43e64fbd8201e9066f1834ee7acae3f58b Mon Sep 17 00:00:00 2001
From: Sreedhar Pelluru <6722422+spelluru@users.noreply.github.com>
Date: Tue, 2 Sep 2025 14:45:25 -0400
Subject: [PATCH 2/3] Freshness

---
 data-explorer/ingest-json-formats.md | 42 ++++++++++++++++------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/data-explorer/ingest-json-formats.md b/data-explorer/ingest-json-formats.md
index 97aa752f94..5eb86620fc 100644
--- a/data-explorer/ingest-json-formats.md
+++ b/data-explorer/ingest-json-formats.md
@@ -1,14 +1,20 @@
---
-title: Ingest JSON formatted data into Azure Data Explorer
-description: Learn about how to ingest JSON formatted data into Azure Data Explorer.
+title: Ingest JSON Data Into Azure Data Explorer
+description: Ingest JSON to Azure Data Explorer with step-by-step KQL, C#, and Python examples for raw, mapped, multiline, and array records. Follow best practices.
+#customer intent: As a data engineer, I want to ingest line-separated JSON into Azure Data Explorer so that I can capture raw telemetry in a dynamic column.
ms.reviewer: kerend
ms.topic: how-to
ms.date: 09/02/2025
+ms.custom:
+  - ai-gen-docs-bap
+  - ai-gen-title
+  - ai-seo-date:09/02/2025
+  - ai-gen-description
---

# Ingest JSON formatted sample data into Azure Data Explorer

-This article shows you how to ingest JSON formatted data into an Azure Data Explorer database. You'll start with simple examples of raw and mapped JSON, continue to multi-lined JSON, and then tackle more complex JSON schemas containing arrays and dictionaries. The examples detail the process of ingesting JSON formatted data using Kusto Query Language (KQL), C#, or Python.
+This article shows you how to ingest JSON formatted data into an Azure Data Explorer database. You start with simple examples of raw and mapped JSON, continue to multi-lined JSON, and then tackle more complex JSON schemas containing arrays and dictionaries. The examples detail the process of ingesting JSON formatted data using Kusto Query Language (KQL), C#, or Python.

> [!NOTE]
> We don't recommend using `.ingest` management commands in production scenarios. Instead, use a [data connector](integrate-data-overview.md) or programmatically ingest data using one of the [Kusto client libraries](/kusto/api/client-libraries?view=azure-data-explorer&preserve-view=true).
@@ -26,13 +32,13 @@ Azure Data Explorer supports two JSON file formats:
* `multijson`: Multi-lined JSON. The parser ignores the line separators and reads a record from the previous position to the end of a valid JSON.

> [!NOTE]
-> When ingesting using the [get data experience](ingest-data-overview.md), the default format is `multijson`. The format can handle multiline JSON records and arrays of JSON records. When a parsing error is encountered, the entire file is discarded. To ignore invalid JSON records, select the option to "Ignore data format errors.", which will switch the format to `json` (JSON Lines).
+> When ingesting using the [Get data experience](ingest-data-overview.md), the default format is `multijson`. The format can handle multiline JSON records and arrays of JSON records. When a parsing error is encountered, the entire file is discarded. To ignore invalid JSON records, select the "Ignore data format errors" option, which switches the format to `json` (JSON Lines).
> -> If you're using the JSON Line format (`json`), lines that don't represent a valid JSON records are skipped during parsing. +> If you're using the JSON Line format (`json`), lines that don't represent valid JSON records are skipped during parsing. ### Ingest and map JSON formatted data -Ingestion of JSON formatted data requires you to specify the *format* using [ingestion property](/kusto/ingestion-properties?view=azure-data-explorer&preserve-view=true). Ingestion of JSON data requires [mapping](/kusto/management/mappings?view=azure-data-explorer&preserve-view=true), which maps a JSON source entry to its target column. When ingesting data, use the `IngestionMapping` property with its `ingestionMappingReference` (for a pre-defined mapping) ingestion property or its `IngestionMappings` property. This article will use the `ingestionMappingReference` ingestion property, which is pre-defined on the table used for ingestion. In the examples below, we'll start by ingesting JSON records as raw data to a single column table. Then we'll use the mapping to ingest each property to its mapped column. +Ingestion of JSON formatted data requires you to specify the *format* using [ingestion property](/kusto/ingestion-properties?view=azure-data-explorer&preserve-view=true). Ingestion of JSON data requires [mapping](/kusto/management/mappings?view=azure-data-explorer&preserve-view=true), which maps a JSON source entry to its target column. When ingesting data, use the `IngestionMapping` property with its `ingestionMappingReference` (for a predefined mapping) ingestion property or its `IngestionMappings` property. This article uses the `ingestionMappingReference` ingestion property, which is predefined on the table used for ingestion. In the following examples, we start by ingesting JSON records as raw data to a single column table. Then we use the mapping to ingest each property to its mapped column. ### Simple JSON example @@ -206,7 +212,7 @@ In this example, you ingest JSON records data. Each JSON property is mapped to a ### [KQL](#tab/kusto-query-language) -1. Create a new table, with a similar schema to the JSON input data. We'll use this table for all the following examples and ingest commands. +1. Create a new table, with a similar schema to the JSON input data. We use this table for all the following examples and ingest commands. ```kusto .create table Events (Time: datetime, Device: string, MessageId: string, Temperature: double, Humidity: double) @@ -218,7 +224,7 @@ In this example, you ingest JSON records data. Each JSON property is mapped to a .create table Events ingestion json mapping 'FlatEventMapping' '[{"column":"Time","Properties":{"path":"$.timestamp"}},{"column":"Device","Properties":{"path":"$.deviceId"}},{"column":"MessageId","Properties":{"path":"$.messageId"}},{"column":"Temperature","Properties":{"path":"$.temperature"}},{"column":"Humidity","Properties":{"path":"$.humidity"}}]' ``` - In this mapping, as defined by the table schema, the `timestamp` entries will be ingested to the column `Time` as `datetime` data types. + In this mapping, as defined by the table schema, the `timestamp` entries are ingested to the column `Time` as `datetime` data types. 1. Ingest data into the `Events` table. @@ -230,7 +236,7 @@ In this example, you ingest JSON records data. Each JSON property is mapped to a ### [C#](#tab/c-sharp) -1. Create a new table, with a similar schema to the JSON input data. We'll use this table for all the following examples and ingest commands. +1. 
Create a new table, with a similar schema to the JSON input data. We use this table for all the following examples and ingest commands. ```csharp var tableName = "Events"; @@ -268,7 +274,7 @@ In this example, you ingest JSON records data. Each JSON property is mapped to a await kustoClient.ExecuteControlCommandAsync(command); ``` - In this mapping, as defined by the table schema, the `timestamp` entries will be ingested to the column `Time` as `datetime` data types. + In this mapping, as defined by the table schema, the `timestamp` entries are ingested to the column `Time` as `datetime` data types. 1. Ingest data into the `Events` table. @@ -286,7 +292,7 @@ In this example, you ingest JSON records data. Each JSON property is mapped to a ### [Python](#tab/python) -1. Create a new table, with a similar schema to the JSON input data. We'll use this table for all the following examples and ingest commands. +1. Create a new table, with a similar schema to the JSON input data. We use this table for all the following examples and ingest commands. ```python TABLE = "Events" @@ -363,7 +369,7 @@ INGESTION_CLIENT.ingest_from_blob( ## Ingest JSON records containing arrays -Array data types are an ordered collection of values. Ingestion of a JSON array is done by an [update policy](/kusto/management/show-table-update-policy-command?view=azure-data-explorer&preserve-view=true). The JSON is ingested as-is to an intermediate table. An update policy runs a pre-defined function on the `RawEvents` table, reingesting the results to the target table. We'll ingest data with the following structure: +Array data types are an ordered collection of values. Ingestion of a JSON array is done by an [update policy](/kusto/management/show-table-update-policy-command?view=azure-data-explorer&preserve-view=true). The JSON is ingested as-is to an intermediate table. An update policy runs a predefined function on the `RawEvents` table, reingesting the results to the target table. We ingest data with the following structure: ```json { @@ -389,7 +395,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array ### [KQL](#tab/kusto-query-language) -1. Create an `update policy` function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We'll use table `RawEvents` as a source table and `Events` as a target table. +1. Create an `update policy` function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We use the table `RawEvents` as a source table and `Events` as a target table. ```kusto .create function EventRecordsExpand() { @@ -410,7 +416,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array EventRecordsExpand() | getschema ``` -1. Add the update policy to the target table. This policy will automatically run the query on any newly ingested data in the `RawEvents` intermediate table and ingest the results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. +1. Add the update policy to the target table. This policy automatically runs the query on any newly ingested data in the `RawEvents` intermediate table and ingests the results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. 
```kusto .alter table Events policy update @'[{"Source": "RawEvents", "Query": "EventRecordsExpand()", "IsEnabled": "True"}]' @@ -430,7 +436,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array ### [C#](#tab/c-sharp) -1. Create an update function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We'll use table `RawEvents` as a source table and `Events` as a target table. +1. Create an update function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We use table `RawEvents` as a source table and `Events` as a target table. ```csharp var command = CslCommandGenerator.GenerateCreateFunctionCommand( @@ -454,7 +460,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array > [!NOTE] > The schema received by the function must match the schema of the target table. -1. Add the update policy to the target table. This policy will automatically run the query on any newly ingested data in the `RawEvents` intermediate table and ingest its results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. +1. Add the update policy to the target table. This policy automatically runs the query on any newly ingested data in the `RawEvents` intermediate table and ingests its results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. ```csharp command = ".alter table Events policy update @'[{'Source': 'RawEvents', 'Query': 'EventRecordsExpand()', 'IsEnabled': 'True'}]"; @@ -479,7 +485,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array ### [Python](#tab/python) -1. Create an update function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We'll use table `RawEvents` as a source table and `Events` as a target table. +1. Create an update function that expands the collection of `records` so that each value in the collection receives a separate row, using the `mv-expand` operator. We use the table `RawEvents` as a source table and `Events` as a target table. ```python CREATE_FUNCTION_COMMAND = @@ -500,7 +506,7 @@ Array data types are an ordered collection of values. Ingestion of a JSON array > [!NOTE] > The schema received by the function has to match the schema of the target table. -1. Add the update policy to the target table. This policy will automatically run the query on any newly ingested data in the `RawEvents` intermediate table and ingest its results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. +1. Add the update policy to the target table. This policy automatically runs the query on any newly ingested data in the `RawEvents` intermediate table and ingests its results into the `Events` table. Define a zero-retention policy to avoid persisting the intermediate table. 
```python CREATE_UPDATE_POLICY_COMMAND = From 1b30a4716de9d0ed49e6ca549855194f1ba75adc Mon Sep 17 00:00:00 2001 From: Sreedhar Pelluru <6722422+spelluru@users.noreply.github.com> Date: Tue, 2 Sep 2025 14:52:02 -0400 Subject: [PATCH 3/3] Freshness review --- data-explorer/ingestion-supported-formats.md | 25 ++++++++++++-------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/data-explorer/ingestion-supported-formats.md b/data-explorer/ingestion-supported-formats.md index e68d8162c4..9ed3b33910 100644 --- a/data-explorer/ingestion-supported-formats.md +++ b/data-explorer/ingestion-supported-formats.md @@ -1,9 +1,14 @@ --- -title: Data formats supported by Azure Data Explorer for ingestion. +title: Supported Ingestion Formats In Azure Data Explorer description: Learn about the various data and compression formats supported by Azure Data Explorer for ingestion. +#customer intent: As a data engineer, I want to know which file formats Azure Data Explorer supports for ingestion so that I can prepare source files correctly. ms.reviewer: tzgitlin ms.topic: conceptual -ms.date: 09/13/2022 +ms.date: 09/02/2025 +ms.custom: + - ai-gen-docs-bap + - ai-gen-title + - ai-seo-date:09/02/2025 --- # Data formats supported by Azure Data Explorer for ingestion @@ -13,7 +18,7 @@ ms.date: 09/13/2022 Data ingestion is the process by which data is added to a table and is made available for query in Azure Data Explorer. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that Azure Data Explorer supports for data ingestion. > [!NOTE] -> Before you ingest data, make sure that your data is properly formatted and defines the expected fields. We recommend using your preferred validator to confirm the format is valid. For example, you may find the following validators useful to check CSV or JSON files: +> Before you ingest data, make sure that your data is properly formatted and defines the expected fields. We recommend using your preferred validator to confirm the format is valid. For example, you might find the following validators useful to check CSV or JSON files: > > * CSV: http://csvlint.io/ > * JSON: https://jsonlint.com/ @@ -26,7 +31,7 @@ Data ingestion is the process by which data is added to a table and is made avai |Avro |`.avro` |A legacy implementation for [AVRO](https://avro.apache.org/docs/current/) format based on [.NET library](https://www.nuget.org/packages/Microsoft.Hadoop.Avro). The following compression codecs are supported: `null`, `deflate` (for `snappy` - use `ApacheAvro` data format). | |CSV |`.csv` |A text file with comma-separated values (`,`). See [RFC 4180: _Common Format and MIME Type for Comma-Separated Values (CSV) Files_](https://www.ietf.org/rfc/rfc4180.txt).| |JSON |`.json` |A text file with JSON objects delimited by `\n` or `\r\n`. See [JSON Lines (JSONL)](http://jsonlines.org/).| -|MultiJSON|`.multijson`|A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, `\n` or `\r\n`. Each property bag can be spread on multiple lines.| +|MultiJSON|`.multijson`|A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, `\n`, or `\r\n`. 
Each property bag can be spread on multiple lines.| |ORC |`.orc` |An [ORC file](https://en.wikipedia.org/wiki/Apache_ORC).| |Parquet |`.parquet` |A [Parquet file](https://en.wikipedia.org/wiki/Apache_Parquet). | |PSV |`.psv` |A text file with pipe-separated values (|). | @@ -40,11 +45,11 @@ Data ingestion is the process by which data is added to a table and is made avai > [!NOTE] > -> * Ingestion from data storage systems that provide ACID functionality on top of regular Parquet format files (e.g. Apache Iceberg, Apache Hudi, Delta Lake) is not supported. +> * Ingestion from data storage systems that provide ACID functionality on top of regular Parquet format files (for example, Apache Iceberg, Apache Hudi, Delta Lake) isn't supported. > -> * Schema-less Avro is not supported. +> * Schema-less Avro isn't supported. > -> * For more info on ingesting data using `json` or `multijson` formats, please refer to [this document](ingest-json-formats.md). +> * For more info on ingesting data using `json` or `multijson` formats, see [this article](ingest-json-formats.md). ## Supported data compression formats @@ -62,12 +67,12 @@ For example: * `MyData.json.gz` indicates a blob or a file formatted as JSON, compressed with gGzip. Blob or file names that don't include the format extensions but just compression (for example, `MyData.zip`) is also supported. In this case, the file format -must be specified as an ingestion property because it cannot be inferred. +must be specified as an ingestion property because it can't be inferred. > [!NOTE] > * Some compression formats keep track of the original file extension as part of the compressed stream. This extension is generally ignored for determining the file format. If the file format can't be determined from the (compressed) blob or file name, it must be specified through the `format` ingestion property. -> * Not to be confused with internal (chunk level) compression codec used by `Parquet`, `AVRO` and `ORC` formats. Internal compression name is usually added to a file name before file format extension, for example: `file1.gz.parquet`, `file1.snappy.avro`, etc. -> * [Deflate64/Enhanced Deflate](https://en.wikipedia.org/wiki/Deflate#Deflate64/Enhanced_Deflate) zip compression method is not supported. Please note that Windows built-in zip compressor may choose to use this compression method on files of size over 2GB. +> * Not to be confused with internal (chunk level) compression codec used by `Parquet`, `AVRO`, and `ORC` formats. Internal compression name is usually added to a file name before file format extension, for example: `file1.gz.parquet`, `file1.snappy.avro`, etc. +> * [Deflate64/Enhanced Deflate](https://en.wikipedia.org/wiki/Deflate#Deflate64/Enhanced_Deflate) zip compression method isn't supported. Windows built-in zip compressor might choose to use this compression method on files of size over 2GB. ## Related content
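
To make the note about uninferable formats concrete, the following minimal sketch shows one way to supply the `format` ingestion property explicitly when a compressed blob's name (for example, `MyData.zip`) doesn't reveal the underlying data format. The table name and the obfuscated storage URI are hypothetical placeholders, not values taken from the patches above, and `.ingest` commands remain unsuitable for production pipelines as noted earlier.

```kusto
// Hypothetical example: the blob name ends in .zip only, so the data format can't be
// inferred from the name. Passing format='csv' tells the ingestion pipeline how to parse
// the decompressed content. Replace the table name and blob URI with your own values.
.ingest into table SampleTable (
    h'https://<StorageAccount>.blob.core.windows.net/<Container>/MyData.zip?<SAS-token>'
)
with (format='csv')
```

The same idea applies outside of management commands: when ingesting through a data connector or the Kusto client libraries, pass the format as an ingestion property rather than relying on the file name.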