Commit 2bc9e35

Changes per Ornat's comments and new content
1 parent 1f0931e commit 2bc9e35


3 files changed (+47 / -39 lines)


articles/data-explorer/lightingest.md

Lines changed: 47 additions & 39 deletions
@@ -1,6 +1,6 @@
 ---
-title: LightIngest
-description: Learn about LightIngest, a command-line utility for ad-hoc data ingestion into Kusto
+title: LightIngest is a command-line utility for ingestion into Azure Data Explorer.
+description: Learn about LightIngest, a command-line utility for ad-hoc data ingestion into Azure Data Explorer.
 author: orspod
 ms.author: orspodek
 ms.reviewer: tzgitlin
@@ -11,7 +11,7 @@ ms.date: 03/17/2020
 
 # LightIngest
 
-LightIngest is a command-line utility for ad-hoc data ingestion into Kusto.
+LightIngest is a command-line utility for ad-hoc data ingestion into Azure Data Explorer.
 The utility can pull source data from a local folder or from an Azure blob storage container.
 
 ## Prerequisites
@@ -39,25 +39,24 @@ The utility can pull source data from a local folder or from an Azure blob stora
 >
 >![Command line Help](media/lightingest/lightingest-cmd-line-help.png)
 
-1. (Mandatory) Enter `LightIngest` followed by the connection string to the Kusto cluster that will manage the ingestion.
-The connection string should be enclosed in double quotes and follow the [Kusto connection strings specification](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto).
+1. Enter `LightIngest` followed by the connection string to the Azure Data Explorer cluster that will manage the ingestion.
+Enclose the connection string in double quotes and follow the [Kusto connection strings specification](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto).
 
 For example:
 ```
 LightIngest "Data Source=https://ingest-tzgitlin.westus.kusto.windows.net;AAD Federated Security=True" -db:TzviaTest -table:Trips -source:"https://tzgitlinegdemo2.blob.core.windows.net/saadxworkshop1;VXPnUFzvBRLBIqEgcA0hRnSXmq69jVyZMChgUn5BeVwhjLnx4ucHZ8RPGTZ0F2hXHnC/vesoFSMF5f4gepeTJw==" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
 ```
 
-> [!Note]
-> * It's recommended to configure `LightIngest` to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way the Kusto service can manage the ingestion load, and it provides for recovery in case of transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
-> * Knowing the raw data size is important for optimal ingestion performance. `LightIngest` will estimate the uncompressed size of local files. However, for compressed blobs, `LightIngest` could have difficulties correctly estimating their raw size without first downloading them. When ingesting compressed blobs, it will be helpful for `LightIngest` performance if you set the `rawSizeBytes` property on the blob metadata to uncompressed data size in bytes.
+* The recommended method is for `LightIngest` to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Azure Data Explorer service can manage the ingestion load, and you can easily recover from transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
+* For optimal ingestion performance, `LightIngest` needs to know the raw data size, so it estimates the uncompressed size of local files. However, `LightIngest` might not be able to correctly estimate the raw size of compressed blobs without first downloading them. Therefore, when ingesting compressed blobs, set the `rawSizeBytes` property on the blob metadata to the uncompressed data size in bytes.
 
-## Command line arguments reference
+## General command-line arguments
 
 |Argument name |Short name |Type |Mandatory |Description |
 |----------------------|-------------|--------|----------|-------------------------------------------|
-| | |string |Mandatory |[Kusto Connection String](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto) specifying the Kusto endpoint that will handle the ingestion. Should be enclosed in double quotes |
-|-database |-db |string |Optional |Target Kusto database name |
-|-table | |string |Mandatory |Target Kusto table name |
+| | |string |Mandatory |[Azure Data Explorer Connection String](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto) specifying the Kusto endpoint that will handle the ingestion. Should be enclosed in double quotes |
+|-database |-db |string |Optional |Target Azure Data Explorer database name |
+|-table | |string |Mandatory |Target Azure Data Explorer table name |
 |-sourcePath |-source |string |Mandatory |Path to source files or root URI of the blob container. If the data is in blobs, must contain storage account key or SAS. Recommended to enclose in double quotes |
 |-prefix | |string |Optional |When the source data to ingest resides on blob storage, this URL prefix is shared by all blobs, excluding the container name. For example, if the data is in `MyContainer/Dir1/Dir2`, then the prefix should be `Dir1/Dir2`. Enclosing in double quotes is recommended |
 |-pattern | |string |Optional |Pattern by which source files/blobs are picked. Supports wildcards. For example, `"*.csv"`. Recommended to enclose in double quotes |
@@ -68,11 +67,11 @@ The utility can pull source data from a local folder or from an Azure blob stora
 |-tag | |string |Optional |[Tags](https://docs.microsoft.com/azure/kusto/management/extents-overview#extent-tagging) to associate with the ingested data. Multiple occurrences are permitted |
 |-dontWait | |bool |Optional |If set to 'true', does not wait for ingestion completion. Useful when ingesting large amounts of files/blobs |
 
-### Additional arguments for advanced scenarios
+### Command-line arguments for advanced scenarios
 
 |Argument name |Short name |Type |Mandatory |Description |
 |----------------------|-------------|--------|----------|-------------------------------------------|
-|-compression |-cr |double |Optional |Compression ratio hint. Useful when ingesting compressed files/blobs to help Kusto assess the raw data size. Calculated as original size divided by compressed size |
+|-compression |-cr |double |Optional |Compression ratio hint. Useful when ingesting compressed files/blobs to help Azure Data Explorer assess the raw data size. Calculated as original size divided by compressed size |
 |-limit |-l |integer |Optional |If set, limits the ingestion to first N files |
 |-ingestTimeout | |integer |Optional |Timeout in minutes for all ingest operations completion. Defaults to `60`|
 |-forceSync | |bool |Optional |If set, forces synchronous ingestion. Defaults to `false` |
@@ -82,34 +81,40 @@ The utility can pull source data from a local folder or from an Azure blob stora
 
 ## Usage examples
 
-**Example 1**
+**Ingesting a specific number of blobs in JSON format**
 
-* Ingest two blobs under a specified storage account {Account}, files of `CSV` format matching the pattern `.csv.gz`.
-* Destination is the database {Database}, the table `Trips`, ignoring the first record
+* Ingest two blobs under a specified storage account {Account}, files of `JSON` format matching the pattern `*.json`
+* Destination is the database {Database}, the table `SampleData`
 * Data will be ingested at a compression ratio of 10.0
 * LightIngest won't wait for the ingestion to be completed
 
-To use the LightIngest command, below:
-1. Create a table command.
-1. Create a mapping command.
-1. Copy the cluster name and paste it into the LightIngest command {Cluster Name and Region}.
-1. Enter the database name into the LightIngest command {Database}.
-1. Enter the table name into the LightIngest command.
+To use the LightIngest command below:
+1. Create the target table and enter the table name into the LightIngest command, replacing `SampleData`.
+1. Create an ingestion mapping and enter the mapping name in the `IngestionMappingRef` argument, replacing `SampleData_mapping` (sample table and mapping commands are shown after the LightIngest command below).
+1. Copy your cluster name and enter it into the LightIngest command, replacing `{Cluster Name and Region}`.
+1. Enter the database name into the LightIngest command, replacing `{Database}`.
+1. Replace `{Account}` with your account name.
 
-```
-LightIngest "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;AAD Federated Security=True"
--db:{Database}
--table:Trips
--source:"https://{Account}.blob.core.windows.net/saadxworkshop1;VXPnUFzvBRLBIqEgcA0hRnSXmq69jVyZMChgUn5BeVwhjLnx4ucHZ8RPGTZ0F2hXHnC/vesoFSMF5f4gepeTJw=="
--pattern:"*.csv.gz"
--format:csv
--limit:2
--ignoreFirst:true
--cr:10.0
--dontWait:true
-```
+```
+LightIngest "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;AAD Federated Security=True"
+-db:{Database}
+-table:SampleData
+-source:"https://{Account}.blob.core.windows.net/data?sp=rl&st=2020-03-17T14:10:02Z&se=2022-12-31T14:10:00Z&sv=2019-02-02&sr=c&sig=QY6%2B1jAjIBQzkPIatkdDlbr%2FggUyq4gklmt%2FcOUM31Y%3D"
+-IngestionMappingRef:SampleData_mapping
+-pattern:"*.json"
+-format:JSON
+-limit:2
+-cr:10.0
+-dontWait:true
+```
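
The "create a table" and "create a mapping" steps above assume you already have the matching control commands at hand. Below is a minimal, hedged sketch in Kusto: the two-column schema and the JSON paths are illustrative assumptions only, not part of this article, while the table name `SampleData` and the mapping name `SampleData_mapping` match the LightIngest command above.

```
// Hypothetical schema -- replace the columns with those of your own JSON data
.create table SampleData (Timestamp: datetime, Message: string)

// JSON ingestion mapping referenced by -IngestionMappingRef:SampleData_mapping
.create table SampleData ingestion json mapping "SampleData_mapping" '[{"column":"Timestamp","path":"$.timestamp","datatype":"datetime"},{"column":"Message","path":"$.message","datatype":"string"}]'
```

Run both commands in the target database before invoking LightIngest, so that `-table:SampleData` and `-IngestionMappingRef:SampleData_mapping` resolve correctly.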
+
+![Ingestion result](media/lightingest/lightingest-cmd-line-result.png)
+
+1. In Azure Data Explorer, run a query to check the ingested record count.
+![Ingestion result in Azure Data Explorer](media/lightingest/lightingest-showfailure-count.png)
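
One possible way to perform that verification step, sketched here under the assumption that the walkthrough's `SampleData` table is the target (run each statement separately):

```
// Count the records that were ingested into the target table
SampleData
| count

// Because the example runs with -dontWait:true, also check for ingestion failures after the fact
.show ingestion failures
| where Table == "SampleData"
```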
+
+**Ingesting blobs using a storage account key or a SAS token**
 
-**Example 2**
 * Ingest 10 blobs under specified storage account `ACCOUNT`, in folder `DIR`, under container `CONT`, and matching the pattern `*.csv.gz`
 * Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
 * The tool will wait until the ingest operations complete
@@ -136,7 +141,8 @@ LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True;In
 -limit:10
 ```
 
-**Example 3**
+**Ingesting all blobs in a container, not including header rows**
+
 * Ingest all blobs under specified storage account `ACCOUNT`, in folder `DIR1/DIR2`, under container `CONT`, and matching the pattern `*.csv.gz`
 * Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
 * Source blobs contain header line, so the tool is instructed to drop the first record of each blob
@@ -154,7 +160,8 @@ LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True"
 -ignoreFirstRow:true
 ```
 
-**Example 4**
+**Ingesting all JSON files from a path**
+
 * Ingest all files under path `PATH`, matching the pattern `*.json`
 * Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in local file `MAPPING_FILE_PATH`
 * The tool will post the data for ingestion and won't wait for the ingest operations to complete
@@ -169,7 +176,8 @@ LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True"
 -mappingPath:"MAPPING_FILE_PATH"
 ```
 
-**Example 5**
+**Ingesting files and writing diagnostic trace files**
+
 * Ingest all files under path `PATH`, matching the pattern `*.json`
 * Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in local file `MAPPING_FILE_PATH`
 * The tool will post the data for ingestion and won't wait for the ingest operations to complete
[2 additional changed files (73 KB and 99.6 KB) not shown]
