# LightIngest

LightIngest is a command-line utility for ad-hoc data ingestion into Kusto. It can pull source data from a local folder or from an Azure Blob Storage container.

## Prerequisites

* LightIngest: download it as part of the [Microsoft.Azure.Kusto.Tools NuGet package](https://www.nuget.org/packages/Microsoft.Azure.Kusto.Tools/)
* WinRAR: download it from [www.win-rar.com/download.html](https://www.win-rar.com/download.html)

## Install LightIngest

1. Navigate to the location on your computer where you downloaded the LightIngest NuGet package.
1. Using WinRAR, extract the *tools* directory from the package to your computer.

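A NuGet package (`.nupkg`) is a standard ZIP archive, so any ZIP-capable tool can extract it, not just WinRAR. As an illustration, the Python sketch below uses the standard-library `zipfile` module to extract only the *tools* directory. The function name and the file names in the usage comment are hypothetical, and the sketch assumes the package keeps its tools under a top-level `tools/` folder.

```python
import zipfile
from pathlib import Path


def extract_tools(nupkg_path, dest_dir):
    """Extract the entries under tools/ from a .nupkg file (a ZIP archive).

    Returns the names of the extracted files.
    Assumes the tools live under a top-level "tools/" folder in the package.
    """
    extracted = []
    with zipfile.ZipFile(nupkg_path) as package:
        for name in package.namelist():
            # Keep only files in the tools/ tree; skip directory entries
            if name.startswith("tools/") and not name.endswith("/"):
                package.extract(name, Path(dest_dir))
                extracted.append(name)
    return extracted


# Example (hypothetical file name; use the package version you downloaded):
# extract_tools("microsoft.azure.kusto.tools.nupkg", "LightIngest")
```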
## Run LightIngest

1. Navigate to the extracted *tools* directory on your computer.
1. In the File Explorer location bar, delete the existing location information.

   ![Delete location information](media/lightingest/lightingest-locationbar.png)

1. Enter `cmd` in the location bar and press **Enter**. A command prompt opens in the *tools* directory.
1. At the command prompt, enter `LightIngest.exe` followed by the relevant command-line arguments.

   > [!Tip]
   > For a list of supported command-line arguments, enter `LightIngest.exe /help`.
   >
   > ![Command line Help](media/lightingest/lightingest-cmd-line-help.png)

1. (Mandatory) Enter `LightIngest` followed by the connection string to the Kusto cluster that will manage the ingestion.
   The connection string must be enclosed in double quotes and follow the [Kusto connection strings specification](../api/connection-strings/kusto.md).

   For example:

   ```
   LightIngest "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;AAD Federated Security=True" -db:{Database} -table:Trips -source:"https://{Account}.blob.core.windows.net/{Container};{StorageAccountKey}" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
   ```

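A Kusto connection string is a semicolon-separated list of `key=value` pairs. The Python sketch below assembles the connection string used in the example from a cluster name and region. The helper name `build_connection_string`, its `use_ingest_endpoint` flag, and the value `mycluster.westus` are hypothetical; the sketch covers only the two properties that appear in the example, so consult the connection strings specification for any others.

```python
def build_connection_string(cluster_and_region, use_ingest_endpoint=True):
    """Build a Kusto connection string for LightIngest (hypothetical helper).

    cluster_and_region is a placeholder such as "mycluster.westus".
    """
    # The ingestion endpoint adds the "ingest-" prefix to the host name
    prefix = "ingest-" if use_ingest_endpoint else ""
    properties = {
        "Data Source": f"https://{prefix}{cluster_and_region}.kusto.windows.net",
        # Authenticate with Azure AD (federated security), as in the example
        "AAD Federated Security": "True",
    }
    # Properties are joined as key=value pairs separated by semicolons
    return ";".join(f"{key}={value}" for key, value in properties.items())


# LightIngest expects the whole string as one double-quoted argument
print(f'LightIngest "{build_connection_string("mycluster.westus")}"')
```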
> [!Note]
> * We recommend configuring `LightIngest` to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Kusto service manages the ingestion load and recovers from transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
> * Knowing the raw data size is important for optimal ingestion performance. `LightIngest` estimates the uncompressed size of local files. However, for compressed blobs, `LightIngest` may not be able to estimate the raw size correctly without first downloading them. When ingesting compressed blobs, you can improve `LightIngest` performance by setting the `rawSizeBytes` property in each blob's metadata to the uncompressed data size in bytes.

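For gzip-compressed files, the uncompressed size can often be read without decompressing: the last four bytes of a gzip file (the ISIZE field defined in RFC 1952) hold the uncompressed length modulo 2^32, stored little-endian. The Python sketch below computes a `rawSizeBytes` metadata value this way before the files are uploaded. The function names are hypothetical, and the sketch assumes single-member gzip files smaller than 4 GiB uncompressed; for other files, decompress to measure instead. Applying the returned metadata to a blob is left to your storage tooling, for example the Azure Storage SDK's blob metadata APIs.

```python
import os
import struct


def gzip_raw_size(path):
    """Return the uncompressed size recorded in a gzip file's ISIZE trailer.

    Assumes a single-member gzip file whose uncompressed size is < 4 GiB;
    ISIZE stores the size modulo 2**32, so larger inputs wrap around.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE is the final 4 bytes of the file
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return isize


def raw_size_metadata(path):
    """Build the blob metadata entry suggested in the note above."""
    # Blob metadata values are strings, so convert the byte count
    return {"rawSizeBytes": str(gzip_raw_size(path))}
```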
## Command line arguments reference

|Argument name |Short name |Type |Mandatory |Description |
|----------------------|-------------|--------|----------|-------------------------------------------|
| | |string |Mandatory |[Kusto connection string](../api/connection-strings/kusto.md) specifying the Kusto endpoint that will handle the ingestion. Enclose in double quotes |
|-database |-db |string |Optional |Target Kusto database name |
|-table | |string |Mandatory |Target Kusto table name |
|-sourcePath |-source |string |Mandatory |Path to the source files, or root URI of the blob container. If the data is in blobs, must contain the storage account key or a SAS token. Enclosing in double quotes is recommended |
|-prefix | |string |Optional |When the source data resides in blob storage, the URL prefix shared by all blobs to ingest, excluding the container name. For example, if the data is in `MyContainer/Dir1/Dir2`, the prefix should be `Dir1/Dir2`. Enclosing in double quotes is recommended |
|-pattern | |string |Optional |Pattern by which source files/blobs are picked. Supports wildcards, for example `"*.csv"`. Enclosing in double quotes is recommended |
|-format |-f |string |Optional |Source data format. Must be one of the [supported formats](../management/data-ingestion/index.md#supported-data-formats) |
|-ingestionMappingPath |-mappingPath |string |Optional |Path to a local ingestion column-mapping file. For JSON and Avro formats, either this argument or `-ingestionMappingRef` is required. See [data mappings](../management/mappings.md) |
|-ingestionMappingRef |-mappingRef |string |Optional |Name of a pre-created ingestion column mapping. For JSON and Avro formats, either this argument or `-ingestionMappingPath` is required. See [data mappings](../management/mappings.md) |
|-ignoreFirstRow |-ignoreFirst |bool |Optional |If set, the first record of each file/blob is ignored (for example, if the source data has headers) |
|-tag | |string |Optional |[Tags](../management/extents-overview.md#extent-tagging) to associate with the ingested data. Can be specified multiple times |
|-dontWait | |bool |Optional |If set to `true`, does not wait for ingestion to complete. Useful when ingesting a large number of files/blobs |

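To preview which blobs a combination of `-prefix`, `-pattern`, and `-limit` would select, the Python sketch below filters a list of blob names. It is an approximation under stated assumptions, not LightIngest's implementation: the prefix is treated as a plain string prefix of the blob name (excluding the container), the pattern uses case-sensitive shell-style wildcards via `fnmatchcase`, and the limit keeps the first N matches in list order. LightIngest's own matching, case sensitivity, and ordering may differ. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def select_blobs(blob_names, prefix="", pattern="*", limit=None):
    """Approximate which blobs -prefix/-pattern/-limit would pick.

    blob_names: names relative to the container, e.g. "Dir1/Dir2/a.csv.gz".
    Matching rules here are assumptions, not LightIngest's implementation.
    """
    selected = [
        name
        for name in blob_names
        # Plain string prefix, then a shell-style wildcard match on the name
        if name.startswith(prefix) and fnmatchcase(name, pattern)
    ]
    # -limit keeps only the first N matches
    return selected if limit is None else selected[:limit]
```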
### Additional arguments for advanced scenarios

|Argument name |Short name |Type |Mandatory |Description |
|----------------------|-------------|--------|----------|-------------------------------------------|
|-compression |-cr |double |Optional |Compression ratio hint. Useful when ingesting compressed files/blobs, to help Kusto assess the raw data size. Calculated as the original size divided by the compressed size |
|-limit |-l |integer |Optional |If set, limits the ingestion to the first N files/blobs |
|-ingestTimeout | |integer |Optional |Timeout, in minutes, for all ingest operations to complete. Defaults to `60` |
|-forceSync | |bool |Optional |If set, forces synchronous ingestion. Defaults to `false` |
|-dataBatchSize | |integer |Optional |Sets the total size limit (in MB, uncompressed) of each ingest operation |
|-filesInBatch | |integer |Optional |Sets the file/blob count limit of each ingest operation |
|-devTracing |-trace |string |Optional |If set, diagnostic logs are written to a local directory (by default, `RollingLogs` in the current directory; set the switch value to use a different directory) |

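As the table notes, the `-compression` value is the original size divided by the compressed size. For example, 500 MB of raw CSV that compresses to 50 MB gives a ratio of 500 / 50 = 10.0, the value used in Example 1. The Python sketch below estimates the ratio from a sample of local gzip files; the function name is hypothetical, and it assumes the sample is representative of the whole data set.

```python
import gzip
from pathlib import Path


def compression_ratio(paths):
    """Estimate -compression as total raw bytes / total compressed bytes.

    paths: sample gzip files assumed to be representative of the data set.
    """
    raw_total = 0
    compressed_total = 0
    for path in paths:
        compressed = Path(path).read_bytes()
        compressed_total += len(compressed)
        # Decompress fully to measure the original (raw) size
        raw_total += len(gzip.decompress(compressed))
    if compressed_total == 0:
        raise ValueError("no compressed data to measure")
    return raw_total / compressed_total


# Example (hypothetical file names):
# print(f"-cr:{compression_ratio(['part1.csv.gz', 'part2.csv.gz']):.1f}")
```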
## Usage examples

**Example 1**

* Ingest two blobs under the specified storage account `{Account}`: files of `CSV` format matching the pattern `*.csv.gz`
* Destination is the database `{Database}` and the table `Trips`, ignoring the first record of each blob
* A compression ratio hint of 10.0 is passed to help Kusto estimate the raw data size
* LightIngest won't wait for the ingestion to complete

To run the LightIngest command below:

1. Create the destination table (for example, with a `.create table` command).
1. Create an ingestion mapping for the table, if your data requires one.
1. Replace `{Cluster Name and Region}` in the command with your cluster name and region.
1. Replace `{Database}` with the database name.
1. Replace `Trips` with your table name, if it differs.
1. In `-source`, replace `{Account}`, `{Container}`, and `{StorageAccountKey}` with your storage account name, container name, and storage account key.

The command is split across lines for readability; in `cmd`, the `^` at the end of a line continues the command on the next line.

```
LightIngest "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;AAD Federated Security=True" ^
-db:{Database} ^
-table:Trips ^
-source:"https://{Account}.blob.core.windows.net/{Container};{StorageAccountKey}" ^
-pattern:"*.csv.gz" ^
-format:csv ^
-limit:2 ^
-ignoreFirst:true ^
-cr:10.0 ^
-dontWait:true
```

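When LightIngest runs from a script rather than an interactive prompt, building the argument list programmatically avoids quoting mistakes. The Python sketch below assembles the arguments of Example 1 as a list suitable for `subprocess.run`. The helper name, its keyword-argument convention, and the sample values (`mycluster.westus`, `MyDatabase`, `myaccount`, `mycontainer`) are hypothetical; only argument names from the reference tables are used, and it assumes `LightIngest.exe` is in the current directory or on `PATH`. Because no shell is involved when a list is passed to `subprocess.run`, the values don't need the extra double quotes used on the `cmd` command line.

```python
def lightingest_args(connection_string, exe="LightIngest.exe", tags=(), **options):
    """Build an argv list for LightIngest (hypothetical helper).

    options maps argument names (without the leading '-') to values,
    e.g. db="MyDb" becomes "-db:MyDb". Booleans are written as true/false.
    """
    args = [exe, connection_string]
    for name, value in options.items():
        # Check bool before anything else: True/False should become true/false
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"-{name}:{value}")
    # -tag may be repeated, once per tag
    args.extend(f"-tag:{tag}" for tag in tags)
    return args


args = lightingest_args(
    "Data Source=https://ingest-mycluster.westus.kusto.windows.net;AAD Federated Security=True",
    db="MyDatabase",
    table="Trips",
    source="https://myaccount.blob.core.windows.net/mycontainer;<storage-account-key>",
    pattern="*.csv.gz",
    format="csv",
    limit=2,
    ignoreFirst=True,
    cr=10.0,
    dontWait=True,
)
# To run it: import subprocess; subprocess.run(args, check=True)
```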
**Example 2**

* Ingest 10 blobs under the specified storage account `ACCOUNT`, in folder `DIR`, under container `CONT`, matching the pattern `*.csv.gz`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
* The tool waits until the ingest operations complete
* Note the different options for specifying the target database (the `-database` argument or `Initial Catalog` in the connection string) and for authenticating to storage (storage account key or SAS token)

```
LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" ^
-database:DB ^
-table:TABLE ^
-source:"https://ACCOUNT.blob.core.windows.net/CONT;{StorageAccountKey}" ^
-prefix:"DIR" ^
-pattern:"*.csv.gz" ^
-format:csv ^
-mappingRef:MAPPING ^
-limit:10

LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True;Initial Catalog=DB" ^
-table:TABLE ^
-source:"https://ACCOUNT.blob.core.windows.net/CONT?{SAS token}" ^
-prefix:"DIR" ^
-pattern:"*.csv.gz" ^
-format:csv ^
-mappingRef:MAPPING ^
-limit:10
```

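The two `-source` forms above differ in how the storage credential is attached: an account key follows the container URI after a semicolon, while a SAS token is appended as the URI's query string after `?`. The Python sketch below produces either form. The function and parameter names are hypothetical, and it accepts a SAS token with or without its leading `?`. Because a SAS token usually contains `&`, keep the resulting value in double quotes on the `cmd` command line; otherwise `cmd` treats `&` as a command separator.

```python
def source_uri(account, container, account_key=None, sas_token=None):
    """Build a LightIngest -source value for a blob container (hypothetical helper).

    Exactly one of account_key or sas_token must be given.
    """
    if (account_key is None) == (sas_token is None):
        raise ValueError("pass exactly one of account_key or sas_token")
    base = f"https://{account}.blob.core.windows.net/{container}"
    if account_key is not None:
        # Storage account key: appended after a semicolon
        return f"{base};{account_key}"
    # SAS token: appended as the query string; drop a leading "?" if present
    return f"{base}?{sas_token.lstrip('?')}"
```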
**Example 3**

* Ingest all blobs under the specified storage account `ACCOUNT`, in folder `DIR1/DIR2`, under container `CONT`, matching the pattern `*.csv.gz`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
* Source blobs contain a header line, so the tool is instructed to drop the first record of each blob
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete

```
LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" ^
-database:DB ^
-table:TABLE ^
-source:"https://ACCOUNT.blob.core.windows.net/CONT?{SAS token}" ^
-prefix:"DIR1/DIR2" ^
-pattern:"*.csv.gz" ^
-format:csv ^
-mappingRef:MAPPING ^
-ignoreFirstRow:true ^
-dontWait:true
```

**Example 4**

* Ingest all files under path `PATH`, matching the pattern `*.json`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in the local file `MAPPING_FILE_PATH`
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete

```
LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" ^
-database:DB ^
-table:TABLE ^
-source:"PATH" ^
-pattern:"*.json" ^
-format:json ^
-mappingPath:"MAPPING_FILE_PATH" ^
-dontWait:true
```

**Example 5**

* Ingest all files under path `PATH`, matching the pattern `*.json`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in the local file `MAPPING_FILE_PATH`
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete
* Diagnostic trace files are written locally under the folder `LOGS_PATH`

```
LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" ^
-database:DB ^
-table:TABLE ^
-source:"PATH" ^
-pattern:"*.json" ^
-format:json ^
-mappingPath:"MAPPING_FILE_PATH" ^
-trace:"LOGS_PATH" ^
-dontWait:true
```