|
| 1 | +# LightIngest |
| 2 | + |
| 3 | +LightIngest is a command-line utility for ad-hoc data ingestion into Kusto. |
| 4 | +The utility can pull source data from a local folder or from an Azure blob storage container. |
| 5 | + |
| 6 | +## Prerequisites |
| 7 | + |
| 8 | +* LightIngest - download it as part of the [Microsoft.Azure.Kusto.Tools NuGet package](https://www.nuget.org/packages/Microsoft.Azure.Kusto.Tools/) |
| 9 | +* WinRAR - download it from [www.win-rar.com/download.html](https://www.win-rar.com/download.html) |
| 10 | + |
| 11 | +## Install LightIngest |
| 12 | + |
| 13 | +1. Navigate to the location on your computer where you downloaded LightIngest. |
| 14 | +1. Using WinRAR, extract the *tools* directory to your computer. |
| 15 | + |
| 16 | +## Run LightIngest |
| 17 | + |
| 18 | +1. Navigate to the extracted *tools* directory on your computer. |
| 19 | +1. Delete the existing location information from the location bar. |
| 20 | + |
| 21 | +  |
| 22 | + |
| 23 | +1. Enter `cmd` and press **Enter**. |
| 24 | +1. At the command prompt, enter `LightIngest.exe` followed by the relevant command-line arguments. |
| 25 | + |
| 26 | + > [!Tip] |
| 27 | + > For a list of supported command-line arguments, enter `LightIngest.exe /help`. |
| 28 | + > |
| 29 | + > |
| 30 | +
|
| 31 | +1. (Mandatory) Enter `LightIngest` followed by the connection string to the Kusto cluster that will manage the ingestion. |
| 32 | + The connection string should be enclosed in double quotes and follow the [Kusto connection strings specification](../api/connection-strings/kusto.md). |
| 33 | + |
| 34 | + For example: |
| 35 | + ``` |
| 36 | + LightIngest "Data Source=https://ingest-tzgitlin.westus.kusto.windows.net;AAD Federated Security=True" -db:TzviaTest -table:Trips -source:"https://tzgitlinegdemo2.blob.core.windows.net/saadxworkshop1;VXPnUFzvBRLBIqEgcA0hRnSXmq69jVyZMChgUn5BeVwhjLnx4ucHZ8RPGTZ0F2hXHnC/vesoFSMF5f4gepeTJw==" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true |
| 37 | + ``` |
| 38 | +
|
| 39 | +> [!Note] |
| 40 | +> * It's recommended to configure `LightIngest` to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Kusto service can manage the ingestion load and recover from transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`). |
| 41 | +> * Knowing the raw data size is important for optimal ingestion performance. `LightIngest` estimates the uncompressed size of local files. For compressed blobs, however, `LightIngest` may not be able to estimate the raw size correctly without first downloading them. When ingesting compressed blobs, set the `rawSizeBytes` property on the blob metadata to the uncompressed data size in bytes to help `LightIngest` performance. |
| 42 | +
|
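
The note above asks for the uncompressed size in bytes as the value of `rawSizeBytes`. As a minimal sketch, assuming the source file is gzip-compressed and available locally before upload, that number can be obtained by streaming through the file with Python's standard `gzip` module. The function name `uncompressed_size_bytes` is hypothetical and not part of LightIngest, and setting the blob metadata itself is not shown.

```python
import gzip


def uncompressed_size_bytes(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the decompressed size of a gzip file, in bytes.

    Hypothetical helper (not part of LightIngest): the file is read in
    chunks so that large files are never held in memory all at once.
    """
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

The returned integer is the value you would store, as a string, in the `rawSizeBytes` metadata property of the corresponding blob.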
| 43 | +## Command-line arguments reference |
| 44 | +
|
| 45 | +|Argument name |Short name |Type |Mandatory |Description | |
| 46 | +|----------------------|-------------|--------|----------|-------------------------------------------| |
| 47 | +| | |string |Mandatory |[Kusto Connection String](../api/connection-strings/kusto.md) specifying the Kusto endpoint that will handle the ingestion. Should be enclosed in double quotes | |
| 48 | +|-database |-db |string |Optional |Target Kusto database name | |
| 49 | +|-table | |string |Mandatory |Target Kusto table name | |
| 50 | +|-sourcePath |-source |string |Mandatory |Path to source files or root URI of the blob container. If the data is in blobs, must contain storage account key or SAS. Recommended to enclose in double quotes | |
| 51 | +|-prefix | |string |Optional |When the source data to ingest resides on blob storage, this URL prefix is shared by all blobs, excluding the container name. For example, if the data is in `MyContainer/Dir1/Dir2`, then the prefix should be `Dir1/Dir2`. Enclosing in double quotes is recommended | |
| 52 | +|-pattern | |string |Optional |Pattern by which source files/blobs are picked. Supports wildcards. For example, `"*.csv"`. Recommended to enclose in double quotes | |
| 53 | +|-format |-f |string |Optional |Source data format. Must be one of the [supported formats](../management/data-ingestion/index.md#supported-data-formats) | |
| 54 | +|-ingestionMappingPath |-mappingPath |string |Optional |Path to ingestion column-mapping file (mandatory for JSON and Avro formats). See [data mappings](../management/mappings.md) | |
| 55 | +|-ingestionMappingRef |-mappingRef |string |Optional |Name of a pre-created ingestion column mapping (mandatory for JSON and Avro formats). See [data mappings](../management/mappings.md) | |
| 56 | +|-ignoreFirstRow |-ignoreFirst |bool |Optional |If set, the first record of each file/blob is ignored (for example, if the source data has headers) | |
| 57 | +|-tag | |string |Optional |[Tags](../management/extents-overview.md#extent-tagging) to associate with the ingested data. Multiple occurrences are permitted | |
| 58 | +|-dontWait | |bool |Optional |If set to 'true', does not wait for ingestion completion. Useful when ingesting large numbers of files/blobs | |
| 59 | +
|
| 60 | +### Additional arguments for advanced scenarios |
| 61 | +
|
| 62 | +|Argument name |Short name |Type |Mandatory |Description | |
| 63 | +|----------------------|-------------|--------|----------|-------------------------------------------| |
| 64 | +|-compression |-cr |double |Optional |Compression ratio hint. Useful when ingesting compressed files/blobs to help Kusto assess the raw data size. Calculated as original size divided by compressed size | |
| 65 | +|-limit |-l |integer |Optional |If set, limits the ingestion to the first N files | |
| 66 | +|-ingestTimeout | |integer |Optional |Timeout, in minutes, for all ingest operations to complete. Defaults to `60`| |
| 67 | +|-forceSync | |bool |Optional |If set, forces synchronous ingestion. Defaults to `false` | |
| 68 | +|-dataBatchSize | |integer |Optional |Sets the total size limit (MB, uncompressed) of each ingest operation | |
| 69 | +|-filesInBatch | |integer |Optional |Sets the file/blob count limit of each ingest operation | |
| 70 | +|-devTracing |-trace |string |Optional |If set, diagnostic logs are written to a local directory (by default, `RollingLogs` in the current directory, or can be modified by setting the switch value) | |
| 71 | +
|
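
The `-compression` (`-cr`) hint is described above as the original size divided by the compressed size. A minimal sketch of that arithmetic follows. The helper names `compression_ratio` and `cr_argument` are hypothetical, and formatting the value to one decimal place is an assumption based on the `-cr:10.0` usage elsewhere in this document.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return original size divided by compressed size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return original_bytes / compressed_bytes


def cr_argument(original_bytes: int, compressed_bytes: int) -> str:
    """Format the ratio as a -cr argument (hypothetical helper).

    One decimal place is an assumption, mirroring the -cr:10.0 example.
    """
    ratio = compression_ratio(original_bytes, compressed_bytes)
    return f"-cr:{ratio:.1f}"


# A 1,000-byte file that compresses to 100 bytes has a ratio of 10.
print(cr_argument(1000, 100))
```

If the sizes of a few representative files are known, their ratio can serve as the hint for the whole batch.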
| 72 | +## Usage examples |
| 73 | +
|
| 74 | +**Example 1** |
| 75 | +
|
| 76 | +* Ingest two blobs from the storage account {Account}, in `CSV` format, that match the pattern `*.csv.gz`. |
| 77 | +* The destination is the database {Database}, table `Trips`. The first record of each blob is ignored. |
| 78 | +* A compression ratio hint of 10.0 is supplied for the ingested data. |
| 79 | +* LightIngest won't wait for the ingestion to be completed. |
| 80 | +
|
| 81 | +To use the LightIngest command below: |
| 82 | +1. Run a command to create the target table. |
| 83 | +1. Run a command to create the ingestion mapping. |
| 84 | +1. Copy the cluster name and paste it into the LightIngest command in place of {Cluster Name and Region}. |
| 85 | +1. Enter the database name into the LightIngest command in place of {Database}. |
| 86 | +1. Enter the table name in the `-table` argument of the LightIngest command. |
| 87 | +
|
| 88 | +``` |
| 89 | +LightIngest "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;AAD Federated Security=True" |
| 90 | + -db:{Database} |
| 91 | + -table:Trips |
| 92 | + -source:"https://{Account}.blob.core.windows.net/saadxworkshop1;VXPnUFzvBRLBIqEgcA0hRnSXmq69jVyZMChgUn5BeVwhjLnx4ucHZ8RPGTZ0F2hXHnC/vesoFSMF5f4gepeTJw==" |
| 93 | + -pattern:"*.csv.gz" |
| 94 | + -format:csv |
| 95 | + -limit:2 |
| 96 | + -ignoreFirst:true |
| 97 | + -cr:10.0 |
| 98 | + -dontWait:true |
| 99 | +``` |
| 100 | +
|
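
When a command like the one in Example 1 is launched from a script, it can be easier to assemble it as an argument list than as one long string. The sketch below is an illustration only: `build_lightingest_args` is a hypothetical helper, not part of LightIngest, and it only builds the list without running anything. The placeholders are the ones used in Example 1, plus `{StorageAccountKey}` from Example 2.

```python
def build_lightingest_args(connection_string, table, source, **options):
    """Build a LightIngest argument list (hypothetical helper).

    Extra keyword arguments become -name:value switches in the order given.
    Booleans are lowercased to match the -dontWait:true style shown above.
    """
    args = [
        "LightIngest.exe",
        connection_string,
        f"-table:{table}",
        f"-source:{source}",
    ]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"-{name}:{value}")
    return args


args = build_lightingest_args(
    "Data Source=https://ingest-{Cluster Name and Region}.kusto.windows.net;"
    "AAD Federated Security=True",
    "Trips",
    "https://{Account}.blob.core.windows.net/saadxworkshop1;{StorageAccountKey}",
    db="{Database}",
    pattern="*.csv.gz",
    format="csv",
    limit=2,
    ignoreFirst=True,
    cr=10.0,
    dontWait=True,
)
print(args)
```

A list like this can be handed to `subprocess.run(args)`, which passes each element as a separate argument, so values containing spaces or semicolons don't need manual quoting.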
| 101 | +**Example 2** |
| 102 | +* Ingest 10 blobs from the storage account `ACCOUNT`, in folder `DIR`, under container `{ROOT_CONTAINER}`, that match the pattern `*.csv.gz`. |
| 103 | +* The destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is pre-created on the destination. |
| 104 | +* The tool will wait until the ingest operations complete. |
| 105 | +* The two commands show different ways to specify the target database (the `-database` argument vs. `Initial Catalog` in the connection string) and the storage credentials (storage account key vs. SAS token). |
| 106 | +
|
| 107 | +``` |
| 108 | +LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" |
| 109 | + -database:DB |
| 110 | + -table:TABLE |
| 111 | + -source:"https://ACCOUNT.blob.core.windows.net/{ROOT_CONTAINER};{StorageAccountKey}" |
| 112 | + -prefix:"DIR" |
| 113 | + -pattern:*.csv.gz |
| 114 | + -format:csv |
| 115 | + -mappingRef:MAPPING |
| 116 | + -limit:10 |
| 117 | + |
| 118 | +LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True;Initial Catalog=DB" |
| 119 | + -table:TABLE |
| 120 | + -source:"https://ACCOUNT.blob.core.windows.net/{ROOT_CONTAINER}?{SAS token}" |
| 121 | + -prefix:"DIR" |
| 122 | + -pattern:*.csv.gz |
| 123 | + -format:csv |
| 124 | + -mappingRef:MAPPING |
| 125 | + -limit:10 |
| 126 | +``` |
| 127 | +
|
| 128 | +**Example 3** |
| 129 | +* Ingest all blobs from the storage account `ACCOUNT`, in folder `DIR1/DIR2`, under container `{ROOT_CONTAINER}`, that match the pattern `*.csv.gz`. |
| 130 | +* The destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is pre-created on the destination. |
| 131 | +* The source blobs contain a header line, so the tool is instructed to drop the first record of each blob. |
| 132 | +* The tool will post the data for ingestion and won't wait for the ingest operations to complete. |
| 133 | +
|
| 134 | +``` |
| 135 | +LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" |
| 136 | + -database:DB |
| 137 | + -table:TABLE |
| 138 | + -source:"https://ACCOUNT.blob.core.windows.net/{ROOT_CONTAINER}?{SAS token}" |
| 139 | + -prefix:"DIR1/DIR2" |
| 140 | + -pattern:*.csv.gz |
| 141 | + -format:csv |
| 142 | + -mappingRef:MAPPING |
| 143 | + -ignoreFirstRow:true |
| | + -dontWait:true |
| 144 | +``` |
| 145 | +
|
| 146 | +**Example 4** |
| 147 | +* Ingest all files under path `PATH` that match the pattern `*.json`. |
| 148 | +* The destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in the local file `MAPPING_FILE_PATH`. |
| 149 | +* The tool will post the data for ingestion and won't wait for the ingest operations to complete. |
| 150 | +
|
| 151 | +``` |
| 152 | +LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" |
| 153 | + -database:DB |
| 154 | + -table:TABLE |
| 155 | + -source:"PATH" |
| 156 | + -pattern:*.json |
| 157 | + -format:json |
| 158 | + -mappingPath:"MAPPING_FILE_PATH" |
| | + -dontWait:true |
| 159 | +``` |
| 160 | +
|
| 161 | +**Example 5** |
| 162 | +* Ingest all files under path `PATH` that match the pattern `*.json`. |
| 163 | +* The destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in the local file `MAPPING_FILE_PATH`. |
| 164 | +* The tool will post the data for ingestion and won't wait for the ingest operations to complete. |
| 165 | +* Diagnostic trace files will be written locally under the folder `LOGS_PATH`. |
| 166 | +
|
| 167 | +``` |
| 168 | +LightIngest.exe "https://ingest-{clusterAndRegion}.kusto.windows.net;Fed=True" |
| 169 | + -database:DB |
| 170 | + -table:TABLE |
| 171 | + -source:"PATH" |
| 172 | + -pattern:*.json |
| 173 | + -format:json |
| 174 | + -mappingPath:"MAPPING_FILE_PATH" |
| 175 | + -trace:"LOGS_PATH" |
| | + -dontWait:true |
| 176 | +``` |