---
title: LightIngest is a command-line utility for ingestion into Azure Data Explorer.
description: Learn about LightIngest, a command-line utility for ad-hoc data ingestion into Azure Data Explorer.
author: orspod
ms.author: orspodek
ms.reviewer: tzgitlin
ms.service: data-explorer
ms.topic: conceptual
ms.date: 03/17/2020
---

# Install and use LightIngest

LightIngest is a command-line utility for ad-hoc data ingestion into Azure Data Explorer.
The utility can pull source data from a local folder or from an Azure Blob Storage container.

## Prerequisites

* LightIngest - download it as part of the [Microsoft.Azure.Kusto.Tools NuGet package](https://www.nuget.org/packages/Microsoft.Azure.Kusto.Tools/)
* WinRAR - download it from [www.win-rar.com/download.html](http://www.win-rar.com/download.html)

## Install LightIngest

1. Navigate to the location on your computer where you downloaded LightIngest.
1. Using WinRAR, extract the *tools* directory to your computer.

## Run LightIngest

1. Navigate to the extracted *tools* directory on your computer.
1. In the File Explorer address bar, delete the existing location information.
1. Enter `cmd` in the address bar and press **Enter**. A command prompt opens in the *tools* directory.
1. At the command prompt, enter `LightIngest.exe` followed by the relevant command-line arguments.

   > [!Tip]
   > For a list of supported command-line arguments, enter `LightIngest.exe /help`.

1. Enter `LightIngest` followed by the connection string to the Azure Data Explorer cluster that will manage the ingestion.
   Enclose the connection string in double quotes and follow the [Kusto connection strings specification](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto).

   For example:

   ```
   LightIngest "Data Source=https://{Cluster name and region}.kusto.windows.net;AAD Federated Security=True" -db:{Database} -table:Trips -source:"https://{Account}.blob.core.windows.net/{ROOT_CONTAINER};{StorageAccountKey}" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
   ```

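A Kusto connection string is a list of `key=value` pairs separated by semicolons. The examples in this article also use the shorthand of a bare endpoint URL as the first segment. The Python sketch below is purely illustrative and is not part of LightIngest. `parse_connection_string` splits such a string but does not handle quoting or aliases (such as `Fed` for `AAD Federated Security`). `to_ingest_endpoint` derives the `ingest-` endpoint from an engine endpoint URL by prefixing the host name. Both helper names and the cluster name `mycluster.westus` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def parse_connection_string(conn: str) -> dict:
    """Split a simple connection string into {key: value}.

    Simplified sketch: values containing ';' or quotes are not supported.
    A segment without '=' (such as a bare endpoint URL) is stored as "Data Source".
    """
    settings = {}
    for segment in conn.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        if "=" in segment:
            key, value = segment.split("=", 1)
            settings[key.strip()] = value.strip()
        else:
            settings["Data Source"] = segment
    return settings


def to_ingest_endpoint(engine_url: str) -> str:
    """Return the ingest- form of an endpoint URL (unchanged if already ingest-)."""
    parts = urlsplit(engine_url)
    host = parts.netloc
    if not host.startswith("ingest-"):
        host = "ingest-" + host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # "mycluster.westus" is a hypothetical cluster name and region.
    settings = parse_connection_string(
        "Data Source=https://mycluster.westus.kusto.windows.net;AAD Federated Security=True"
    )
    print(to_ingest_endpoint(settings["Data Source"]))
    # prints https://ingest-mycluster.westus.kusto.windows.net
```

The sketch only rewrites the host name. It does not check that the resulting endpoint exists for your cluster.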
* We recommend that you point `LightIngest` at the ingestion endpoint, `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Azure Data Explorer service can manage the ingestion load, and you can easily recover from transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
* For optimal ingestion performance, `LightIngest` needs to know the raw (uncompressed) data size. `LightIngest` estimates the uncompressed size of local files, but it might not be able to estimate the raw size of compressed blobs correctly without first downloading them. Therefore, when ingesting compressed blobs, set the `rawSizeBytes` property in the blob metadata to the uncompressed data size in bytes.

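To set `rawSizeBytes`, you need each blob's uncompressed size before you upload it. The following Python sketch assumes that your blobs are gzip files available locally before upload. It streams a file through `gzip.open` to count the uncompressed bytes and returns the metadata dictionary, with the value as a string because blob metadata values are strings. Streaming is deliberate: the size field in the gzip trailer is stored modulo 2^32 and describes only the last member, so it can be wrong for large or multi-member files. `uncompressed_size` and `raw_size_metadata` are hypothetical helpers. Attach the result with your storage client, for example through the `metadata` parameter of `BlobClient.upload_blob` in the Azure Storage SDK for Python.

```python
import gzip


def uncompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Stream a gzip file and count its uncompressed bytes."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def raw_size_metadata(path: str) -> dict:
    """Blob metadata carrying the uncompressed size; metadata values are strings."""
    return {"rawSizeBytes": str(uncompressed_size(path))}
```

For example, a file whose contents are `b"a,b,c\n"` repeated 1,000 times yields `{"rawSizeBytes": "6000"}`, however well it compresses.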
## General command-line arguments

|Argument name         |Short name   |Type    |Mandatory |Description                                |
|----------------------|-------------|--------|----------|-------------------------------------------|
|                      |             |string  |Mandatory |[Azure Data Explorer connection string](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto) specifying the Kusto endpoint that will handle the ingestion. Should be enclosed in double quotes |
|-database             |-db          |string  |Optional  |Target Azure Data Explorer database name. Can be omitted if the connection string specifies the database (for example, `Initial Catalog=DB`) |
|-table                |             |string  |Mandatory |Target Azure Data Explorer table name |
|-sourcePath           |-source      |string  |Mandatory |Path to source files, or root URI of the blob container. If the data is in blobs, the URI must contain the storage account key or a SAS token. Recommended to enclose in double quotes |
|-prefix               |             |string  |Optional  |When the source data resides in blob storage, the URL prefix shared by all blobs, excluding the container name. <br>For example, if the data is in `MyContainer/Dir1/Dir2`, the prefix should be `Dir1/Dir2`. Recommended to enclose in double quotes |
|-pattern              |             |string  |Optional  |Pattern by which source files/blobs are picked. Supports wildcards, for example, `"*.csv"`. Recommended to enclose in double quotes |
|-zipPattern           |             |string  |Optional  |Regular expression to use when selecting which files in a ZIP archive to ingest.<br>All other files in the archive are ignored. For example, `"*.csv"`. Recommended to enclose in double quotes |
|-format               |-f           |string  |Optional  |Source data format. Must be one of the [supported formats](https://docs.microsoft.com/azure/kusto/management/data-ingestion/#supported-data-formats) |
|-ingestionMappingPath |-mappingPath |string  |Optional  |Path to a local ingestion column-mapping file. For JSON and Avro formats, either this argument or `-ingestionMappingRef` is mandatory. See [data mappings](https://docs.microsoft.com/azure/kusto/management/mappings) |
|-ingestionMappingRef  |-mappingRef  |string  |Optional  |Name of a pre-created ingestion column mapping. For JSON and Avro formats, either this argument or `-ingestionMappingPath` is mandatory. See [data mappings](https://docs.microsoft.com/azure/kusto/management/mappings) |
|-creationTimePattern  |             |string  |Optional  |If set, used to extract the CreationTime property from the file or blob path. See [Using CreationTimePattern argument](#using-creationtimepattern-argument) |
|-ignoreFirstRow       |-ignoreFirst |bool    |Optional  |If set, the first record of each file/blob is ignored (for example, if the source data has headers) |
|-tag                  |             |string  |Optional  |[Tags](https://docs.microsoft.com/azure/kusto/management/extents-overview#extent-tagging) to associate with the ingested data. Multiple occurrences are permitted |
|-dontWait             |             |bool    |Optional  |If set to `true`, doesn't wait for ingestion to complete. Useful when ingesting large numbers of files/blobs |

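When you run LightIngest from a script rather than typing the command, you can build the arguments from the table above and pass them to `subprocess.run` as a list. This avoids most quoting problems because no shell parses the line. The Python sketch below is a hypothetical helper, not part of LightIngest. It places the connection string first and renders each option in the `-name:value` form used in this article's examples. Booleans are written as `true`/`false`, options set to `None` are skipped, and a list value (for the repeatable `-tag`) produces one argument per item.

```python
import subprocess
from typing import Mapping


def build_lightingest_args(connection_string: str,
                           options: Mapping[str, object],
                           exe: str = "LightIngest.exe") -> list:
    """Build an argv list: the connection string first, then -name:value pairs."""
    args = [exe, connection_string]
    for name, value in options.items():
        if value is None:
            continue  # option not set
        # -tag may be repeated, so accept a list of values.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            if isinstance(v, bool):
                v = "true" if v else "false"
            args.append(f"-{name}:{v}")
    return args


if __name__ == "__main__":
    argv = build_lightingest_args(
        "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True",
        {"database": "DB", "table": "TABLE", "source": "PATH",
         "pattern": "*.json", "format": "json", "dontWait": True},
    )
    # Show the equivalent Windows command line.
    print(subprocess.list2cmdline(argv))
    # To run it (requires LightIngest.exe on PATH):
    # subprocess.run(argv, check=True)
```

With a list, Python quotes each element for Windows only when needed, for example when a value contains a space.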
### Using CreationTimePattern argument

The `-creationTimePattern` argument extracts the CreationTime property from the file or blob path. The pattern doesn't need to reflect the entire item path, only the section enclosing the timestamp you want to use.

The value of the argument must consist of three sections:

* Constant text immediately preceding the timestamp, enclosed in single quotes
* The timestamp format, in standard [.NET DateTime notation](https://docs.microsoft.com/dotnet/standard/base-types/custom-date-and-time-format-strings)
* Constant text immediately following the timestamp, enclosed in single quotes

For example, if blob names end with `historicalvalues19840101.parquet` (the timestamp is four digits for the year, two digits for the month, and two digits for the day of the month), the corresponding value for the `-creationTimePattern` argument is `'historicalvalues'yyyyMMdd'.parquet'`.

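To make the three-section format concrete, the following Python sketch splits a pattern such as `'historicalvalues'yyyyMMdd'.parquet'` into its sections. It then translates a small subset of .NET custom format specifiers (`yyyy`, `MM`, `dd`, `HH`, `mm`, `ss`) into a regular expression and extracts the timestamp from a blob path. This only illustrates how such a pattern can be read. It is not LightIngest's implementation, it supports only those specifiers, and it assumes the extracted time is UTC. The function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Supported subset of .NET custom format specifiers: (specifier, regex, strptime directive).
SPECIFIERS = [
    ("yyyy", r"\d{4}", "%Y"),
    ("MM", r"\d{2}", "%m"),
    ("dd", r"\d{2}", "%d"),
    ("HH", r"\d{2}", "%H"),
    ("mm", r"\d{2}", "%M"),
    ("ss", r"\d{2}", "%S"),
]


def split_pattern(pattern: str):
    """Split "'prefix'FORMAT'suffix'" into (prefix, FORMAT, suffix)."""
    m = re.fullmatch(r"'([^']*)'([^']+)'([^']*)'", pattern)
    if m is None:
        raise ValueError(f"unexpected pattern: {pattern!r}")
    return m.group(1), m.group(2), m.group(3)


def extract_creation_time(path: str, pattern: str) -> datetime:
    """Find the timestamp described by `pattern` in `path`; treat it as UTC."""
    prefix, fmt, suffix = split_pattern(pattern)
    ts_regex, ts_format, i = "", "", 0
    while i < len(fmt):
        for spec, rx, directive in SPECIFIERS:
            if fmt.startswith(spec, i):
                ts_regex += rx
                ts_format += directive
                i += len(spec)
                break
        else:
            # Any other character is a literal separator, such as '-' or '_'.
            ts_regex += re.escape(fmt[i])
            ts_format += fmt[i].replace("%", "%%")
            i += 1
    m = re.search(re.escape(prefix) + "(" + ts_regex + ")" + re.escape(suffix), path)
    if m is None:
        raise ValueError(f"no timestamp matching {pattern!r} in {path!r}")
    return datetime.strptime(m.group(1), ts_format).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(extract_creation_time(
        "container/historicalvalues19840101.parquet",
        "'historicalvalues'yyyyMMdd'.parquet'",
    ))  # 1984-01-01 00:00:00+00:00
```

Because only the quoted text around the timestamp is matched, the rest of the path (container and folders) can be anything, which matches the description above.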
### Command-line arguments for advanced scenarios

|Argument name         |Short name   |Type    |Mandatory |Description                                |
|----------------------|-------------|--------|----------|-------------------------------------------|
|-compression          |-cr          |double  |Optional  |Compression ratio hint. Useful when ingesting compressed files/blobs to help Azure Data Explorer assess the raw data size. Calculated as original size divided by compressed size |
|-limit                |-l           |integer |Optional  |If set, limits the ingestion to the first N files |
|-listOnly             |-list        |bool    |Optional  |If set, only displays the items that would have been selected for ingestion |
|-ingestTimeout        |             |integer |Optional  |Timeout, in minutes, for the completion of all ingest operations. Defaults to `60` |
|-forceSync            |             |bool    |Optional  |If set, forces synchronous ingestion. Defaults to `false` |
|-dataBatchSize        |             |integer |Optional  |Sets the total size limit (in MB, uncompressed) of each ingest operation |
|-filesInBatch         |             |integer |Optional  |Sets the file/blob count limit of each ingest operation |
|-devTracing           |-trace       |string  |Optional  |If set, diagnostic logs are written to a local directory: by default, `RollingLogs` in the current directory; set the switch value to use a different directory |

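The `-compression` hint is original size divided by compressed size. For example, data that is 1,000 MB uncompressed and 100 MB compressed has a ratio of 1000 / 100 = 10.0, the value used in this article's examples. Rather than guessing, you could measure a representative sample. The Python sketch below does that for local gzip files. `compression_ratio` is a hypothetical helper, and it assumes that your sample compresses like the rest of the data set.

```python
import gzip
import os


def compression_ratio(paths) -> float:
    """Total uncompressed bytes divided by total compressed bytes for gzip files."""
    original = compressed = 0
    for path in paths:
        compressed += os.path.getsize(path)
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                original += len(chunk)
    if compressed == 0:
        raise ValueError("no compressed data to measure")
    return original / compressed
```

For example, `print(f"-cr:{compression_ratio(['sample1.csv.gz', 'sample2.csv.gz']):.1f}")` prints an argument you can paste into the command. The file names are hypothetical.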
## Blob metadata properties

When used with Azure blobs, `LightIngest` uses certain blob metadata properties to augment the ingestion process.

|Metadata property                            |Usage                                                                            |
|---------------------------------------------|---------------------------------------------------------------------------------|
|`rawSizeBytes`, `kustoUncompressedSizeBytes` |If set, interpreted as the uncompressed data size, in bytes                      |
|`kustoCreationTime`, `kustoCreationTimeUtc`  |Interpreted as a UTC timestamp. If set, used to override the creation time in Kusto. Useful for backfilling scenarios |

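For backfilling, you set these properties when you upload historical blobs. The following Python sketch builds such a metadata dictionary. It assumes that LightIngest accepts an ISO 8601 UTC timestamp (for example, `1984-01-01T00:00:00Z`) in `kustoCreationTimeUtc`. The table above doesn't state the expected format, so verify it before a large backfill. Naive `datetime` values are treated as UTC, and both values are converted to strings because blob metadata values are strings. `backfill_metadata` is a hypothetical helper; attach its result with your storage client.

```python
from datetime import datetime, timezone


def backfill_metadata(raw_size_bytes: int, created: datetime) -> dict:
    """Blob metadata for a backfilled blob (values must be strings).

    Assumption: ISO 8601 UTC timestamps, such as '1984-01-01T00:00:00Z'.
    """
    if created.tzinfo is None:
        # Treat naive datetimes as already being UTC.
        created = created.replace(tzinfo=timezone.utc)
    utc = created.astimezone(timezone.utc)
    return {
        "rawSizeBytes": str(raw_size_bytes),
        "kustoCreationTimeUtc": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Timezone-aware inputs are converted to UTC first, so a value of 02:00 at UTC+2 is stored as 00:00 UTC.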
## Usage examples

<!-- Waiting for Tzvia or Vladik to rewrite the instructions for this example before publishing it

### Ingesting a specific number of blobs in JSON format

* Ingest two blobs under a specified storage account {Account}, in `JSON` format matching the pattern `.json`
* Destination is the database {Database}, the table `SampleData`
* Indicate that your data is compressed with the approximate ratio of 10.0
* LightIngest won't wait for the ingestion to be completed

To use the LightIngest command below:
1. Create a table command and enter the table name into the LightIngest command, replacing `SampleData`.
1. Create a mapping command and enter the IngestionMappingRef command, replacing `SampleData_mapping`.
1. Copy your cluster name and enter it into the LightIngest command, replacing `{ClusterandRegion}`.
1. Enter the database name into the LightIngest command, replacing `{Database name}`.
1. Replace `{Account}` with your account name and replace `{ROOT_CONTAINER}?{SAS token}` with the appropriate information.

 ```
 LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True"
 -db:{Database name}
 -table:SampleData
 -source:"https://{Account}.blob.core.windows.net/{ROOT_CONTAINER}?{SAS token}"
 -IngestionMappingRef:SampleData_mapping
 -pattern:"*.json"
 -format:JSON
 -limit:2
 -cr:10.0
 -dontWait:true
 ```

1. In Azure Data Explorer, open query count.

-->

### Ingesting blobs using a storage account key or a SAS token

* Ingest 10 blobs under the specified storage account `ACCOUNT`, in folder `DIR`, under container `CONT`, and matching the pattern `*.csv.gz`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
* The tool waits until the ingest operations complete
* The two commands show the two ways to specify the target database (the `-database` argument, or `Initial Catalog` in the connection string) and the storage credentials (a storage account key or a SAS token)

```
LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True" ^
 -database:DB ^
 -table:TABLE ^
 -source:"https://ACCOUNT.blob.core.windows.net/CONT;{StorageAccountKey}" ^
 -prefix:"DIR" ^
 -pattern:"*.csv.gz" ^
 -format:csv ^
 -mappingRef:MAPPING ^
 -limit:10

LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True;Initial Catalog=DB" ^
 -table:TABLE ^
 -source:"https://ACCOUNT.blob.core.windows.net/CONT?{SAS token}" ^
 -prefix:"DIR" ^
 -pattern:"*.csv.gz" ^
 -format:csv ^
 -mappingRef:MAPPING ^
 -limit:10
```

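In both commands, the storage credential is appended to the container URI in the `-source` value: a storage account key after a semicolon, or a SAS token after a question mark. The Python sketch below builds either form. `source_uri` is a hypothetical helper, not part of LightIngest. However you build the value, keep it in double quotes on the command line, because SAS tokens contain `&`, which `cmd` otherwise treats as a command separator.

```python
from typing import Optional


def source_uri(account: str, container: str, *,
               account_key: Optional[str] = None,
               sas_token: Optional[str] = None) -> str:
    """Build a blob-container -source value with exactly one credential."""
    if (account_key is None) == (sas_token is None):
        raise ValueError("pass exactly one of account_key or sas_token")
    base = f"https://{account}.blob.core.windows.net/{container}"
    if account_key is not None:
        return f"{base};{account_key}"  # key follows a semicolon
    return f"{base}?{sas_token.lstrip('?')}"  # SAS token follows a question mark
```

A leading `?` on the SAS token is tolerated, since portals often copy the token with it.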
### Ingesting all blobs in a container, not including header rows

* Ingest all blobs under the specified storage account `ACCOUNT`, in folder `DIR1/DIR2`, under container `CONT`, and matching the pattern `*.csv.gz`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping `MAPPING` is precreated on the destination
* Source blobs contain a header line, so the tool is instructed to drop the first record of each blob
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete

```
LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True" ^
 -database:DB ^
 -table:TABLE ^
 -source:"https://ACCOUNT.blob.core.windows.net/CONT?{SAS token}" ^
 -prefix:"DIR1/DIR2" ^
 -pattern:"*.csv.gz" ^
 -format:csv ^
 -mappingRef:MAPPING ^
 -ignoreFirstRow:true ^
 -dontWait:true
```

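The `-prefix` and `-pattern` arguments together decide which blobs are picked. The following Python sketch approximates that selection over a list of blob names. It makes two assumptions that this article doesn't confirm: blobs in nested folders below the prefix are included, and the wildcard is matched case-sensitively against the full blob name. `select_blobs` is a hypothetical helper. To see what LightIngest itself would select, run the command with `-listOnly:true`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_blobs(names: Iterable[str], prefix: str = "", pattern: str = "*",
                 limit: Optional[int] = None) -> List[str]:
    """Return blob names under `prefix` that match the wildcard `pattern`.

    Assumptions: nested folders below the prefix are included, and matching
    is case-sensitive against the full blob name (excluding the container).
    """
    prefix = prefix.strip("/")
    selected = []
    for name in names:
        if prefix and not name.startswith(prefix + "/"):
            continue
        # In fnmatch, '*' also matches '/', so "*.csv.gz" matches nested blobs.
        if fnmatchcase(name, pattern):
            selected.append(name)
            if limit is not None and len(selected) >= limit:
                break  # mirrors -limit: stop after the first N matches
    return selected
```

Checking for `prefix + "/"` rather than the bare prefix keeps a sibling folder such as `DIR1/DIR2x` out of the selection.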
### Ingesting all JSON files from a path

* Ingest all files under path `PATH`, matching the pattern `*.json`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in local file `MAPPING_FILE_PATH`
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete

```
LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True" ^
 -database:DB ^
 -table:TABLE ^
 -source:"PATH" ^
 -pattern:"*.json" ^
 -format:json ^
 -mappingPath:"MAPPING_FILE_PATH" ^
 -dontWait:true
```

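The file passed to `-mappingPath` contains the column mapping for the JSON records. As a sketch, the Python helper below writes such a file from a dictionary of table columns and JSONPath expressions. The element layout (`column` plus `Properties.Path`) is an assumption based on the JSON mapping format in the Azure Data Explorer data mappings documentation, so verify it against that reference and your table schema before use. `write_json_mapping` is hypothetical.

```python
import json
from typing import Dict


def write_json_mapping(path: str, columns: Dict[str, str]) -> None:
    """Write a JSON ingestion mapping file.

    `columns` maps each target table column to a JSONPath in the source records.
    Assumed element layout: {"column": ..., "Properties": {"Path": ...}}.
    """
    mapping = [
        {"column": column, "Properties": {"Path": json_path}}
        for column, json_path in columns.items()
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)
```

For example, `write_json_mapping("MAPPING_FILE_PATH", {"Timestamp": "$.time", "DeviceId": "$.device.id"})` maps two hypothetical columns to fields of each JSON record.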
### Ingesting files and writing diagnostic trace files

* Ingest all files under path `PATH`, matching the pattern `*.json`
* Destination is database `DB`, table `TABLE`, and the ingestion mapping is defined in local file `MAPPING_FILE_PATH`
* The tool posts the data for ingestion and doesn't wait for the ingest operations to complete
* Diagnostic trace files are written locally under folder `LOGS_PATH`

```
LightIngest.exe "https://ingest-{ClusterAndRegion}.kusto.windows.net;Fed=True" ^
 -database:DB ^
 -table:TABLE ^
 -source:"PATH" ^
 -pattern:"*.json" ^
 -format:json ^
 -mappingPath:"MAPPING_FILE_PATH" ^
 -trace:"LOGS_PATH" ^
 -dontWait:true
```

## Changelog

|Version        |Changes                                                                             |
|---------------|------------------------------------------------------------------------------------|
|4.0.9.0        |<ul><li>Added `-zipPattern` argument</li><li>Added `-listOnly` argument</li><li>An arguments summary is displayed before the run starts</li><li>CreationTime is read from blob metadata properties or from the blob or file name, according to the `-creationTimePattern` argument</li></ul>|