Commit 967ca2f

Merge pull request #109839 from yossi-karp/lightingest
Updated content, added an image
2 parents b700f7c + 862f0d0 commit 967ca2f

2 files changed: 23 additions and 11 deletions
articles/data-explorer/lightingest.md

Lines changed: 23 additions & 11 deletions
@@ -6,7 +6,7 @@ ms.author: orspodek
 ms.reviewer: tzgitlin
 ms.service: data-explorer
 ms.topic: conceptual
-ms.date: 03/17/2020
+ms.date: 04/01/2020
 ---
 
 # Install and use LightIngest
@@ -17,6 +17,9 @@ The utility can pull source data from a local folder or from an Azure blob stora
 ## Prerequisites
 
 * LightIngest - download it as part of the [Microsoft.Azure.Kusto.Tools NuGet package](https://www.nuget.org/packages/Microsoft.Azure.Kusto.Tools/)
+
+![Lightingest download](media/lightingest/lightingest-download-area.png)
+
 * WinRAR - download it from [www.win-rar.com/download.html](http://www.win-rar.com/download.html)
 
 ## Install LightIngest
@@ -39,16 +42,20 @@ The utility can pull source data from a local folder or from an Azure blob stora
 >
 >![Command line Help](media/lightingest/lightingest-cmd-line-help.png)
 
-1. Enter `LightIngest` followed by the connection string to the Azure Data Explorer cluster that will manage the ingestion.
+1. Enter `ingest-` followed by the connection string to the Azure Data Explorer cluster that will manage the ingestion.
 Enclose the connection string in double quotes and follow the [Kusto connection strings specification](https://docs.microsoft.com/azure/kusto/api/connection-strings/kusto).
 
 For example:
 ```
-LightIngest "Data Source=https://{Cluster name and region}.kusto.windows.net;AAD Federated Security=True" -db:{Database} -table:Trips -source:"https://{Account}.blob.core.windows.net/{ROOT_CONTAINER};{StorageAccountKey}" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
+ingest-{Cluster name and region}.kusto.windows.net;AAD Federated Security=True -db:{Database} -table:Trips -source:"https://{Account}.blob.core.windows.net/{ROOT_CONTAINER};{StorageAccountKey}" -pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
 ```
 
-* The recommended method is for `LightIngest` to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Azure Data Explorer service can manage the ingestion load, and you can easily recover from transient errors. However, you can also configure `LightIngest` to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
-* For optimal ingestion performance, it is important for LightIngest to know the raw data size and so `LightIngest` will estimate the uncompressed size of local files. However, `LightIngest` might not be able to correctly estimate the raw size of compressed blobs without first downloading them. Therefore, when ingesting compressed blobs, set the `rawSizeBytes` property on the blob metadata to uncompressed data size in bytes.
+* The recommended method is for LightIngest to work with the ingestion endpoint at `https://ingest-{yourClusterNameAndRegion}.kusto.windows.net`. This way, the Azure Data Explorer service can manage the ingestion load, and you can easily recover from transient errors. However, you can also configure LightIngest to work directly with the engine endpoint (`https://{yourClusterNameAndRegion}.kusto.windows.net`).
+
+> [!Note]
+> If you ingest directly with the engine endpoint, you don't need to include `ingest-`, but there won't be a DM feature to protect the engine and improve the ingestion success rate.
+
+* For optimal ingestion performance, it's important for LightIngest to know the raw data size and so LightIngest will estimate the uncompressed size of local files. However, LightIngest might not be able to correctly estimate the raw size of compressed blobs without first downloading them. Therefore, when ingesting compressed blobs, set the `rawSizeBytes` property on the blob metadata to uncompressed data size in bytes.
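
Editorial sketch, not part of the commit: the `rawSizeBytes` guidance above needs the uncompressed size of each compressed blob. Assuming the data is gzip-compressed (as the `*.csv.gz` pattern in the example suggests) and that a local copy is available before upload, the value could be computed roughly as follows. The function names `uncompressed_size_bytes` and `raw_size_metadata` are hypothetical, and attaching the metadata to the blob is left out because it depends on the storage client in use.

```python
import gzip


def uncompressed_size_bytes(path, chunk_size=1024 * 1024):
    """Return the uncompressed size of a gzip file by streaming through it."""
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def raw_size_metadata(path):
    """Build a metadata entry for the blob (hypothetical helper).

    Metadata values are kept as strings, which is an assumption here.
    """
    return {"rawSizeBytes": str(uncompressed_size_bytes(path))}
```

Streaming through the file, rather than reading the size field in the gzip trailer, keeps the result correct for files larger than 4 GiB, where the trailer value wraps around.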
 
 ## General command-line arguments
 
@@ -67,16 +74,21 @@ The utility can pull source data from a local folder or from an Azure blob stora
 |-creationTimePattern | |string |Optional |When set, is used to extract the CreationTime property from the file or blob path. See [Using CreationTimePattern argument](#using-creationtimepattern-argument) |
 |-ignoreFirstRow |-ignoreFirst |bool |Optional |If set, the first record of each file/blob is ignored (for example, if the source data has headers) |
 |-tag | |string |Optional |[Tags](https://docs.microsoft.com/azure/kusto/management/extents-overview#extent-tagging) to associate with the ingested data. Multiple occurrences are permitted |
-|-dontWait | |bool |Optional |If set to 'true', does not wait for ingestion completion. Useful when ingesting large amounts of files/blobs |
+|-dontWait | |bool |Optional |If set to 'true', doesn't wait for ingestion completion. Useful when ingesting large amounts of files/blobs |
 
 ### Using CreationTimePattern argument
 
-The `-creationTimePattern` argument extracts the CreationTime property from the file or blob path. The pattern does not need to reflect the entire item path, just the section enclosing the timestamp you want to use.
-The value of the argument must contain of three sections:
+The `-creationTimePattern` argument extracts the CreationTime property from the file or blob path. The pattern doesn't need to reflect the entire item path, just the section enclosing the timestamp you want to use.
+
+The argument values must include:
 * Constant test immediately preceding the timestamp, enclosed in single quotes
 * The timestamp format, in standard [.NET DateTime notation](https://docs.microsoft.com/dotnet/standard/base-types/custom-date-and-time-format-strings)
-* Constant text immediately following the timestamp
-For example, if blob names end with 'historicalvalues19840101.parquet' (the timestamp is four digits for the year, two digits for the month and two digits for the day of month), the corresponding value for the `-creationTimePattern` argument is 'historicalvalues'yyyyMMdd'.parquet'.
+* Constant text immediately following the timestamp. For example, if blob names end with `historicalvalues19840101.parquet` (the timestamp is four digits for the year, two digits for the month, and two digits for the day of month), the corresponding value for the `-creationTimePattern` argument is:
+
+```
+ingest-{Cluster name and region}.kusto.windows.net;AAD Federated Security=True -db:{Database} -table:Trips -source:"https://{Account}.blob.core.windows.net/{ROOT_CONTAINER};{StorageAccountKey}" -creationTimePattern:"'historicalvalues'yyyyMMdd'.parquet'"
+-pattern:"*.csv.gz" -format:csv -limit:2 -ignoreFirst:true -cr:10.0 -dontWait:true
+```
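
Editorial sketch, not part of the commit: to make the three-section pattern concrete, the following Python illustrates how a value such as `'historicalvalues'yyyyMMdd'.parquet'` could be applied to a blob path. It is not LightIngest's own parser. It assumes both constant sections are enclosed in single quotes (as in the example) and handles only a small subset of .NET format specifiers. The names `split_pattern` and `extract_creation_time` are hypothetical.

```python
import re
from datetime import datetime

# Assumed subset of .NET custom format specifiers, each mapped to a
# strptime directive and a regex for the digits it consumes.
_TOKENS = {
    "yyyy": ("%Y", r"\d{4}"),
    "MM": ("%m", r"\d{2}"),
    "dd": ("%d", r"\d{2}"),
    "HH": ("%H", r"\d{2}"),
    "mm": ("%M", r"\d{2}"),
    "ss": ("%S", r"\d{2}"),
}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def split_pattern(pattern):
    """Split "'prefix'format'suffix'" into (prefix, format, suffix)."""
    match = re.fullmatch(r"'([^']*)'([^']+)'([^']*)'", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    return match.groups()


def extract_creation_time(path, pattern):
    """Return the timestamp found in path, or None if the pattern does not match."""
    prefix, net_format, suffix = split_pattern(pattern)
    strptime_format = ""
    timestamp_regex = ""
    position = 0
    for token in _TOKEN_RE.finditer(net_format):
        # Copy any literal characters between specifiers unchanged.
        literal = net_format[position:token.start()]
        strptime_format += literal.replace("%", "%%")
        timestamp_regex += re.escape(literal)
        directive, digits = _TOKENS[token.group()]
        strptime_format += directive
        timestamp_regex += digits
        position = token.end()
    literal = net_format[position:]
    strptime_format += literal.replace("%", "%%")
    timestamp_regex += re.escape(literal)

    # The pattern only has to match the part of the path around the timestamp.
    found = re.search(
        re.escape(prefix) + "(" + timestamp_regex + ")" + re.escape(suffix), path
    )
    if found is None:
        return None
    return datetime.strptime(found.group(1), strptime_format)
```

For a path ending in `historicalvalues19840101.parquet`, the sketch yields January 1, 1984, matching the description in the changed text.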
 
 ### Command-line arguments for advanced scenarios
 
@@ -92,7 +104,7 @@ For example, if blob names end with 'historicalvalues19840101.parquet' (the time
 |-devTracing |-trace |string |Optional |If set, diagnostic logs are written to a local directory (by default, `RollingLogs` in the current directory, or can be modified by setting the switch value) |
 
 ## Blob metadata properties
-When used with Azure blobs, `LightIngest` will use certain blob metadata properties to augment the ingestion process.
+When used with Azure blobs, LightIngest will use certain blob metadata properties to augment the ingestion process.
 
 |Metadata property | Usage |
 |---------------------------------------------|---------------------------------------------------------------------------------|