Commit 3db919e

Merge pull request #57842 from orspod/Nov2018
Nov2018
2 parents 779e571 + f946be1 commit 3db919e

3 files changed, 39 additions & 32 deletions

articles/data-explorer/ingest-data-overview.md

Lines changed: 24 additions & 18 deletions
@@ -12,9 +12,9 @@ ms.date: 09/24/2018
 
 # Azure Data Explorer data ingestion
 
-Data ingestion is the process used to load data records from one or more sources to create or update a table in Azure Data Explorer. Once ingested, the data becomes available for query. The diagram below shows the end-to-end flow for working in Azure Data Explorer, including data ingestion **(2)**.
+Data ingestion is the process used to load data records from one or more sources to create or update a table in Azure Data Explorer. Once ingested, the data becomes available for query. The diagram below shows the end-to-end flow for working in Azure Data Explorer, including data ingestion.
 
-![Overall data flow](media/ingest-data-overview/overall-data-flow.png)
+![Data flow](media/ingest-data-overview/data-flow.png)
 
 The Azure Data Explorer data management service, which is responsible for data ingestion, provides the following functionality:
 
@@ -30,16 +30,16 @@ The Azure Data Explorer data management service, which is responsible for data i
 
 1. **Commit the data ingest**: Makes the data available for query.
 
-> [!NOTE]
-> The effective retention policy of ingested data is derived from the database's retention policy. See [retention policy](https://docs.microsoft.com/azure/kusto/concepts/retentionpolicy) for details. Ingesting data requires **Table ingestor** or **Database ingestor** permissions.
-
 ## Ingestion methods
 
-Azure Data Explorer supports several ingestion methods, each with its own target scenarios, advantages, and disadvantages. Azure Data Explorer offers connectors to common services, programmatic ingestion using SDKs, and direct access to the engine for exploration purposes.
+Azure Data Explorer supports several ingestion methods, each with its own target scenarios, advantages, and disadvantages. Azure Data Explorer offers pipelines and connectors to common services, programmatic ingestion using SDKs, and direct access to the engine for exploration purposes.
+
+### Ingestion using pipelines
 
-### Ingestion using connectors
+Azure Data Explorer currently supports the Event Hub pipeline, which can be managed using the management wizard in the Azure portal. For more information, see [Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md).
 
-Azure Data Explorer currently supports the Event Hub connector, which can be managed using the management wizard in the Azure portal. For more information, see [Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md).
+### Ingestion using connectors and plugins
+Azure Data Explorer currently supports the Logstash plugin. For more information, see [Logstash Output Plugin for Azure Data Explorer](https://github.com/Azure/logstash-output-kusto/blob/master/README.md).
 
 ### Programmatic ingestion
 
@@ -49,21 +49,21 @@ Azure Data Explorer provides SDKs that can be used for query and data ingestion.
 
 Kusto offers client SDK that can be used to ingest and query data with :
 
-* [Python SDK](https://docs.microsoft.com/azure/kusto/api/python/kusto-python-client-library)
+* [Python SDK](/azure/kusto/api/python/kusto-python-client-library)
 
-* [.NET SDK](https://docs.microsoft.com/azure/kusto/api/netfx/about-the-sdk)
+* [.NET SDK](/azure/kusto/api/netfx/about-the-sdk)
 
-* [Java SDK](https://docs.microsoft.com/azure/kusto/api/java/kusto-java-client-library)
+* [Java SDK](/azure/kusto/api/java/kusto-java-client-library)
 
-* [Node SDK]
+* [Node SDK](/azure/kusto/api/node/kusto-node-client-library)
 
-* [REST API](https://docs.microsoft.com/azure/kusto/api/netfx/kusto-ingest-client-rest)
+* [REST API](/azure/kusto/api/netfx/kusto-ingest-client-rest)
 
 **Programmatic ingestion techniques**:
 
-* Ingesting data through the Azure Data Explorer data management service (high-throughput and reliable ingestion)
+* Ingesting data through the Azure Data Explorer data management service (high-throughput and reliable ingestion):
 
-* [**Batch ingestion**](https://docs.microsoft.com/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample) (provided by SDK): the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. This is the recommended technique for high-volume, reliable and cheap data ingestion.
+* [**Batch ingestion**](/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample) (provided by SDK): the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. This is the recommended technique for high-volume, reliable and cheap data ingestion.
 
 * Ingesting data directly into the Azure Data Explorer engine (most appropriate for exploration and prototyping):
 
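Direct ingestion into the engine, listed above as most appropriate for exploration and prototyping, can be tried with two Kusto control commands. This is a minimal sketch that assumes a hypothetical table `DemoEvents` and two made-up CSV records; `.ingest inline` is meant for ad-hoc tests rather than production loads.

```kusto
// Hypothetical table, created only if it does not already exist
.create-merge table DemoEvents (Timestamp: datetime, Name: string, Value: int)

// Direct (inline) ingestion: the CSV records after "<|" are sent straight to the engine
.ingest inline into table DemoEvents <|
2018-11-01T00:00:00Z,alpha,1
2018-11-01T01:00:00Z,beta,2
```

Run each command on its own; a query such as `DemoEvents | take 10` should then return the inline records.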
@@ -113,16 +113,22 @@ For all ingestion methods other than ingest from query, the data must be formatt
 > [!NOTE]
 > When data is being ingested, data types are inferred based on the target table columns. If a record is incomplete or a field cannot be parsed as the required data type, the corresponding table columns will be populated with null values.
 
-## Schema Mapping
+## Ingestion recommendations and limitations
+* The effective retention policy of ingested data is derived from the database's retention policy. See [retention policy](/azure/kusto/concepts/retentionpolicy) for details. Ingesting data requires **Table ingestor** or **Database ingestor** permissions.
+* Ingestion supports a maximum file size of 5GB. The recommendation is to ingest files between 100MB and 1GB.
+
+## Schema mapping
 
 Schema mapping helps deterministically bind source data fields to destination table columns.
 
-* [CSV Mapping](https://docs.microsoft.com/azure/kusto/management/mappings?branch=master#csv-mapping) (optional) works with all ordinal-based formats and can be passed as the ingest command parameter or [pre-created on the table](https://docs.microsoft.com/azure/kusto/management/tables?branch=master#create-ingestion-mapping) and referenced from the ingest command parameter.
-* [JSON Mapping](https://docs.microsoft.com/azure/kusto/management/mappings?branch=master#json-mapping) (mandatory) and [Avro mapping](https://docs.microsoft.com/azure/kusto/management/mappings?branch=master#avro-mapping) (mandatory) can be passed as the ingest command parameter or [pre-created on the table](https://docs.microsoft.com/azure/kusto/management/tables#create-ingestion-mapping) and referenced from the ingest command parameter.
+* [CSV Mapping](/azure/kusto/management/mappings?branch=master#csv-mapping) (optional) works with all ordinal-based formats and can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables?branch=master#create-ingestion-mapping) and referenced from the ingest command parameter.
+* [JSON Mapping](/azure/kusto/management/mappings?branch=master#json-mapping) (mandatory) and [Avro mapping](/azure/kusto/management/mappings?branch=master#avro-mapping) (mandatory) can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables#create-ingestion-mapping) and referenced from the ingest command parameter.
 
 ## Next steps
 
 [Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md)
 
 [Quickstart: Ingest data using the Azure Data Explorer Python library](python-ingest-data.md)
 
+[Quickstart: Ingest data using the Azure Data Explorer Node library](node-ingest-data.md)
+
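The retention and permission notes in the new "Ingestion recommendations and limitations" section correspond to Kusto control commands. A sketch, assuming hypothetical names `DemoDb`, `DemoEvents`, and `someone@contoso.com`:

```kusto
// Hypothetical names throughout; replace with your own database, table and principal.
// Show the database retention policy that ingested data inherits
.show database DemoDb policy retention

// Grant the Table ingestor role on a single table
.add table DemoEvents ingestors ('aaduser=someone@contoso.com') 'Demo ingestion access'

// Or grant the Database ingestor role on the whole database
.add database DemoDb ingestors ('aaduser=someone@contoso.com') 'Demo ingestion access'
```

Table-level grants keep the principal scoped to one table; the database-level grant covers every table in it.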
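The "pre-created on the table and referenced from the ingest command" option under Schema mapping can be sketched as two steps: create a JSON mapping on the table, then reference it by name when ingesting. The table, mapping name, JSON paths, and blob URI placeholders below are hypothetical.

```kusto
// Hypothetical target table (created only if missing)
.create-merge table DemoEvents (Timestamp: datetime, Name: string, Value: int)

// Pre-create a JSON mapping that binds JSON paths to table columns
.create table DemoEvents ingestion json mapping "DemoEventsMapping"
    '[{"column":"Timestamp","Properties":{"Path":"$.ts"}},{"column":"Name","Properties":{"Path":"$.name"}},{"column":"Value","Properties":{"Path":"$.value"}}]'

// Ingest a JSON blob (placeholder URI) and reference the mapping by name
.ingest into table DemoEvents (h'https://<account>.blob.core.windows.net/<container>/events.json;<key>')
    with (format="json", ingestionMappingReference="DemoEventsMapping")
```

Pre-creating the mapping lets many ingest commands share one definition instead of repeating the mapping JSON in each command.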

articles/data-explorer/time-series-analysis.md

Lines changed: 15 additions & 14 deletions
@@ -53,10 +53,10 @@ demo_make_series1
 | render timechart
 ```
 
-- Use the [`make-series`](https://docs.microsoft.com/azure/kusto/query/make-seriesoperator) operator to create a set of three time series, where:
+- Use the [`make-series`](/azure/kusto/query/make-seriesoperator) operator to create a set of three time series, where:
 - `num=count()`: time series of traffic
 - `range(min_t, max_t, 1h)`: time series is created in 1-hour bins in the time range (oldest and newest timestamps of table records)
-- `default=0`: specify fill method for missing bins to create regular time series. Alternatively use [`series_fill_const()`](https://docs.microsoft.com/azure/kusto/query/series-fill-constfunction), [`series_fill_forward()`](https://docs.microsoft.com/azure/kusto/query/series-fill-forwardfunction), [`series_fill_backward()`](https://docs.microsoft.com/azure/kusto/query/series-fill-backwardfunction) and [`series_fill_linear()`](https://docs.microsoft.com/azure/kusto/query/series-fill-linearfunction) for changes
+- `default=0`: specify fill method for missing bins to create regular time series. Alternatively use [`series_fill_const()`](/azure/kusto/query/series-fill-constfunction), [`series_fill_forward()`](/azure/kusto/query/series-fill-forwardfunction), [`series_fill_backward()`](/azure/kusto/query/series-fill-backwardfunction) and [`series_fill_linear()`](/azure/kusto/query/series-fill-linearfunction) for changes
 - `byOsVer`: partition by OS
 - The actual time series data structure is a numeric array of the aggregated value per each time bin. We use `render timechart` for visualization.
 
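The fill methods named in the `default=0` bullet behave differently on the same gaps. A sketch on a hypothetical five-bin array with two missing (null) bins; in practice the array would come from `make-series`:

```kusto
// Hypothetical series with two missing bins
print num = dynamic([10.0, null, 30.0, null, 50.0])
| extend const_fill    = series_fill_const(num, 0.0),   // nulls become 0, similar to default=0
         forward_fill  = series_fill_forward(num),      // nulls take the last preceding value
         backward_fill = series_fill_backward(num),     // nulls take the next following value
         linear_fill   = series_fill_linear(num)        // nulls are linearly interpolated
```

For this input, `linear_fill` is 10, 20, 30, 40, 50. The fill functions treat null as the missing value by default, so they apply to series built with a null default rather than `default=0`.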
@@ -67,14 +67,14 @@ In the table above, we have three partitions. We can create a separate time seri
 ## Time series analysis functions
 
 In this section, we'll perform typical series processing functions.
-Once a set of time series is created, ADX supports a growing list of functions to process and analyze them which can be found in the [time series documentation](https://docs.microsoft.com/azure/kusto/query/machine-learning-and-tsa). We will describe a few representative functions for processing and analyzing time series.
+Once a set of time series is created, ADX supports a growing list of functions to process and analyze them which can be found in the [time series documentation](/azure/kusto/query/machine-learning-and-tsa). We will describe a few representative functions for processing and analyzing time series.
 
 ### Filtering
 
 Filtering is a common practice in signal processing and useful for time series processing tasks (for example, smooth a noisy signal, change detection).
 - There are two generic filtering functions:
-- [`series_fir()`](https://docs.microsoft.com/azure/kusto/query/series-firfunction): Applying FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
-- [`series_iir()`](https://docs.microsoft.com/azure/kusto/query/series-iirfunction): Applying IIR filter. Used for exponential smoothing and cumulative sum.
+- [`series_fir()`](/azure/kusto/query/series-firfunction): Applying FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
+- [`series_iir()`](/azure/kusto/query/series-iirfunction): Applying IIR filter. Used for exponential smoothing and cumulative sum.
 - `Extend` the time series set by adding a new moving average series of size 5 bins (named *ma_num*) to the query:
 
 ```kusto
@@ -91,8 +91,8 @@ demo_make_series1
 ### Regression analysis
 
 ADX supports segmented linear regression analysis to estimate the trend of the time series.
-- Use [series_fit_line()](https://docs.microsoft.com/azure/kusto/query/series-fit-linefunction) to fit the best line to a time series for general trend detection.
-- Use [series_fit_2lines()](https://docs.microsoft.com/azure/kusto/query/series-fit-2linesfunction) to detect trend changes, relative to the baseline, that are useful in monitoring scenarios.
+- Use [series_fit_line()](/azure/kusto/query/series-fit-linefunction) to fit the best line to a time series for general trend detection.
+- Use [series_fit_2lines()](/azure/kusto/query/series-fit-2linesfunction) to detect trend changes, relative to the baseline, that are useful in monitoring scenarios.
 
 Example of `series_fit_line()` and `series_fit_2lines()` functions in a time series query:
 
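The FIR and IIR filters from the Filtering section and `series_fit_line()` from Regression analysis can be compared side by side on a short literal series. A minimal sketch with hypothetical values (a rising trend with one spike):

```kusto
// Hypothetical series: linear trend with a spike at the fourth bin
print num = dynamic([10, 12, 14, 40, 18, 20, 22, 24])
| extend ma_3   = series_fir(num, repeat(1, 3), true, true),        // centered 3-bin moving average
         diff_1 = series_fir(num, dynamic([1, -1]), false, false),  // x[i] - x[i-1], for change detection
         cumsum = series_iir(num, dynamic([1]), dynamic([1, -1]))   // IIR filter that accumulates a running sum
// Fit a single line to the raw series for overall trend detection
| extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(num)
```

With `normalize=true`, `series_fir()` divides by the sum of the filter coefficients, which turns `repeat(1, 3)` into an average; `center=true` places the window symmetrically around each point.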
@@ -124,8 +124,9 @@ demo_series3
 
 ![Time series seasonality](media/time-series-analysis/time-series-seasonality.png)
 
-- Use [series_periods_detect()](https://docs.microsoft.com/azure/kusto/query/series-periods-detectfunction) to automatically detect the periods in the time series.
-- Use [series_periods_validate()](https://docs.microsoft.com/azure/kusto/query/series-periods-validatefunction) if we know that a metric should have specific distinct period(s) and we want to verify that they exist.
+- Use [series_periods_detect()](/azure/kusto/query/series-periods-detectfunction) to automatically detect the periods in the time series.
+- Use [series_periods_validate()](/azure/kusto/query/series-periods-validatefunction) if we know that a metric should have specific distinct period(s) and we want to verify that they exist.
+
 > [!NOTE]
 > It's an anomaly if specific distinct periods don't exist
 
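A synthetic series with a known period shows what `series_periods_detect()` and `series_periods_validate()` return. This sketch builds a hypothetical 168-bin sine wave with a 24-bin period, using `make-series` over a numeric axis:

```kusto
// Hypothetical metric: sine wave with a period of 24 bins, 168 bins long
range t from 0 to 167 step 1
| make-series y = avg(sin(2 * pi() * t / 24.0)) on t from 0 to 168 step 1
// Detect at most one period between 0 and 48 bins
| extend (periods, scores) = series_periods_detect(y, 0.0, 48.0, 1)
// Check whether candidate periods of 24 and 12 bins are present
| extend (vperiods, vscores) = series_periods_validate(y, 24.0, 12.0)
```

The detected period should be close to 24 with a high score, and the validation score for 24 should be well above the score for 12.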
@@ -146,7 +147,7 @@ The function detects daily and weekly seasonality. The daily scores less than th
 
 ### Element-wise functions
 
-Arithmetic and logical operations can be done on a time series. Using [series_subtract()](https://docs.microsoft.com/azure/kusto/query/series-subtractfunction) we can calculate a residual time series, that is, the difference between original raw metric and a smoothed one, and look for anomalies in the residual signal:
+Arithmetic and logical operations can be done on a time series. Using [series_subtract()](/azure/kusto/query/series-subtractfunction) we can calculate a residual time series, that is, the difference between original raw metric and a smoothed one, and look for anomalies in the residual signal:
 
 ```kusto
 let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
@@ -161,9 +162,9 @@ demo_make_series1
 
 ![Time series operations](media/time-series-analysis/time-series-operations.png)
 
-Blue: original time series
-Red: smoothed time series
-Green: residual time series
+- Blue: original time series
+- Red: smoothed time series
+- Green: residual time series
 
 ## Time series workflow at scale
 
@@ -253,6 +254,6 @@ demo_many_series1
 | | Loc 15 | -3207352159611332166 | 1151 | -102743.910227889 |
 | | Loc 13 | -3207352159611332166 | 1249 | -86303.2334644601 |
 
-In less than two minutes, ADX detected two abnormal time series (out of 23115) in which the read count suddenly dropped.
+In less than two minutes, ADX analyzed over 20,000 time series and detected two abnormal time series in which the read count suddenly dropped.
 
 These advanced capabilities combined with ADX fast performance supply a unique and powerful solution for time series analysis.
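The residual computation described under "Element-wise functions" reduces to a few lines. A sketch on a hypothetical series with one spike, smoothed with the same 5-bin moving average used for *ma_num* in the Filtering section:

```kusto
// Hypothetical series with a single spike at the fifth bin
print num = dynamic([20, 21, 19, 20, 60, 20, 21, 19, 20])
// Smooth with a centered 5-bin moving average (normalized FIR filter)
| extend smoothed = series_fir(num, repeat(1, 5), true, true)
// Element-wise difference: original minus smoothed
| extend residual = series_subtract(num, smoothed)
```

The residual should be largest at the spike, which is the signal the section suggests inspecting for anomalies.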
