articles/data-explorer/ingest-data-overview.md
ms.date: 09/24/2018
# Azure Data Explorer data ingestion
Data ingestion is the process used to load data records from one or more sources to create or update a table in Azure Data Explorer. Once ingested, the data becomes available for query. The diagram below shows the end-to-end flow for working in Azure Data Explorer, including data ingestion.

The Azure Data Explorer data management service, which is responsible for data ingestion, provides the following functionality:
1. **Commit the data ingest**: Makes the data available for query.
## Ingestion methods
Azure Data Explorer supports several ingestion methods, each with its own target scenarios, advantages, and disadvantages. Azure Data Explorer offers pipelines and connectors to common services, programmatic ingestion using SDKs, and direct access to the engine for exploration purposes.
### Ingestion using pipelines
Azure Data Explorer currently supports the Event Hub pipeline, which can be managed using the management wizard in the Azure portal. For more information, see [Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md).
### Ingestion using connectors and plugins
Azure Data Explorer currently supports the Logstash plugin. For more information, see [Logstash Output Plugin for Azure Data Explorer](https://github.com/Azure/logstash-output-kusto/blob/master/README.md).
### Programmatic ingestion
Azure Data Explorer provides SDKs that can be used for query and data ingestion.
Kusto offers client SDKs that can be used to ingest and query data with:
* Ingesting data through the Azure Data Explorer data management service (high-throughput and reliable ingestion):
  * [**Batch ingestion**](/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample) (provided by the SDK): the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure queue. This is the recommended technique for high-volume, reliable, and cheap data ingestion.
* Ingesting data directly into the Azure Data Explorer engine (most appropriate for exploration and prototyping):
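For quick experiments, direct ingestion can be exercised from the query window with control commands. The snippet below is a minimal sketch, assuming a pre-created table (the hypothetical `MyEventTable`) whose schema matches the inline CSV records:

```kusto
// Inline ingestion: the data records are embedded in the command itself.
// Suitable for exploration and prototyping only, not for production pipelines.
.ingest inline into table MyEventTable <|
2018-09-24T12:00:00Z,Windows 10,42
2018-09-24T13:00:00Z,Windows 7,17
```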
> [!NOTE]
> When data is being ingested, data types are inferred based on the target table columns. If a record is incomplete or a field cannot be parsed as the required data type, the corresponding table columns will be populated with null values.
## Ingestion recommendations and limitations
* The effective retention policy of ingested data is derived from the database's retention policy. See [retention policy](/azure/kusto/concepts/retentionpolicy) for details. Ingesting data requires **Table ingestor** or **Database ingestor** permissions.
* Ingestion supports a maximum file size of 5 GB. The recommendation is to ingest files between 100 MB and 1 GB.
## Schema mapping
Schema mapping helps deterministically bind source data fields to destination table columns.
* [CSV Mapping](/azure/kusto/management/mappings?branch=master#csv-mapping) (optional) works with all ordinal-based formats and can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables?branch=master#create-ingestion-mapping) and referenced from the ingest command parameter.
* [JSON Mapping](/azure/kusto/management/mappings?branch=master#json-mapping) (mandatory) and [Avro mapping](/azure/kusto/management/mappings?branch=master#avro-mapping) (mandatory) can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables#create-ingestion-mapping) and referenced from the ingest command parameter.
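As an illustration, a JSON mapping can be pre-created on the table and later referenced by name from the ingest command. The table name, mapping name, and JSON paths below are hypothetical:

```kusto
// Pre-create a named JSON mapping on the (hypothetical) table MyEventTable.
// Each entry binds a destination column to a JSON path in the source records.
.create table MyEventTable ingestion json mapping "Mapping1"
    '[{"column":"TimeStamp","path":"$.timestamp"},{"column":"OsVer","path":"$.osVer"}]'
```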
## Next steps
[Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md)
[Quickstart: Ingest data using the Azure Data Explorer Python library](python-ingest-data.md)
[Quickstart: Ingest data using the Azure Data Explorer Node library](node-ingest-data.md)
articles/data-explorer/time-series-analysis.md
- Use the [`make-series`](/azure/kusto/query/make-seriesoperator) operator to create a set of three time series, where:
  - `num=count()`: time series of traffic
  - `range(min_t, max_t, 1h)`: time series is created in 1-hour bins over the time range (oldest and newest timestamps of table records)
  - `default=0`: specifies the fill method for missing bins to create regular time series. Alternatively, use [`series_fill_const()`](/azure/kusto/query/series-fill-constfunction), [`series_fill_forward()`](/azure/kusto/query/series-fill-forwardfunction), [`series_fill_backward()`](/azure/kusto/query/series-fill-backwardfunction), or [`series_fill_linear()`](/azure/kusto/query/series-fill-linearfunction) for other fill methods
  - `by OsVer`: partition by OS
- The actual time series data structure is a numeric array of the aggregated value per each time bin. We use `render timechart` for visualization.
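Putting the elements above together, the query behind the chart looks roughly like the following sketch (assuming the sample table `demo_make_series1` with `TimeStamp` and `OsVer` columns, and `min_t`/`max_t` computed with `toscalar`):

```kusto
// Compute the overall time range, then build one 1h-binned series per OS version.
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| render timechart
```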
## Time series analysis functions
In this section, we'll perform typical series processing functions.
Once a set of time series is created, ADX supports a growing list of functions to process and analyze them, which can be found in the [time series documentation](/azure/kusto/query/machine-learning-and-tsa). We will describe a few representative functions for processing and analyzing time series.
### Filtering
Filtering is a common practice in signal processing and is useful for time series processing tasks (for example, smoothing a noisy signal, or change detection).
There are two generic filtering functions:

- [`series_fir()`](/azure/kusto/query/series-firfunction): Applies an FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
- [`series_iir()`](/azure/kusto/query/series-iirfunction): Applies an IIR filter. Used for exponential smoothing and cumulative sum.
- `Extend` the time series set by adding a new moving average series of size 5 bins (named *ma_num*) to the query:
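A moving average of 5 bins can be sketched with `series_fir()` as below; the uniform filter `repeat(1, 5)` is normalized and centered, and the table and columns are the same assumed sample set as earlier:

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
// ma_num: moving average over 5 bins (uniform FIR coefficients, normalized and centered)
| extend ma_num = series_fir(num, repeat(1, 5), true, true)
| render timechart
```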
### Regression analysis
ADX supports segmented linear regression analysis to estimate the trend of the time series.
- Use [series_fit_line()](/azure/kusto/query/series-fit-linefunction) to fit the best line to a time series for general trend detection.
- Use [series_fit_2lines()](/azure/kusto/query/series-fit-2linesfunction) to detect trend changes relative to the baseline, which is useful in monitoring scenarios.
Example of `series_fit_line()` and `series_fit_2lines()` functions in a time series query:
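The original snippet is not fully preserved here, so the following is a hedged sketch of the typical pattern: extending a series set with both fit functions, each of which auto-expands into several result columns (goodness of fit, slope, fitted trend line, and so on):

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
// Each fit function expands into multiple columns (rsquare, slope, trend line, ...).
| extend series_fit_line(num), series_fit_2lines(num)
| render timechart
```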

- Use [series_periods_detect()](/azure/kusto/query/series-periods-detectfunction) to automatically detect the periods in the time series.
- Use [series_periods_validate()](/azure/kusto/query/series-periods-validatefunction) if we know that a metric should have specific distinct period(s) and we want to verify that they exist.
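A sketch of period detection, assuming a sample table `demo_series3` with a numeric series column `num` binned at 2-hour resolution (so two weeks is `14d/2h` bins):

```kusto
demo_series3
// Detect up to 2 dominant periods with lengths between 0 and two weeks (in 2h bins).
| project (periods, scores) = series_periods_detect(num, 0., 14d/2h, 2)
| mv-expand periods, scores
// Convert the detected period lengths from bin counts back to days.
| extend days = 2h * todouble(periods) / 1d
```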
> [!NOTE]
> It's an anomaly if specific distinct periods don't exist.
### Element-wise functions
Arithmetic and logical operations can be done on a time series. Using [series_subtract()](/azure/kusto/query/series-subtractfunction), we can calculate a residual time series, that is, the difference between the original raw metric and a smoothed one, and look for anomalies in the residual signal:
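A hedged sketch of that residual calculation, reusing the moving-average series from the filtering section (same assumed sample table and columns):

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| extend ma_num = series_fir(num, repeat(1, 5), true, true)
// residual = raw metric minus its smoothed version; spikes here are anomaly candidates
| extend residual_num = series_subtract(num, ma_num)
| render timechart
```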

- Blue: original time series
- Red: smoothed time series
- Green: residual time series
## Time series workflow at scale
|| Loc 15 | -3207352159611332166 | 1151 | -102743.910227889 |
|| Loc 13 | -3207352159611332166 | 1249 | -86303.2334644601 |
In less than two minutes, ADX analyzed over 20,000 time series and detected two abnormal time series in which the read count suddenly dropped.
These advanced capabilities, combined with the fast performance of ADX, supply a unique and powerful solution for time series analysis.