articles/data-explorer/ingest-data-overview.md
ms.date: 09/24/2018
# Azure Data Explorer data ingestion
Data ingestion is the process used to load data records from one or more sources to create or update a table in Azure Data Explorer. Once ingested, the data becomes available for query. The diagram below shows the end-to-end flow for working in Azure Data Explorer, including data ingestion.

The Azure Data Explorer data management service, which is responsible for data ingestion, provides the following functionality:
1. **Commit the data ingest**: Makes the data available for query.
## Ingestion methods
Azure Data Explorer supports several ingestion methods, each with its own target scenarios, advantages, and disadvantages. Azure Data Explorer offers pipelines and connectors to common services, programmatic ingestion using SDKs, and direct access to the engine for exploration purposes.
### Ingestion using pipelines
Azure Data Explorer currently supports the Event Hub pipeline, which can be managed using the management wizard in the Azure portal. For more information, see [Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md).
### Ingestion using connectors and plugins
Azure Data Explorer currently supports the Logstash plugin. For more information, see [Logstash Output Plugin for Azure Data Explorer](https://github.com/Azure/logstash-output-kusto/blob/master/README.md).
### Programmatic ingestion
Azure Data Explorer provides SDKs that can be used for query and data ingestion.
Kusto offers client SDKs that can be used to ingest and query data with:
* Ingesting data through the Azure Data Explorer data management service (high-throughput and reliable ingestion):
  * [**Batch ingestion**](/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample) (provided by the SDK): the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure queue. This is the recommended technique for high-volume, reliable, and cheap data ingestion.
* Ingesting data directly into the Azure Data Explorer engine (most appropriate for exploration and prototyping):
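For quick experiments, direct ingestion can be exercised from the query window with control commands. The snippet below is a minimal sketch, assuming a pre-created table (the hypothetical `MyEventTable`) whose schema matches the inline CSV records:

```kusto
// Inline ingestion: the data records are embedded in the command itself.
// Suitable for exploration and prototyping only, not for production pipelines.
.ingest inline into table MyEventTable <|
2018-09-24T12:00:00Z,Windows 10,42
2018-09-24T13:00:00Z,Windows 7,17
```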
> [!NOTE]
> When data is being ingested, data types are inferred based on the target table columns. If a record is incomplete or a field cannot be parsed as the required data type, the corresponding table columns will be populated with null values.
## Ingestion recommendations and limitations
* The effective retention policy of ingested data is derived from the database's retention policy. See [retention policy](/azure/kusto/concepts/retentionpolicy) for details. Ingesting data requires **Table ingestor** or **Database ingestor** permissions.
* Ingestion supports a maximum file size of 5 GB. The recommendation is to ingest files between 100 MB and 1 GB.
## Schema mapping
Schema mapping helps deterministically bind source data fields to destination table columns.
* [CSV Mapping](/azure/kusto/management/mappings?branch=master#csv-mapping) (optional) works with all ordinal-based formats and can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables?branch=master#create-ingestion-mapping) and referenced from the ingest command parameter.
* [JSON Mapping](/azure/kusto/management/mappings?branch=master#json-mapping) (mandatory) and [Avro mapping](/azure/kusto/management/mappings?branch=master#avro-mapping) (mandatory) can be passed as the ingest command parameter or [pre-created on the table](/azure/kusto/management/tables#create-ingestion-mapping) and referenced from the ingest command parameter.
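As an illustration, a JSON mapping can be pre-created on the table and later referenced by name from the ingest command. The table name, mapping name, and JSON paths below are hypothetical:

```kusto
// Pre-create a named JSON mapping on the (hypothetical) table MyEventTable.
// Each entry binds a destination column to a JSON path in the source records.
.create table MyEventTable ingestion json mapping "Mapping1"
    '[{"column":"TimeStamp","path":"$.timestamp"},{"column":"OsVer","path":"$.osVer"}]'
```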
## Next steps
[Quickstart: Ingest data from Event Hub into Azure Data Explorer](ingest-data-event-hub.md)
[Quickstart: Ingest data using the Azure Data Explorer Python library](python-ingest-data.md)
[Quickstart: Ingest data using the Azure Data Explorer Node library](node-ingest-data.md)
articles/data-explorer/time-series-analysis.md
- Use the [`make-series`](/azure/kusto/query/make-seriesoperator) operator to create a set of three time series, where:
  - `num=count()`: time series of traffic
  - `range(min_t, max_t, 1h)`: time series is created in 1-hour bins over the time range (oldest and newest timestamps of table records)
  - `default=0`: specifies the fill method for missing bins to create regular time series. Alternatively, use [`series_fill_const()`](/azure/kusto/query/series-fill-constfunction), [`series_fill_forward()`](/azure/kusto/query/series-fill-forwardfunction), [`series_fill_backward()`](/azure/kusto/query/series-fill-backwardfunction), or [`series_fill_linear()`](/azure/kusto/query/series-fill-linearfunction) for other fill methods
  - `by OsVer`: partition by OS
- The actual time series data structure is a numeric array of the aggregated value per each time bin. We use `render timechart` for visualization.
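Putting the elements above together, the query behind the chart looks roughly like the following sketch (assuming the sample table `demo_make_series1` with `TimeStamp` and `OsVer` columns, and `min_t`/`max_t` computed with `toscalar`):

```kusto
// Compute the overall time range, then build one 1h-binned series per OS version.
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| render timechart
```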
## Time series analysis functions
In this section, we'll perform typical series processing functions.
Once a set of time series is created, ADX supports a growing list of functions to process and analyze them, which can be found in the [time series documentation](/azure/kusto/query/machine-learning-and-tsa). We will describe a few representative functions for processing and analyzing time series.
### Filtering
Filtering is a common practice in signal processing and is useful for time series processing tasks (for example, smoothing a noisy signal, or change detection).
There are two generic filtering functions:

- [`series_fir()`](/azure/kusto/query/series-firfunction): Applies an FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
- [`series_iir()`](/azure/kusto/query/series-iirfunction): Applies an IIR filter. Used for exponential smoothing and cumulative sum.
- `Extend` the time series set by adding a new moving average series of size 5 bins (named *ma_num*) to the query:
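A moving average of 5 bins can be sketched with `series_fir()` as below; the uniform filter `repeat(1, 5)` is normalized and centered, and the table and columns are the same assumed sample set as earlier:

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
// ma_num: moving average over 5 bins (uniform FIR coefficients, normalized and centered)
| extend ma_num = series_fir(num, repeat(1, 5), true, true)
| render timechart
```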
### Regression analysis
ADX supports segmented linear regression analysis to estimate the trend of the time series.
- Use [series_fit_line()](/azure/kusto/query/series-fit-linefunction) to fit the best line to a time series for general trend detection.
- Use [series_fit_2lines()](/azure/kusto/query/series-fit-2linesfunction) to detect trend changes relative to the baseline, which is useful in monitoring scenarios.
Example of `series_fit_line()` and `series_fit_2lines()` functions in a time series query:
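The original snippet is not fully preserved here, so the following is a hedged sketch of the typical pattern: extending a series set with both fit functions, each of which auto-expands into several result columns (goodness of fit, slope, fitted trend line, and so on):

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
// Each fit function expands into multiple columns (rsquare, slope, trend line, ...).
| extend series_fit_line(num), series_fit_2lines(num)
| render timechart
```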

- Use [series_periods_detect()](/azure/kusto/query/series-periods-detectfunction) to automatically detect the periods in the time series.
- Use [series_periods_validate()](/azure/kusto/query/series-periods-validatefunction) if we know that a metric should have specific distinct period(s) and we want to verify that they exist.
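A sketch of period detection, assuming a sample table `demo_series3` with a numeric series column `num` binned at 2-hour resolution (so two weeks is `14d/2h` bins):

```kusto
demo_series3
// Detect up to 2 dominant periods with lengths between 0 and two weeks (in 2h bins).
| project (periods, scores) = series_periods_detect(num, 0., 14d/2h, 2)
| mv-expand periods, scores
// Convert the detected period lengths from bin counts back to days.
| extend days = 2h * todouble(periods) / 1d
```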
> [!NOTE]
> It's an anomaly if specific distinct periods don't exist.
### Element-wise functions
Arithmetic and logical operations can be done on a time series. Using [series_subtract()](/azure/kusto/query/series-subtractfunction), we can calculate a residual time series, that is, the difference between the original raw metric and a smoothed one, and look for anomalies in the residual signal:
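A hedged sketch of that residual calculation, reusing the moving-average series from the filtering section (same assumed sample table and columns):

```kusto
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp in range(min_t, max_t, 1h) by OsVer
| extend ma_num = series_fir(num, repeat(1, 5), true, true)
// residual = raw metric minus its smoothed version; spikes here are anomaly candidates
| extend residual_num = series_subtract(num, ma_num)
| render timechart
```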

- Blue: original time series
- Red: smoothed time series
- Green: residual time series
## Time series workflow at scale
|| Loc 15 | -3207352159611332166 | 1151 | -102743.910227889 |
|| Loc 13 | -3207352159611332166 | 1249 | -86303.2334644601 |
In less than two minutes, ADX analyzed over 20,000 time series and detected two abnormal time series in which the read count suddenly dropped.
These advanced capabilities, combined with the fast performance of ADX, supply a unique and powerful solution for time series analysis.