diff --git a/docset.yml b/docset.yml index 05de5870cf..235e093adf 100644 --- a/docset.yml +++ b/docset.yml @@ -370,10 +370,10 @@ subs: dataframe-transforms-cap: "Transforms" dfanalytics-cap: "Data frame analytics" dfanalytics: "data frame analytics" - dataframe-analytics-config: "'{dataframe} analytics config'" - dfanalytics-job: "'{dataframe} analytics job'" - dfanalytics-jobs: "'{dataframe} analytics jobs'" - dfanalytics-jobs-cap: "'{dataframe-cap} analytics jobs'" + dataframe-analytics-config: "data frame analytics config" + dfanalytics-job: "data frame analytics job" + dfanalytics-jobs: "data frame analytics jobs" + dfanalytics-jobs-cap: "Data frame analytics jobs" cdataframe: "continuous data frame" cdataframes: "continuous data frames" cdataframe-cap: "Continuous data frame" @@ -390,8 +390,8 @@ subs: olscore: "outlier score" olscores: "outlier scores" fiscore: "feature influence score" - evaluatedf-api: "evaluate {dataframe} analytics API" - evaluatedf-api-cap: "Evaluate {dataframe} analytics API" + evaluatedf-api: "evaluate data frame analytics API" + evaluatedf-api-cap: "Evaluate data frame analytics API" binarysc: "binary soft classification" binarysc-cap: "Binary soft classification" regression: "regression" diff --git a/explore-analyze/machine-learning/data-frame-analytics.md b/explore-analyze/machine-learning/data-frame-analytics.md index e0e5ef3748..adfa295181 100644 --- a/explore-analyze/machine-learning/data-frame-analytics.md +++ b/explore-analyze/machine-learning/data-frame-analytics.md @@ -4,11 +4,18 @@ mapped_urls: - https://www.elastic.co/guide/en/kibana/current/xpack-ml-dfanalytics.html --- -# Data frame analytics +# Data frame analytics [ml-dfanalytics] -% What needs to be done: Lift-and-shift +::::{important} +Using {{dfanalytics}} requires source data to be structured as a two-dimensional "tabular" data structure, in other words a {{dataframe}}.
[{{transforms-cap}}](../transforms.md) enable you to create {{dataframes}} which can be used as the source for {{dfanalytics}}. +:::: -% Use migrated content from existing pages that map to this page: +{{dfanalytics-cap}} enable you to perform different analyses of your data and annotate it with the results. Consult [Setup and security](setting-up-machine-learning.md) to learn more about the license and the security privileges that are required to use {{dfanalytics}}. -% - [ ] ./raw-migrated-files/stack-docs/machine-learning/ml-dfanalytics.md -% - [ ] ./raw-migrated-files/kibana/kibana/xpack-ml-dfanalytics.md \ No newline at end of file +* [Overview](data-frame-analytics/ml-dfa-overview.md) +* [*Finding outliers*](data-frame-analytics/ml-dfa-finding-outliers.md) +* [*Predicting numerical values with {{regression}}*](data-frame-analytics/ml-dfa-regression.md) +* [*Predicting classes with {{classification}}*](data-frame-analytics/ml-dfa-classification.md) +* [*Advanced concepts*](data-frame-analytics/ml-dfa-concepts.md) +* [*API quick reference*](data-frame-analytics/ml-dfanalytics-apis.md) +* [*Resources*](data-frame-analytics/ml-dfa-resources.md) diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md index 9f164ca9b7..4ae0f76d56 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md @@ -15,22 +15,18 @@ In reality, {{classification}} problems are more complex, such as classifying ma When you create a {{classification}} job, you must specify which field contains the classes that you want to predict. This field is known as the *{{depvar}}*. It can contain a maximum of 100 classes.
By default, all other [supported fields](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html#dfa-supported-fields) are included in the analysis and are known as *{{feature-vars}}*. You can optionally include or exclude fields. For more information about field selection, refer to the [explain data frame analytics API](https://www.elastic.co/guide/en/elasticsearch/reference/current/explain-dfanalytics.html). - ## {{classification-cap}} algorithms [dfa-classification-algorithm] {{classanalysis-cap}} uses an ensemble algorithm that is similar to extreme gradient boosting (XGBoost), which combines multiple weak models into a composite one. It uses decision trees to learn to predict the probability that a data point belongs to a certain class. XGBoost trains a sequence of decision trees, and every decision tree learns from the mistakes of the forest so far. In each iteration, the trees added to the forest improve the decision quality of the combined decision forest. The classification algorithm optimizes for a loss function called cross-entropy loss. - ## 1. Define the problem [dfa-classification-problem] {{classification-cap}} can be useful in cases where discrete, categorical values need to be predicted. If your use case requires predicting such values, then {{classification}} might be a suitable choice for you. - ## 2. Set up the environment [dfa-classification-environment] Before you can use the {{stack-ml-features}}, there are some configuration requirements (such as security privileges) that must be addressed. Refer to [Setup and security](../setting-up-machine-learning.md). - ## 3. Prepare and transform data [dfa-classification-prepare-data] {{classification-cap}} is a supervised {{ml}} method, which means you need to supply a labeled training data set. This data set must have values for the {{feature-vars}} and the {{depvar}} which are used to train the model.
The training process uses this information to learn the relationships between the classes and the {{feature-vars}}. This labeled data set also plays a critical role in model evaluation. @@ -41,7 +37,6 @@ You might also need to [{{transform}}](../../transforms.md) your data to create To learn more about how to prepare your data, refer to [the relevant section](ml-dfa-overview.md#prepare-transform-data) of the supervised learning overview. - ## 4. Create a job [dfa-classification-create-job] {{dfanalytics-jobs-cap}} contain the configuration information and metadata necessary to perform an analytics task. You can create {{dfanalytics-jobs}} via {{kib}} or using the [create {{dfanalytics-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). @@ -52,10 +47,8 @@ Select {{classification}} as the analytics type, then select the field that you You can view the statistics of the selectable fields in the {{dfanalytics}} wizard. The field statistics displayed in a flyout provide more meaningful context to help you select relevant fields. :::: - To improve performance, consider using a small `training_percent` value to train the model more quickly. It is a good strategy to make progress iteratively: run the analysis with a small training percentage, then evaluate the performance. Based on the results, you can decide if it is necessary to increase the `training_percent` value. - ## 5. Start the job [dfa-classification-start] You can start the job via {{kib}} or using the [start {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) API. A {{classification}} job has the following phases: @@ -75,8 +68,6 @@ After the last phase is finished, the job stops and the results are ready for ev When you create a {{dfanalytics-job}}, the inference step of the process might fail if the model is too large to fit into the JVM.
For a workaround, refer to [this GitHub issue](https://github.com/elastic/elasticsearch/issues/76093). :::: - - ## 6. Evaluate and interpret the result [ml-dfanalytics-classification-evaluation] Using the {{dfanalytics}} features to gain insights from a data set is an iterative process. After you have defined the problem you want to solve and chosen the analytics type that can help you do so, you need to produce a high-quality data set and create the appropriate {{dfanalytics-job}}. You might need to experiment with different configurations, parameters, and ways to transform data before you arrive at a result that satisfies your use case. A valuable companion to this process is the [{{evaluatedf-api}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/evaluate-dfanalytics.html), which enables you to evaluate the {{dfanalytics}} performance. It helps you understand error distributions and identify the points where the {{dfanalytics}} model performs well and where it is less trustworthy. @@ -90,11 +81,10 @@ You can measure how well the model has performed on your training data set by us The following metrics help you interpret the analysis results: -* {feat-imp} +* {{feat-imp}} * `class_probability` * `class_score` - ### Multiclass confusion matrix [ml-dfanalytics-mccm] The multiclass confusion matrix provides a summary of the performance of the {{classanalysis}}. It contains the number of occurrences where the analysis classified data points correctly with their actual class as well as the number of occurrences where it misclassified them. @@ -115,7 +105,6 @@ As the number of classes increases, the confusion matrix becomes more complex: This matrix contains the actual labels on the left side while the predicted labels are on the top. The proportion of correct and incorrect predictions is broken down for each class. This enables you to examine how the {{classanalysis}} confused the different classes while it made its predictions.
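The same summary that backs the confusion matrix visualization can be retrieved with the {{evaluatedf-api}}. The following is a minimal sketch, assuming a hypothetical destination index `my-dest-index` whose documents contain an actual class field `my_actual_class` and a predicted class field `ml.my_actual_class_prediction`:

```console
POST _ml/data_frame/_evaluate
{
  "index": "my-dest-index",
  "evaluation": {
    "classification": {
      "actual_field": "my_actual_class",
      "predicted_field": "ml.my_actual_class_prediction",
      "metrics": {
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
```

The response lists, for each actual class, how many documents the analysis labeled as each predicted class, which you can read row by row in the same way as the matrix above.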
- ### Area under the curve of receiver operating characteristic (AUC ROC) [ml-dfanalytics-class-aucroc] The receiver operating characteristic (ROC) curve is a plot that represents the performance of the {{classification}} process at different predicted probability thresholds. It compares the true positive rate for a specific class against the rate of all the other classes combined ("one versus all" strategy) at the different threshold levels to create the curve. @@ -128,18 +117,14 @@ From this plot, you can compute the area under the curve (AUC) value, which is a To use this evaluation method, you must set `num_top_classes` to `-1` or a value greater than or equal to the total number of classes when you create the {{dfanalytics-job}}. :::: - - ### {{feat-imp-cap}} [dfa-classification-feature-importance] {{feat-imp-cap}} provides further information about the results of an analysis and helps to interpret the results in a more subtle way. If you want to learn more about {{feat-imp}}, refer to [{{feat-imp-cap}}](ml-feature-importance.md). - ### `class_probability` [dfa-classification-class-probability] The `class_probability` is a value between 0 and 1, which indicates how likely it is that a given data point belongs to a certain class. The higher the number, the higher the probability that the data point belongs to the named class. This information is stored in the `top_classes` array for each document in the destination index. - ### `class_score` [dfa-classification-class-score] The `class_score` is a function of the `class_probability` and has a value that is greater than or equal to zero. It takes into consideration your objective (as defined in the `class_assignment_objective` job configuration option): *accuracy* or *recall*. 
@@ -155,7 +140,6 @@ If your objective is to maximize accuracy, the scores are weighted to maximize t If there is an imbalanced class distribution in your training data, focusing on accuracy can decrease your model’s sensitivity to incorrect predictions in the under-represented classes. :::: - By default, {{classanalysis}} jobs accept a slight degradation of the overall accuracy in return for greater sensitivity to classes that are predicted incorrectly. That is to say, their objective is to maximize the minimum recall. For example, in the context of a multi-class confusion matrix, the predictions of interest are in each row: :::{image} ../../../images/machine-learning-confusion-matrix-multiclass-recall.jpg @@ -167,7 +151,6 @@ For each class, the recall is calculated as the number of correct predictions di To learn more about choosing the class assignment objective that fits your goal, refer to this [Jupyter notebook](https://github.com/elastic/examples/blob/master/Machine%20Learning/Class%20Assigment%20Objectives/classification-class-assignment-objective.ipynb). - ## 7. Deploy the model [dfa-classification-deploy] The model that you created is stored as {{es}} documents in internal indices. In other words, the characteristics of your trained model are saved and ready to be deployed and used as functions. @@ -175,24 +158,24 @@ The model that you created is stored as {{es}} documents in internal indices. In 1. To deploy a {{dfanalytics}} model in a pipeline, navigate to **Machine Learning** > **Model Management** > **Trained models** in the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects) in {{kib}}. 2. Find the model you want to deploy in the list and click **Deploy model** in the **Actions** menu.
- :::{image} ../../../images/machine-learning-ml-dfa-trained-models-ui.png - :alt: The trained models UI in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-trained-models-ui.png +:alt: The trained models UI in {{kib}} +:class: screenshot +::: 3. Create an {{infer}} pipeline to be able to use the model against new data through the pipeline. Add a name and a description or use the default values. - :::{image} ../../../images/machine-learning-ml-dfa-inference-pipeline.png - :alt: Creating an inference pipeline - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-inference-pipeline.png +:alt: Creating an inference pipeline +:class: screenshot +::: 4. Configure the pipeline processors or use the default settings. - :::{image} ../../../images/machine-learning-ml-dfa-inference-processor.png - :alt: Configuring an inference processor - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-inference-processor.png +:alt: Configuring an inference processor +:class: screenshot +::: 5. Configure how to handle ingest failures or use the default settings. 6. (Optional) Test your pipeline by running a simulation of the pipeline to confirm it produces the anticipated results. @@ -200,21 +183,18 @@ The model that you created is stored as {{es}} documents in internal indices. In The model is deployed and ready to use through the {{infer}} pipeline. - ### {{infer-cap}} [ml-inference-class] {{infer-cap}} enables you to use [trained {{ml}} models](ml-trained-models.md) against incoming data in a continuous fashion. For instance, suppose you have an online service and you would like to predict whether a customer is likely to churn. You have an index with historical data – information on the customer behavior throughout the years in your business – and a {{classification}} model that is trained on this data. The new information comes into a destination index of a {{ctransform}}.
With {{infer}}, you can perform the {{classanalysis}} against the new data with the same input fields that you’ve trained the model on, and get a prediction. - #### {{infer-cap}} processor [ml-inference-processor-class] {{infer-cap}} can be used as a processor specified in an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md). It uses a trained model to infer against the data that is being ingested in the pipeline. The model is used on the ingest node. {{infer-cap}} pre-processes the data by using the model and provides a prediction. After the process, the pipeline continues executing (if there are any other processors in the pipeline), and finally the new data, together with the results, is indexed into the destination index. Check the [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-df-analytics-apis.html) to learn more. - #### {{infer-cap}} aggregation [ml-inference-aggregation-class] {{infer-cap}} can also be used as a pipeline aggregation. You can reference a trained model in the aggregation to infer on the result field of the parent bucket aggregation. The {{infer}} aggregation uses the model on the results to provide a prediction. This aggregation enables you to run {{classification}} or {{reganalysis}} at search time. If you want to perform the analysis on a small set of data, this aggregation enables you to generate predictions without the need to set up a processor in the ingest pipeline. @@ -225,8 +205,6 @@ Check the [{{infer}} bucket aggregation](https://www.elastic.co/guide/en/elastic If you use trained model aliases to reference your trained model in an {{infer}} processor or {{infer}} aggregation, you can replace your trained model with a new one without the need to update the processor or the aggregation.
Reassign the alias you used to a new trained model ID by using the [Create or update trained model aliases API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-trained-models-aliases.html). The new trained model needs to use the same type of {{dfanalytics}} as the old one. :::: - - ## Performing {{classanalysis}} in the sample flight data set [performing-classification] Let’s try to predict whether a flight will be delayed or not by using the [sample flight data](../../overview/kibana-quickstart.md#gs-get-data-into-kibana). The data set contains information such as weather conditions, carrier, flight distance, origin, destination, and whether or not the flight was delayed. The {{classification}} model learns the relationships between the fields in your data to predict the value of the *dependent variable*, which in this case is the boolean `FlightDelay` field. @@ -235,8 +213,6 @@ Let’s try to predict whether a flight will be delayed or not by using the [sam If you want to view this example in a Jupyter notebook, [click here](https://github.com/elastic/examples/tree/master/Machine%20Learning/Analytics%20Jupyter%20Notebooks). :::: - - ### Preparing your data [flightdata-classification-data] Each document in the sample flight data set contains details for a single flight, so the data is ready for analysis; it is already in a two-dimensional entity-based data structure. In general, you often need to [transform](../../transforms.md) the data into an entity-centric index before you can analyze it. @@ -293,13 +269,10 @@ In order to be analyzed, a document must contain at least one field with a suppo :::: - ::::{tip} The sample flight data set is used in this example because it is easily accessible. However, the data has been manually created and contains some inconsistencies. For example, a flight can be both delayed and canceled. This is a good reminder that the quality of your input data affects the quality of your results. 
:::: - - ### Creating a {{classification}} model [flightdata-classification-model] To predict whether a specific flight is delayed: @@ -308,10 +281,10 @@ To predict whether a specific flight is delayed: You can use the wizard on the **{{ml-app}}** > **Data Frame Analytics** tab in {{kib}} or the [create {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html) API. - :::{image} ../../../images/machine-learning-flights-classification-job-1.jpg - :alt: Creating a {{dfanalytics-job}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flights-classification-job-1.jpg +:alt: Creating a {{dfanalytics-job}} in {{kib}} +:class: screenshot +::: 1. Choose `kibana_sample_data_flights` as the source index. 2. Choose `classification` as the job type. @@ -320,10 +293,10 @@ To predict whether a specific flight is delayed: The wizard includes a scatterplot matrix, which enables you to explore the relationships between the numeric fields. The color of each point is affected by the value of the {{depvar}} for that document, as shown in the legend. You can highlight an area in one of the charts and the corresponding area is also highlighted in the rest of the charts. You can use this matrix to help you decide which fields to include or exclude. - :::{image} ../../../images/machine-learning-flights-classification-scatterplot.png - :alt: A scatterplot matrix for three fields in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flights-classification-scatterplot.png +:alt: A scatterplot matrix for three fields in {{kib}} +:class: screenshot +::: If you want these charts to represent data from a larger sample size or from a randomized selection of documents, you can change the default behavior. However, a larger sample size might slow down the performance of the matrix and a randomized selection might put more load on the cluster due to the more intensive query.
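Before creating the job, you can also preview which fields would be selected for analysis and get a memory estimate by using the [explain data frame analytics API](https://www.elastic.co/guide/en/elasticsearch/reference/current/explain-dfanalytics.html). A minimal sketch against the sample flight data, mirroring the configuration described in these steps:

```console
POST _ml/data_frame/analytics/_explain
{
  "source": {
    "index": "kibana_sample_data_flights"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "FlightDelay"
    }
  }
}
```

The response contains a `field_selection` array explaining which fields are included or excluded and why, and a `memory_estimation` object you can use to size the job before running it.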
@@ -334,9 +307,9 @@ To predict whether a specific flight is delayed: 9. Add the name of the destination index that will contain the results. In {{kib}}, the index name matches the job ID by default. It will contain a copy of the source index data where each document is annotated with the results. If the index does not exist, it will be created automatically. 10. Use default values for all other options. - ::::{dropdown} API example - ```console - PUT _ml/data_frame/analytics/model-flight-delays-classification +::::{dropdown} API example +```console +PUT _ml/data_frame/analytics/model-flight-delays-classification { "source": { "index": [ @@ -363,13 +336,13 @@ To predict whether a specific flight is delayed: ] } } - ``` +``` - 1. The field name in the `dest` index that contains the analysis results. - 2. To disable {{feat-imp}} calculations, omit this option. +1. The field name in the `dest` index that contains the analysis results. +2. To disable {{feat-imp}} calculations, omit this option. - :::: +:::: After you configure your job, the configuration details are automatically validated. If the checks are successful, you can start the job. A warning message is shown if the configuration is invalid. The message contains a suggestion for how to improve the configuration so that it can be validated. @@ -378,30 +351,30 @@ To predict whether a specific flight is delayed: The job takes a few minutes to run. Runtime depends on the local hardware and also on the number of documents and fields that are analyzed. The more fields and documents, the longer the job runs. It stops automatically when the analysis is complete. - ::::{dropdown} API example - ```console - POST _ml/data_frame/analytics/model-flight-delays-classification/_start - ``` +::::{dropdown} API example +```console +POST _ml/data_frame/analytics/model-flight-delays-classification/_start +``` - :::: +:::: 3.
Check the job stats to follow the progress in {{kib}} or use the [get {{dfanalytics-jobs}} statistics API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-dfanalytics-stats.html). - :::{image} ../../../images/machine-learning-flights-classification-details.jpg - :alt: Statistics for a {{dfanalytics-job}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flights-classification-details.jpg +:alt: Statistics for a {{dfanalytics-job}} in {kib} +:class: screenshot +::: When the job stops, the results are ready to view and evaluate. To learn more about the job phases, see [How {{dfanalytics-jobs}} work](ml-dfa-phases.md). - ::::{dropdown} API example - ```console - GET _ml/data_frame/analytics/model-flight-delays-classification/_stats - ``` +::::{dropdown} API example +```console +GET _ml/data_frame/analytics/model-flight-delays-classification/_stats +``` - The API call returns the following response: +The API call returns the following response: - ```console-result +```console-result { "count" : 1, "data_frame_analytics" : [ @@ -481,15 +454,13 @@ To predict whether a specific flight is delayed: "loss_type" : "binomial_logistic" } } - } + } } ] } - ``` - - :::: - +``` +:::: ### Viewing {{classification}} results [flightdata-classification-results] @@ -510,7 +481,6 @@ If you want to understand how certain the model is about each prediction, you ca If you have a large number of classes, your destination index contains a large number of predicted probabilities for each document. When you create the {{classification}} job, you can use the `num_top_classes` option to modify this behavior. :::: - ::::{dropdown} API example ```console GET model-flight-delays-classification/_search @@ -543,12 +513,10 @@ The snippet below shows the probability and score details for a document in the 1. An array of values specifying the probability of the prediction and the score for each class. 
- The class with the highest score is the prediction. In this example, `false` has a `class_score` of 0.35 while `true` has only 0.06, so the prediction will be `false`. For more details about these values, see [`class_score`](#dfa-classification-class-score). :::: - If you chose to calculate {{feat-imp}}, the destination index also contains `ml.feature_importance` objects. Every field that is included in the analysis (known as a *feature* of the data point) is assigned a {{feat-imp}} value. It has both a magnitude and a direction (positive or negative), which indicates how each field affects a particular prediction. Only the most significant values (in this case, the top 10) are stored in the index. However, the trained model metadata also contains the average magnitude of the {{feat-imp}} values for each field across all the training data. You can view this summarized information in {{kib}}: :::{image} ../../../images/machine-learning-flights-classification-total-importance.jpg @@ -646,7 +614,6 @@ The snippet below shows an example of the total {{feat-imp}} and the correspondi 3. This value is the minimum {{feat-imp}} value across all the training data for this field when the predicted class is `false`. 4. This value is the maximum {{feat-imp}} value across all the training data for this field when the predicted class is `false`. - To see the top {{feat-imp}} values for each prediction, search the destination index. For example: ```console @@ -698,10 +665,8 @@ The sum of the {{feat-imp}} values for each class in this data point approximate :::: - Lastly, {{kib}} provides a scatterplot matrix in the results. It has the same functionality as the matrix that you saw in the job wizard. Its purpose is to help you visualize and explore the relationships between the numeric fields and the {{depvar}}. 
- ### Evaluating {{classification}} results [flightdata-classification-evaluate] Though you can look at individual results and compare the predicted value (`ml.FlightDelay_prediction`) to the actual value (`FlightDelay`), you typically need to evaluate the success of your {{classification}} model as a whole. @@ -717,7 +682,6 @@ Though you can look at individual results and compare the predicted value (`ml.F As the sample data may change when it is loaded into {{kib}}, the results of the analysis can vary even if you use the same configuration as the example. Therefore, use this information as a guideline for interpreting your own results. :::: - If you want to see the exact number of occurrences, select a quadrant in the matrix. You can also use the **Training** and **Testing** filter options to refine the contents of the matrix. Thus you can see how well the model performs on previously unseen data. You can check how many documents are `true` in the testing data, how many of them are identified correctly (*true positives*) and how many of them are identified incorrectly as `false` (*false negatives*). Likewise if you select other quadrants in the matrix, it shows the number of documents that have the `false` class as their actual value in the testing data. The matrix shows the number of documents that are correctly identified as `false` (*true negatives*) and the number of documents that are incorrectly predicted as `true` (*false positives*). When you perform {{classanalysis}} on your own data, it might take multiple iterations before you are satisfied with the results and ready to deploy the model. @@ -759,7 +723,6 @@ POST _ml/data_frame/_evaluate 1. We calculate the training error by evaluating only the training data. - Next, we calculate the generalization error that represents how well the model performed on previously unseen data: ```console @@ -787,7 +750,6 @@ POST _ml/data_frame/_evaluate 1. 
We evaluate only the documents that are not part of the training data. - The returned confusion matrix shows us how many data points were classified correctly (where the `actual_class` matches the `predicted_class`) and how many were misclassified (`actual_class` does not match `predicted_class`): ```console-result @@ -837,13 +799,10 @@ The returned confusion matrix shows us how many data points were classified corr 3. The name of the predicted class. 4. The number of documents that belong to the actual class and are labeled as the predicted class. - :::: - If you don’t want to keep the {{dfanalytics-job}}, you can delete it in {{kib}} or by using the [delete {{dfanalytics-job}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-dfanalytics.html). When you delete {{dfanalytics-jobs}} in {{kib}}, you have the option to also remove the destination indices and {{data-sources}}. - ### Further reading [dfa-classification-readings] * [{{classanalysis-cap}} example (Jupyter notebook)](https://github.com/elastic/examples/tree/master/Machine%20Learning/Analytics%20Jupyter%20Notebooks) diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md index c3790234a2..b237b6c567 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md @@ -11,8 +11,6 @@ mapped_pages: {{oldetection-cap}} is a batch analysis; it runs against your data once. If new data comes into the index, you need to do the analysis again on the altered data.
:::: - - ## {{oldetection-cap}} algorithms [dfa-outlier-algorithms] In the {{stack}}, we use an ensemble of four different distance- and density-based {{oldetection}} methods: @@ -26,44 +24,37 @@ You don’t need to select the methods or provide any parameters, but you can ov The four algorithms don’t always agree on which points are outliers. By default, {{oldetection}} jobs use all these methods, then normalize and combine their results and give every data point in the index an {{olscore}}. The {{olscore}} ranges from 0 to 1, where a higher number represents a greater chance that the data point is an outlier compared to the other data points in the index. - ### Feature influence [dfa-feature-influence] Feature influence – another score calculated while detecting outliers – provides a relative ranking of the different features and their contribution towards a point being an outlier. This score allows you to understand the context or the reasoning behind why a certain data point is an outlier. - ## 1. Define the problem [dfa-outlier-detection-problem] {{oldetection-cap}} in the {{stack}} can be used to detect any unusual entity in a given population, for example, malicious software on a machine or unusual user behavior on a network. As {{oldetection}} operates on the assumption that the outliers make up a small proportion of the overall data population, you can use this feature in such cases. {{oldetection-cap}} is a batch analysis that works best on an entity-centric index. If your use case is based on time series data, you might want to use [{{anomaly-detect}}](../anomaly-detection.md) instead. The {{ml-features}} provide unsupervised {{oldetection}}, which means there is no need to provide a training data set. - ## 2. Set up the environment [dfa-outlier-detection-environment] Before you can use the {{stack-ml-features}}, there are some configuration requirements (such as security privileges) that must be addressed.
Refer to [Setup and security](../setting-up-machine-learning.md). - ## 3. Prepare and transform data [dfa-outlier-detection-prepare-data] {{oldetection-cap}} requires specifically structured source data: a two dimensional tabular data structure. For this reason, you might need to [{{transform}}](../../transforms.md) your data to create a {{dataframe}} which can be used as the source for {{oldetection}}. You can find an example of how to transform your data into an entity-centric index in [this section](#weblogs-outliers). - ## 4. Create a job [dfa-outlier-detection-create-job] -{{dfanalytics-jobs-cap}} contain the configuration information and metadata necessary to perform an analytics task. You can create {{dfanalytics-jobs}} via {{kib}} or using the [create {{dfanalytics-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). Select {{oldetection}} as the analytics type that the {{dfanalytics-job}} performs. You can also decide to include and exclude fields to/from the analysis when you create the job. +{{dfanalytics-cap}} jobs contain the configuration information and metadata necessary to perform an analytics task. You can create {{dfanalytics}} jobs via {{kib}} or using the [create {{dfanalytics}} jobs API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). Select {{oldetection}} as the analytics type that the {{dfanalytics}} job performs. You can also decide to include and exclude fields to/from the analysis when you create the job. ::::{tip} You can view the statistics of the selectable fields in the {{dfanalytics}} wizard. The field statistics displayed in a flyout provide more meaningful context to help you select relevant fields. :::: - - ## 5. Start the job [dfa-outlier-detection-start] -You can start the job via {{kib}} or using the [start {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) API. 
An {{oldetection}} job has four phases: +You can start the job via {{kib}} or using the [start {{dfanalytics}} job](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) API. An {{oldetection}} job has four phases: * `reindexing`: documents are copied from the source index to the destination index. * `loading_data`: the job fetches the necessary data from the destination index. @@ -72,14 +63,13 @@ You can start the job via {{kib}} or using the [start {{dfanalytics-jobs}}](http After the last phase is finished, the job stops and the results are ready for evaluation. -{{oldetection-cap}} jobs – unlike other {{dfanalytics-jobs}} – run one time in their life cycle. If you’d like to run the analysis again, you need to create a new job. - +{{oldetection-cap}} jobs – unlike other {{dfanalytics}} jobs – run one time in their life cycle. If you’d like to run the analysis again, you need to create a new job. ## 6. Evaluate the results [ml-outlier-detection-evaluate] -Using the {{dfanalytics}} features to gain insights from a data set is an iterative process. After you defined the problem you want to solve, and chose the analytics type that can help you to do so, you need to produce a high-quality data set and create the appropriate {{dfanalytics-job}}. You might need to experiment with different configurations, parameters, and ways to transform data before you arrive at a result that satisfies your use case. A valuable companion to this process is the [{{evaluatedf-api}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/evaluate-dfanalytics.html), which enables you to evaluate the {{dfanalytics}} performance. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily. +Using the {{dfanalytics}} features to gain insights from a data set is an iterative process. 
After you define the problem you want to solve and choose the analytics type that can help you do so, you need to produce a high-quality data set and create the appropriate {{dfanalytics}} job. You might need to experiment with different configurations, parameters, and ways to transform data before you arrive at a result that satisfies your use case. A valuable companion to this process is the [evaluate {{dfanalytics}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/evaluate-dfanalytics.html), which enables you to evaluate the {{dfanalytics}} performance. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily.

-To evaluate the analysis with this API, you need to annotate your index that contains the results of the analysis with a field that marks each document with the ground truth. The {{evaluatedf-api}} evaluates the performance of the {{dfanalytics}} against this manually provided ground truth.
+To evaluate the analysis with this API, you need to annotate the index that contains the results of the analysis with a field that marks each document with the ground truth. The evaluate {{dfanalytics}} API evaluates the performance of the {{dfanalytics}} against this manually provided ground truth.

The {{oldetection}} evaluation type offers the following metrics to evaluate the model performance:

@@ -88,7 +78,6 @@ The {{oldetection}} evaluation type offers the following metrics to evaluate the

* precision
* recall
* receiver operating characteristic (ROC) curve.

-
### Confusion matrix [ml-dfanalytics-confusion-matrix]

A confusion matrix provides four measures of how well the {{dfanalytics}} worked on your data set:

@@ -98,10 +87,9 @@ A confusion matrix provides four measures of how well the {{dfanalytics}} worked

* False positives (FP): Not class members that the analysis misidentified as class members.
* False negatives (FN): Class members that the analysis misidentified as not class members.

-Although, the {{evaluatedf-api}} can compute the confusion matrix out of the analysis results, these results are not binary values (class member/not class member), but a number between 0 and 1 (which called the {{olscore}} in case of {{oldetection}}). This value captures how likely it is for a data point to be a member of a certain class. It means that it is up to the user to decide what is the threshold or cutoff point at which the data point will be considered as a member of the given class. For example, the user can say that all the data points with an {{olscore}} higher than 0.5 will be considered as outliers.
-
-To take this complexity into account, the {{evaluatedf-api}} returns the confusion matrix at different thresholds (by default, 0.25, 0.5, and 0.75).
+Although the evaluate {{dfanalytics}} API can compute the confusion matrix from the analysis results, these results are not binary values (class member/not class member) but numbers between 0 and 1 (called the {{olscore}} in the case of {{oldetection}}). This value captures how likely it is for a data point to be a member of a certain class. It is therefore up to you to decide the threshold or cutoff point at which a data point is considered a member of the given class. For example, you can decide that all data points with an {{olscore}} higher than 0.5 are outliers.
+To take this complexity into account, the evaluate {{dfanalytics}} API returns the confusion matrix at different thresholds (by default, 0.25, 0.5, and 0.75).

### Precision and recall [ml-dfanalytics-precision-recall]

@@ -113,21 +101,17 @@ Recall shows how many of the data points that are actual class members were iden

Precision and recall are computed at different threshold levels.
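To make the thresholded confusion matrix, precision, and recall concrete, here is a minimal Python sketch (an illustration only, not the {{stack}} implementation; the scores and ground-truth labels are invented):

```python
def confusion_at_threshold(scores, truth, threshold):
    """Binarize outlier scores at a cutoff and count TP/FP/TN/FN."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, is_outlier in zip(scores, truth):
        predicted = score > threshold
        if predicted and is_outlier:
            counts["tp"] += 1
        elif predicted and not is_outlier:
            counts["fp"] += 1
        elif not predicted and is_outlier:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def precision_recall(m):
    """Precision: share of predicted outliers that are real outliers.
    Recall: share of real outliers that were found."""
    precision = m["tp"] / (m["tp"] + m["fp"]) if m["tp"] + m["fp"] else 0.0
    recall = m["tp"] / (m["tp"] + m["fn"]) if m["tp"] + m["fn"] else 0.0
    return precision, recall

# Invented outlier scores and ground truth for five documents.
scores = [0.1, 0.3, 0.6, 0.8, 0.9]
truth = [False, False, False, True, True]

# As the evaluate API does by default, compute the matrix at several cutoffs.
matrices = {t: confusion_at_threshold(scores, truth, t) for t in (0.25, 0.5, 0.75)}
```

Note how the choice of threshold trades precision against recall: raising the cutoff removes false positives but can start missing true outliers.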
- ### Receiver operating characteristic curve [ml-dfanalytics-roc] The receiver operating characteristic (ROC) curve is a plot that represents the performance of the binary classification process at different thresholds. It compares the rate of true positives against the rate of false positives at the different threshold levels to create the curve. From this plot, you can compute the area under the curve (AUC) value, which is a number between 0 and 1. The closer to 1, the better the algorithm performance. -The {{evaluatedf-api}} can return the false positive rate (`fpr`) and the true positive rate (`tpr`) at the different threshold levels, so you can visualize the algorithm performance by using these values. - +The evaluate {{dfanalytics}} API can return the false positive rate (`fpr`) and the true positive rate (`tpr`) at the different threshold levels, so you can visualize the algorithm performance by using these values. ## Detecting unusual behavior in the logs data set [weblogs-outliers] The goal of {{oldetection}} is to find the most unusual documents in an index. Let’s try to detect unusual behavior in the [data logs sample data set](../../overview/kibana-quickstart.md#gs-get-data-into-kibana). -1. Verify that your environment is set up properly to use {{ml-features}}. If the {{es}} {security-features} are enabled, you need a user that has authority to create and manage {{dfanalytics-jobs}}. See [Setup and security](../setting-up-machine-learning.md). - - Since we’ll be creating {{transforms}}, you also need `manage_data_frame_transforms` cluster privileges. +1. Verify that your environment is set up properly to use {{ml-features}}. If the {{es}} {{security-features}} are enabled, you need a user that has authority to create and manage {{dfanalytics}} jobs. See [Setup and security](../setting-up-machine-learning.md). Since we’ll be creating {{transforms}}, you also need `manage_data_frame_transforms` cluster privileges. 2. 
Create a {{transform}} that generates an entity-centric index with numeric or boolean data to analyze. @@ -137,16 +121,17 @@ The goal of {{oldetection}} is to find the most unusual documents in an index. L You can preview the {{transform}} before you create it in **{{stack-manage-app}}** > **Transforms**: - :::{image} ../../../images/machine-learning-logs-transform-preview.jpg - :alt: Creating a {{transform}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-logs-transform-preview.jpg +:alt: Creating a {{transform}} in {kib} +:class: screenshot +::: Alternatively, you can use the [preview {{transform}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html) and the [create {{transform}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html). - ::::{dropdown} API example - ```console - POST _transform/_preview +::::{dropdown} API example + +```console +POST _transform/_preview { "source": { "index": [ @@ -229,52 +214,50 @@ The goal of {{oldetection}} is to find the most unusual documents in an index. L "index": "weblog-clientip" } } - ``` - - :::: +``` +:::: For more details about creating {{transforms}}, see [Transforming the eCommerce sample data](../../transforms/ecommerce-transforms.md). 3. Start the {{transform}}. - ::::{tip} - Even though resource utilization is automatically adjusted based on the cluster load, a {{transform}} increases search and indexing load on your cluster while it runs. If you’re experiencing an excessive load, however, you can stop it. - :::: - +::::{tip} +Even though resource utilization is automatically adjusted based on the cluster load, a {{transform}} increases search and indexing load on your cluster while it runs. If you’re experiencing an excessive load, however, you can stop it. +:::: You can start, stop, and manage {{transforms}} in {{kib}}. 
Alternatively, you can use the [start {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-data-frame-transform.html) API. - ::::{dropdown} API example - ```console - POST _transform/logs-by-clientip/_start - ``` +::::{dropdown} API example +```console +POST _transform/logs-by-clientip/_start +``` - :::: +:::: 4. Create a {{dfanalytics-job}} to detect outliers in the new entity-centric index. In the wizard on the **Machine Learning** > **Data Frame Analytics** page in {{kib}}, select your new {{data-source}} then use the default values for {{oldetection}}. For example: - :::{image} ../../../images/machine-learning-weblog-outlier-job-1.jpg - :alt: Create a {{dfanalytics-job}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-weblog-outlier-job-1.jpg +:alt: Create a {{dfanalytics-job}} in {kib} +:class: screenshot +::: The wizard includes a scatterplot matrix, which enables you to explore the relationships between the fields. You can use that information to help you decide which fields to include or exclude from the analysis. - :::{image} ../../../images/machine-learning-weblog-outlier-scatterplot.jpg - :alt: A scatterplot matrix for three fields in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-weblog-outlier-scatterplot.jpg +:alt: A scatterplot matrix for three fields in {kib} +:class: screenshot +::: If you want these charts to represent data from a larger sample size or from a randomized selection of documents, you can change the default behavior. However, a larger sample size might slow down the performance of the matrix and a randomized selection might put more load on the cluster due to the more intensive query. - Alternatively, you can use the [create {{dfanalytics-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). 
+ Alternatively, you can use the [create {{dfanalytics}} jobs API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). - ::::{dropdown} API example - ```console - PUT _ml/data_frame/analytics/weblog-outliers +::::{dropdown} API example +```console +PUT _ml/data_frame/analytics/weblog-outliers { "source": { "index": "weblog-clientip" @@ -290,50 +273,50 @@ The goal of {{oldetection}} is to find the most unusual documents in an index. L "includes" : ["@timestamp.value_count","bytes.max","bytes.sum","request.value_count"] } } - ``` +``` - :::: +:::: After you configured your job, the configuration details are automatically validated. If the checks are successful, you can proceed and start the job. A warning message is shown if the configuration is invalid. The message contains a suggestion to improve the configuration to be validated. -5. Start the {{dfanalytics-job}}. +5. Start the {{dfanalytics}} job. - You can start, stop, and manage {{dfanalytics-jobs}} on the **Machine Learning** > **Data Frame Analytics** page. Alternatively, you can use the [start {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) and [stop {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/stop-dfanalytics.html) APIs. + You can start, stop, and manage {{dfanalytics-jobs}} on the **Machine Learning** > **Data Frame Analytics** page. Alternatively, you can use the [start {{dfanalytics}} jobs](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) and [stop {{dfanalytics}} jobs](https://www.elastic.co/guide/en/elasticsearch/reference/current/stop-dfanalytics.html) APIs. - ::::{dropdown} API example - ```console +::::{dropdown} API example +```console POST _ml/data_frame/analytics/weblog-outliers/_start - ``` +``` - :::: +:::: 6. View the results of the {{oldetection}} analysis. 
-
    The {{dfanalytics-job}} creates an index that contains the original data and {{olscores}} for each document. The {{olscore}} indicates how different each entity is from other entities.
+    The {{dfanalytics}} job creates an index that contains the original data and {{olscores}} for each document. The {{olscore}} indicates how different each entity is from other entities.

-    In {{kib}}, you can view the results from the {{dfanalytics-job}} and sort them on the outlier score:
+    In {{kib}}, you can view the results from the {{dfanalytics}} job and sort them on the outlier score:

-    :::{image} ../../../images/machine-learning-outliers.jpg
-    :alt: View {{oldetection}} results in {kib}
-    :class: screenshot
-    :::
+:::{image} ../../../images/machine-learning-outliers.jpg
+:alt: View {{oldetection}} results in {kib}
+:class: screenshot
+:::

    The `ml.outlier` score is a value between 0 and 1. The larger the value, the more likely the entity is to be an outlier. In {{kib}}, you can optionally enable histogram charts to get a better understanding of the distribution of values for each column in the result.

    In addition to an overall outlier score, each document is annotated with feature influence values for each field. These values add up to 1 and indicate which fields are the most important in deciding whether an entity is an outlier or an inlier. For example, the dark shading on the `bytes.sum` field for the client IP `111.237.144.54` indicates that the sum of the exchanged bytes was the most influential feature in determining that this client IP is an outlier.

-    If you want to see the exact feature influence values, you can retrieve them from the index that is associated with your {{dfanalytics-job}}.
+    If you want to see the exact feature influence values, you can retrieve them from the index that is associated with your {{dfanalytics}} job.
- ::::{dropdown} API example - ```console - GET weblog-outliers/_search?q="111.237.144.54" - ``` +::::{dropdown} API example +```console +GET weblog-outliers/_search?q="111.237.144.54" +``` The search results include the following {{oldetection}} scores: - ```js - ... +```js + ... "ml" : { "outlier_score" : 0.9830020666122437, "feature_influence" : [ @@ -355,8 +338,8 @@ The goal of {{oldetection}} is to find the most unusual documents in an index. L } ] } - ... - ``` + ... +``` :::: @@ -370,18 +353,14 @@ The goal of {{oldetection}} is to find the most unusual documents in an index. L You can highlight an area in one of the charts and the corresponding area is also highlighted in the rest of the charts. This function makes it easier to focus on specific values and areas in the results. In addition to the sample size and random scoring options, there is a **Dynamic size** option. If you enable this option, the size of each point is affected by its {{olscore}}; that is to say, the largest points have the highest {{olscores}}. The goal of these charts and options is to help you visualize and explore the outliers within your data. - Now that you’ve found unusual behavior in the sample data set, consider how you might apply these steps to other data sets. If you have data that is already marked up with true outliers, you can determine how well the {{oldetection}} algorithms perform by using the evaluate {{dfanalytics}} API. See [6. Evaluate the results](#ml-outlier-detection-evaluate). ::::{tip} -If you do not want to keep the {{transform}} and the {{dfanalytics-job}}, you can delete them in {{kib}} or use the [delete {{transform}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-data-frame-transform.html) and [delete {{dfanalytics-job}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-dfanalytics.html). 
When you delete {{transforms}} and {{dfanalytics-jobs}} in {{kib}}, you have the option to also remove the destination indices and {{data-sources}}. +If you do not want to keep the {{transform}} and the {{dfanalytics}} job, you can delete them in {{kib}} or use the [delete {{transform}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-data-frame-transform.html) and [delete {{dfanalytics}} job API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-dfanalytics.html). When you delete {{transforms}} and {{dfanalytics}} jobs in {{kib}}, you have the option to also remove the destination indices and {{data-sources}}. :::: - - ## Further reading [outlier-detection-reading] * If you want to see another example of {{oldetection}} in a Jupyter notebook, [click here](https://github.com/elastic/examples/tree/master/Machine%20Learning/Outlier%20Detection/Introduction). * [This blog post](https://www.elastic.co/blog/catching-malware-with-elastic-outlier-detection) shows you how to catch malware using {{oldetection}}. * [Benchmarking {{oldetection}} results in Elastic {{ml}}](https://www.elastic.co/blog/benchmarking-outlier-detection-in-elastic-machine-learning) - diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-overview.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-overview.md index 3b09a9e77f..a31b6f6fda 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-overview.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-overview.md @@ -4,16 +4,13 @@ mapped_pages: - https://www.elastic.co/guide/en/machine-learning/current/ml-dfa-overview.html --- - - # Overview [ml-dfa-overview] - {{dfanalytics-cap}} enable you to perform different analyses of your data and annotate it with the results. By doing this, it provides additional insights into the data. [{{oldetection-cap}}](ml-dfa-finding-outliers.md) identifies unusual data points in the data set. 
[{{regression-cap}}](ml-dfa-regression.md) makes predictions on your data after it determines certain relationships among your data points. [{{classification-cap}}](ml-dfa-classification.md) predicts the class or category of a given data point in a data set. {{infer-cap}} enables you to use trained {{ml}} models against incoming data in a continuous fashion.

-The process leaves the source index intact, it creates a new index that contains a copy of the source data and the annotated data. You can slice and dice the data extended with the results as you normally do with any other data set. Read [How {{dfanalytics-jobs}} work](ml-dfa-phases.md) for more information.
+The process leaves the source index intact; it creates a new index that contains a copy of the source data and the annotated data. You can slice and dice the data extended with the results as you normally do with any other data set. Read [How {{dfanalytics}} jobs work](ml-dfa-phases.md) for more information.

-You can evaluate the {{dfanalytics}} performance by using the {{evaluatedf-api}} against a marked up data set. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily.
+You can evaluate the {{dfanalytics}} performance by using the evaluate {{dfanalytics}} API against a marked-up data set. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily.

Consult [Introduction to supervised learning](#ml-supervised-workflow) to learn more about how to make predictions with supervised learning.

@@ -23,7 +20,6 @@ Consult [Introduction to supervised learning](#ml-supervised-workflow) to learn

| {{regression}} | supervised |
| {{classification}} | supervised |

-
## Introduction to supervised learning [ml-supervised-workflow]

Elastic supervised learning enables you to train a {{ml}} model based on training examples that you provide.
You can then use your model to make predictions on new data. This page summarizes the end-to-end workflow for training, evaluating and deploying a model. It gives a high-level overview of the steps required to identify and implement a solution using supervised learning. @@ -36,7 +32,6 @@ The workflow for supervised learning consists of the following stages: These are iterative stages, meaning that after evaluating each step, you might need to make adjustments before you move further. - ### Define the problem [define-problem] It’s important to take a moment and think about where {{ml}} can be most impactful. Consider what type of data you have available and what value it holds. The better you know the data, the quicker you will be able to create {{ml}} models that generate useful insights. What kinds of patterns do you want to discover in your data? What type of value do you want to predict: a category, or a numerical value? The answers help you choose the type of analysis that fits your use case. @@ -48,7 +43,6 @@ After you identify the problem, consider which of the {{ml-features}} are most l * {{regression}}: predicts **continuous, numerical values** like the response time of a web request. * {{classification}}: predicts **discrete, categorical values** like whether a [DNS request originates from a malicious or benign domain](https://www.elastic.co/blog/machine-learning-in-cybersecurity-training-supervised-models-to-detect-dga-activity). - ### Prepare and transform data [prepare-transform-data] You have defined the problem and selected an appropriate type of analysis. The next step is to produce a high-quality data set in {{es}} with a clear relationship to your training objectives. If your data is not already in {{es}}, this is the stage where you develop your data pipeline. If you want to learn more about how to ingest data into {{es}}, refer to the [Ingest node documentation](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md). 
@@ -61,7 +55,6 @@ Before you train the model, consider preprocessing the data. In practice, the ty

{{regression-cap}} and {{classification}} require specifically structured source data: a two dimensional tabular data structure. For this reason, you might need to [{{transform}}](../../transforms.md) your data to create a {{dataframe}} which can be used as the source for these types of {{dfanalytics}}.

-
### Train, test, iterate [train-test-iterate]

After your data is prepared and transformed into the right format, it is time to train the model. Training is an iterative process — every iteration is followed by an evaluation to see how the model performs.

@@ -74,14 +67,12 @@ During the training process, the training data is fed through the learning algor

Once the model is trained, you can evaluate how well it predicts previously unseen data with the model generalization error. There are further evaluation types for both {{regression}} and {{classification}} analysis which provide metrics about training performance. When you are satisfied with the results, you are ready to deploy the model. Otherwise, you may want to adjust the training configuration or consider alternative ways to preprocess and represent your data.

-
### Deploy model [deploy-model]

You have trained the model and are satisfied with the performance. The last step is to deploy your trained model and start using it on new data.

The Elastic {{ml}} feature called {{infer}} enables you to make predictions for new data either by using it as a processor in an ingest pipeline, in a continuous {{transform}}, or as an aggregation at search time. When new data comes into your ingest pipeline or you run a search on your data with an {{infer}} aggregation, the model is used to infer against the data and make predictions on it.

-
### Next steps [next-steps]

* Read more about how to [transform your data](../../transforms.md) into an entity-centric index.
diff --git a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md index 1acbe5f28d..6dab15d005 100644 --- a/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md +++ b/explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md @@ -9,24 +9,20 @@ mapped_pages: When you perform {{reganalysis}}, you must identify a subset of fields that you want to use to create a model for predicting other fields. *Feature variables* are the fields that are used to create the model. The *dependent variable* is the field you want to predict. - ## {{regression-cap}} algorithms [dfa-regression-algorithm] {{regression-cap}} uses an ensemble learning technique that is similar to extreme gradient boosting (XGBoost) which combines decision trees with gradient boosting methodologies. XGBoost trains a sequence of decision trees and every decision tree learns from the mistakes of the forest so far. In each iteration, the trees added to the forest improve the decision quality of the combined decision forest. By default, the regression algorithm optimizes for a [loss function](dfa-regression-lossfunction.md) called mean-squared error loss. There are three types of {{feature-vars}} that you can use with these algorithms: numerical, categorical, or Boolean. Arrays are not supported. - ## 1. Define the problem [dfa-regression-problem] {{regression-cap}} can be useful in cases where a continuous quantity needs to be predicted. The values that {{reganalysis}} can predict are numerical values. If your use case requires predicting continuous, numerical values, then {{regression}} might be the suitable choice for you. - ## 2. Set up the environment [dfa-regression-environment] Before you can use the {{stack-ml-features}}, there are some configuration requirements (such as security privileges) that must be addressed. 
Refer to [Setup and security](../setting-up-machine-learning.md). - ## 3. Prepare and transform data [dfa-regression-prepare-data] {{regression-cap}} is a supervised {{ml}} method, which means you need to supply a labeled training data set. This data set must have values for the {{feature-vars}} and the {{depvar}} which are used to train the model. This information is used during training to identify relationships among the various characteristics of the data and the predicted value. This labeled data set also plays a critical role in model evaluation. @@ -35,10 +31,9 @@ You might also need to [{{transform}}](../../transforms.md) your data to create To learn more about how to prepare your data, refer to [the relevant section](ml-dfa-overview.md#prepare-transform-data) of the supervised learning overview. - ## 4. Create a job [dfa-regression-create-job] -{{dfanalytics-jobs-cap}} contain the configuration information and metadata necessary to perform an analytics task. You can create {{dfanalytics-jobs}} via {{kib}} or using the [create {{dfanalytics-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). +{{dfanalytics-cap}} jobs contain the configuration information and metadata necessary to perform an analytics task. You can create {{dfanalytics}} jobs via {{kib}} or using the [create {{dfanalytics}} jobs API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html). Select {{regression}} as the analytics type for the job, then select the field that you want to predict (the {{depvar}}). You can also include and exclude fields to/from the analysis. @@ -46,11 +41,9 @@ Select {{regression}} as the analytics type for the job, then select the field t You can view the statistics of the selectable fields in the {{dfanalytics}} wizard. The field statistics displayed in a flyout provide more meaningful context to help you select relevant fields. :::: - - ## 5. 
Start the job [dfa-regression-start] -You can start the job via {{kib}} or using the [start {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) API. A {{regression}} job has the following phases: +You can start the job via {{kib}} or using the [start {{dfanalytics}} jobs](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-dfanalytics.html) API. A {{regression}} job has the following phases: * `reindexing`: Documents are copied from the source index to the destination index. * `loading_data`: The job fetches the necessary data from the destination index. @@ -67,11 +60,9 @@ After the last phase is finished, the job stops and the results are ready for ev When you create a {{dfanalytics-job}}, the inference step of the process might fail if the model is too large to fit into JVM. For a workaround, refer to [this GitHub issue](https://github.com/elastic/elasticsearch/issues/76093). :::: - - ## 6. Evaluate the result [ml-dfanalytics-regression-evaluation] -Using the {{dfanalytics}} features to gain insights from a data set is an iterative process. After you defined the problem you want to solve, and chose the analytics type that can help you to do so, you need to produce a high-quality data set and create the appropriate {{dfanalytics-job}}. You might need to experiment with different configurations, parameters, and ways to transform data before you arrive at a result that satisfies your use case. A valuable companion to this process is the [{{evaluatedf-api}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/evaluate-dfanalytics.html), which enables you to evaluate the {{dfanalytics}} performance. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily. +Using the {{dfanalytics}} features to gain insights from a data set is an iterative process. 
After you have defined the problem you want to solve and chosen the analytics type that can help you solve it, you need to produce a high-quality data set and create the appropriate {{dfanalytics}} job. You might need to experiment with different configurations, parameters, and ways to transform data before you arrive at a result that satisfies your use case. A valuable companion to this process is the [evaluate {{dfanalytics}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/evaluate-dfanalytics.html), which enables you to evaluate the {{dfanalytics}} performance. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or is less trustworthy. To evaluate the analysis with this API, you need to annotate the index that contains the results of the analysis with a field that marks each document with the ground truth. The {{evaluatedf-api}} evaluates the performance of the {{dfanalytics}} against this manually provided ground truth. @@ -84,34 +75,28 @@ The {{regression}} evaluation type offers the following metrics to evaluate the * Mean squared error (MSE) * Mean squared logarithmic error (MSLE) * Pseudo-Huber loss -* R-squared (R2) - +* R-squared (R^2^) ### Mean squared error [ml-dfanalytics-mse] MSE is the average of the squared differences between the true values and the predicted values: Avg((predicted value - actual value)^2^). - ### Mean squared logarithmic error [ml-dfanalytics-msle] MSLE is a variation of mean squared error. It can be used for cases when the target values are positive and distributed with a long tail, such as data on prices or population. Consult the [Loss functions for {{regression}} analyses](dfa-regression-lossfunction.md) page to learn more about loss functions. 
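As a sketch of how these metrics are requested, the following evaluate {{dfanalytics}} API call computes MSE, MSLE, pseudo-Huber loss, and R^2^ in a single request. The index and field names here are placeholders, not values from this tutorial; the `offset` and `delta` values shown are the documented defaults for MSLE and pseudo-Huber loss.

```console
POST _ml/data_frame/_evaluate
{
  "index": "my-regression-results",          <1>
  "evaluation": {
    "regression": {
      "actual_field": "my_ground_truth_field",          <2>
      "predicted_field": "ml.my_field_prediction",      <3>
      "metrics": {
        "mse": {},
        "msle": { "offset": 1 },
        "huber": { "delta": 1.0 },
        "r_squared": {}
      }
    }
  }
}
```

1. A hypothetical destination index that contains the analysis results.
2. A hypothetical field that holds the ground truth value for each document.
3. The corresponding hypothetical field that holds the predicted value.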
- ### Pseudo-Huber loss [ml-dfanalytics-huber] [Pseudo-Huber loss metric](https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function) behaves as mean absolute error (MAE) for errors larger than a predefined value (defaults to `1`) and as mean squared error (MSE) for errors smaller than the predefined value. This loss function uses the `delta` parameter to define the transition point between MAE and MSE. Consult the [Loss functions for {{regression}} analyses](dfa-regression-lossfunction.md) page to learn more about loss functions. - ### R-squared [ml-dfanalytics-r-squared] -R-squared (R2) represents the goodness of fit and measures how much of the variation in the data the predictions are able to explain. The value of R2 are less than or equal to 1, where 1 indicates that the predictions and true values are equal. A value of 0 is obtained when all the predictions are set to the mean of the true values. A value of 0.5 for R2 would indicate that the predictions are 1 - 0.5(1/2) (about 30%) closer to true values than their mean. - +R-squared (R^2^) represents the goodness of fit and measures how much of the variation in the data the predictions are able to explain. The value of R^2^ is less than or equal to 1, where 1 indicates that the predictions and true values are equal. A value of 0 is obtained when all the predictions are set to the mean of the true values. A value of 0.5 for R^2^ would indicate that the predictions are 1 - 0.5^(1/2)^ (about 30%) closer to true values than their mean. ### {{feat-imp-cap}} [dfa-regression-feature-importance] {{feat-imp-cap}} provides further information about the results of an analysis and helps you interpret the results in a more nuanced way. To learn more about {{feat-imp}}, refer to [{{feat-imp-cap}}](ml-feature-importance.md). - ## 7. Deploy the model [dfa-regression-deploy] The model that you created is stored as {{es}} documents in internal indices. 
In other words, the characteristics of your trained model are saved and ready to be deployed and used as functions. The [{{infer}}](#ml-inference-reg) feature enables you to use your model in a preprocessor of an ingest pipeline or in a pipeline aggregation of a search query to make predictions about your data. @@ -119,24 +104,24 @@ The model that you created is stored as {{es}} documents in internal indices. In 1. To deploy a {{dfanalytics}} model in a pipeline, navigate to **Machine Learning** > **Model Management** > **Trained models** in the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects) in {{kib}}. 2. Find the model you want to deploy in the list and click **Deploy model** in the **Actions** menu. - :::{image} ../../../images/machine-learning-ml-dfa-trained-models-ui.png - :alt: The trained models UI in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-trained-models-ui.png +:alt: The trained models UI in {kib} +:class: screenshot +::: 3. Create an {{infer}} pipeline to be able to use the model against new data through the pipeline. Add a name and a description or use the default values. - :::{image} ../../../images/machine-learning-ml-dfa-inference-pipeline.png - :alt: Creating an inference pipeline - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-inference-pipeline.png +:alt: Creating an inference pipeline +:class: screenshot +::: 4. Configure the pipeline processors or use the default settings. - :::{image} ../../../images/machine-learning-ml-dfa-inference-processor.png - :alt: Configuring an inference processor - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-ml-dfa-inference-processor.png +:alt: Configuring an inference processor +:class: screenshot +::: 5. Configure how ingest failures are handled, or use the default settings. 6. 
(Optional) Test your pipeline by running a simulation of the pipeline to confirm it produces the anticipated results. @@ -144,20 +129,17 @@ The model that you created is stored as {{es}} documents in internal indices. In The model is deployed and ready to use through the {{infer}} pipeline. - ### {{infer-cap}} [ml-inference-reg] {{infer-cap}} enables you to use [trained {{ml}} models](ml-trained-models.md) against incoming data in a continuous fashion. For instance, suppose you have an online service and you would like to predict whether a customer is likely to churn. You have an index with historical data – information on the customer behavior throughout the years in your business – and a {{classification}} model that is trained on this data. The new information comes into a destination index of a {{ctransform}}. With {{infer}}, you can perform the {{classanalysis}} against the new data with the same input fields that you’ve trained the model on, and get a prediction. - #### {{infer-cap}} processor [ml-inference-processor-reg] {{infer-cap}} can be used as a processor specified in an [ingest pipeline](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md). It uses a trained model to infer against the data that is being ingested in the pipeline. The model is used on the ingest node. {{infer-cap}} pre-processes the data by using the model and provides a prediction. After the process, the pipeline continues executing (if there is any other processor in the pipeline); finally, the new data, together with the results, is indexed into the destination index. -Check the [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) and [the {{ml}} {dfanalytics} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-df-analytics-apis.html) to learn more. 
- +Check the [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) and [the {{ml}} {{dfanalytics}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-df-analytics-apis.html) to learn more. #### {{infer-cap}} aggregation [ml-inference-aggregation-reg] @@ -169,13 +151,10 @@ Check the [{{infer}} bucket aggregation](https://www.elastic.co/guide/en/elastic If you use trained model aliases to reference your trained model in an {{infer}} processor or {{infer}} aggregation, you can replace your trained model with a new one without needing to update the processor or the aggregation. Reassign the alias you used to a new trained model ID by using the [Create or update trained model aliases API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-trained-models-aliases.html). The new trained model needs to use the same type of {{dfanalytics}} as the old one. :::: - - ## Performing {{reganalysis}} in the sample flight data set [performing-regression] Let’s try to predict flight delays by using the [sample flight data](../../overview/kibana-quickstart.md#gs-get-data-into-kibana). The data set contains information such as weather conditions, flight destinations and origins, flight distances, carriers, and the number of minutes each flight was delayed. When you create a {{regression}} job, it learns the relationships between the fields in your data to predict the value of a *{{depvar}}*, which - in this case - is the numeric `FlightDelayMins` field. For an overview of these concepts, see [*Predicting numerical values with {{regression}}*](ml-dfa-regression.md) and [Introduction to supervised learning](ml-dfa-overview.md#ml-supervised-workflow). - ### Preparing your data [flightdata-regression-data] Each document in the data set contains details for a single flight, so this data is ready for analysis; it is already in a two-dimensional entity-based data structure. 
In general, you often need to [transform](../../transforms.md) the data into an entity-centric index before you analyze it. @@ -232,13 +211,10 @@ To be analyzed, a document must contain at least one field with a supported data :::: - ::::{note} The sample flight data is used in this example because it is easily accessible. However, the data contains some inconsistencies. For example, a flight can be both delayed and canceled. This is a good reminder that the quality of your input data affects the quality of your results. :::: - - ### Creating a {{regression}} model [flightdata-regression-model] To predict the number of minutes delayed for each flight: @@ -248,10 +224,10 @@ To predict the number of minutes delayed for each flight: You can use the wizard on the **{{ml-app}}** > **Data Frame Analytics** tab in {{kib}} or the [create {{dfanalytics-jobs}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html) API. - :::{image} ../../../images/machine-learning-flights-regression-job-1.jpg - :alt: Creating a {{dfanalytics-job}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flights-regression-job-1.jpg +:alt: Creating a {{dfanalytics-job}} in {kib} +:class: screenshot +::: 1. Choose `kibana_sample_data_flights` as the source index. 2. Choose `regression` as the job type. @@ -261,10 +237,10 @@ To predict the number of minutes delayed for each flight: The wizard includes a scatterplot matrix, which enables you to explore the relationships between the numeric fields. The color of each point is affected by the value of the {{depvar}} for that document, as shown in the legend. You can highlight an area in one of the charts and the corresponding area is also highlighted in the rest of the charts. You can use this matrix to help you decide which fields to include or exclude from the analysis. 
- :::{image} ../../../images/machine-learning-flightdata-regression-scatterplot.png - :alt: A scatterplot matrix for three fields in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flightdata-regression-scatterplot.png +:alt: A scatterplot matrix for three fields in {kib} +:class: screenshot +::: If you want these charts to represent data from a larger sample size or from a randomized selection of documents, you can change the default behavior. However, a larger sample size might slow down the performance of the matrix and a randomized selection might put more load on the cluster due to the more intensive query. @@ -274,9 +250,9 @@ To predict the number of minutes delayed for each flight: 9. Add a job ID (such as `model-flight-delay-regression`) and optionally a job description. 10. Add the name of the destination index that will contain the results of the analysis. In {{kib}}, the index name matches the job ID by default. It will contain a copy of the source index data where each document is annotated with the results. If the index does not exist, it will be created automatically. - ::::{dropdown} API example - ```console - PUT _ml/data_frame/analytics/model-flight-delays-regression +::::{dropdown} API example +```console +PUT _ml/data_frame/analytics/model-flight-delays-regression { "source": { "index": [ @@ -311,9 +287,9 @@ To predict the number of minutes delayed for each flight: ] } } - ``` +``` - :::: +:::: After you have configured your job, the configuration details are automatically validated. If the checks are successful, you can proceed and start the job. A warning message is shown if the configuration is invalid. The message contains a suggestion for improving the configuration so that it passes validation. @@ -322,30 +298,30 @@ To predict the number of minutes delayed for each flight: The job takes a few minutes to run. Runtime depends on the local hardware and also on the number of documents and fields that are analyzed. 
The more fields and documents, the longer the job runs. It stops automatically when the analysis is complete. - ::::{dropdown} API example - ```console - POST _ml/data_frame/analytics/model-flight-delays-regression/_start - ``` +::::{dropdown} API example +```console +POST _ml/data_frame/analytics/model-flight-delays-regression/_start +``` - :::: +:::: 4. Check the job stats to follow the progress in {{kib}} or use the [get {{dfanalytics-jobs}} statistics API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-dfanalytics-stats.html). - :::{image} ../../../images/machine-learning-flights-regression-details.jpg - :alt: Statistics for a {{dfanalytics-job}} in {kib} - :class: screenshot - ::: +:::{image} ../../../images/machine-learning-flights-regression-details.jpg +:alt: Statistics for a {{dfanalytics-job}} in {kib} +:class: screenshot +::: When the job stops, the results are ready to view and evaluate. To learn more about the job phases, see [How {{dfanalytics-jobs}} work](ml-dfa-phases.md). - ::::{dropdown} API example - ```console - GET _ml/data_frame/analytics/model-flight-delays-regression/_stats - ``` +::::{dropdown} API example +```console +GET _ml/data_frame/analytics/model-flight-delays-regression/_stats +``` - The API call returns the following response: +The API call returns the following response: - ```console-result +```console-result { "count" : 1, "data_frame_analytics" : [ @@ -428,11 +404,9 @@ To predict the number of minutes delayed for each flight: } ] } - ``` - - :::: - +``` +:::: ### Viewing {{regression}} results [flightdata-regression-results] @@ -508,7 +482,6 @@ The snippet below shows an example of the total feature importance details in th 3. The minimum {{feat-imp}} value across all the training data for this field. 4. The maximum {{feat-imp}} value across all the training data for this field. - To see the top {{feat-imp}} values for each prediction, search the destination index. 
For example: ```console @@ -554,10 +527,8 @@ The snippet below shows a part of a document with the annotated results: :::: - Lastly, {{kib}} provides a scatterplot matrix in the results. It has the same functionality as the matrix that you saw in the job wizard. Its purpose is likewise to help you visualize and explore the relationships between the numeric fields and the {{depvar}} in your data. - ### Evaluating {{regression}} results [flightdata-regression-evaluate] Though you can look at individual results and compare the predicted value (`ml.FlightDelayMin_prediction`) to the actual value (`FlightDelayMins`), you typically need to evaluate the success of the {{regression}} model as a whole. @@ -654,15 +625,12 @@ POST _ml/data_frame/_evaluate 1. Evaluates only the documents that are not part of the training data. - :::: - When you have trained a satisfactory model, you can [deploy it](#dfa-regression-deploy) to make predictions about new data. If you don’t want to keep the {{dfanalytics-job}}, you can delete it. For example, use {{kib}} or the [delete {{dfanalytics-job}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-dfanalytics.html). When you delete {{dfanalytics-jobs}} in {{kib}}, you have the option to also remove the destination indices and {{data-sources}}. 
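As a sketch of that cleanup step, a delete request for the job created in this tutorial could look like the following. Note that this API call removes only the job configuration; the destination index that the job created is not deleted by this request.

```console
DELETE _ml/data_frame/analytics/model-flight-delays-regression
```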
- ## Further reading [dfa-regression-reading] * [Feature importance for {{dfanalytics}} (Jupyter notebook)](https://github.com/elastic/examples/tree/master/Machine%20Learning/Feature%20Importance) diff --git a/raw-migrated-files/kibana/kibana/xpack-ml-dfanalytics.md b/raw-migrated-files/kibana/kibana/xpack-ml-dfanalytics.md deleted file mode 100644 index 1c6302e3b8..0000000000 --- a/raw-migrated-files/kibana/kibana/xpack-ml-dfanalytics.md +++ /dev/null @@ -1,13 +0,0 @@ -# {{dfanalytics-cap}} [xpack-ml-dfanalytics] - -The Elastic {{ml}} {dfanalytics} feature enables you to analyze your data using {{classification}}, {{oldetection}}, and {{regression}} algorithms and generate new indices that contain the results alongside your source data. - -If you have a license that includes the {{ml-features}}, you can create {{dfanalytics-jobs}} and view their results on the **Data Frame Analytics** page in {{kib}}. For example: - -:::{image} ../../../images/kibana-classification.png -:alt: {{classification-cap}} results in {kib} -:class: screenshot -::: - -For more information about the {{dfanalytics}} feature, see [{{ml-cap}} {dfanalytics}](../../../explore-analyze/machine-learning/data-frame-analytics.md). - diff --git a/raw-migrated-files/stack-docs/machine-learning/index.md b/raw-migrated-files/stack-docs/machine-learning/index.md deleted file mode 100644 index f5a259a9f6..0000000000 --- a/raw-migrated-files/stack-docs/machine-learning/index.md +++ /dev/null @@ -1,3 +0,0 @@ -# Machine learning - -Migrated files from the Machine learning book. 
diff --git a/raw-migrated-files/stack-docs/machine-learning/ml-dfanalytics.md b/raw-migrated-files/stack-docs/machine-learning/ml-dfanalytics.md deleted file mode 100644 index 3632a6e798..0000000000 --- a/raw-migrated-files/stack-docs/machine-learning/ml-dfanalytics.md +++ /dev/null @@ -1,18 +0,0 @@ -# {{dfanalytics-cap}} [ml-dfanalytics] - -::::{important} -Using {{dfanalytics}} requires source data to be structured as a two dimensional "tabular" data structure, in other words a {{dataframe}}. [{{transforms-cap}}](../../../explore-analyze/transforms.md) enable you to create {{dataframes}} which can be used as the source for {{dfanalytics}}. -:::: - - -{{dfanalytics-cap}} enable you to perform different analyses of your data and annotate it with the results. Consult [Setup and security](../../../explore-analyze/machine-learning/setting-up-machine-learning.md) to learn more about the license and the security privileges that are required to use {{dfanalytics}}. - -* [Overview](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-overview.md) -* [*Finding outliers*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md) -* [*Predicting numerical values with {{regression}}*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-regression.md) -* [*Predicting classes with {{classification}}*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-classification.md) -* [Language identification](https://www.elastic.co/guide/en/machine-learning/current/ml-dfa-lang-ident.html) -* [*Advanced concepts*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-concepts.md) -* [*API quick reference*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfanalytics-apis.md) -* [*Resources*](../../../explore-analyze/machine-learning/data-frame-analytics/ml-dfa-resources.md) - diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index ab854a0347..5cd90f0c93 
100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -718,7 +718,6 @@ toc: - file: kibana/kibana/using-kibana-with-security.md - file: kibana/kibana/watcher-ui.md - file: kibana/kibana/xpack-ml-aiops.md - - file: kibana/kibana/xpack-ml-dfanalytics.md - file: kibana/kibana/xpack-security-authorization.md - file: kibana/kibana/xpack-security-fips-140-2.md - file: kibana/kibana/xpack-security.md @@ -1016,9 +1015,6 @@ toc: - file: stack-docs/elastic-stack/upgrading-elastic-stack.md - file: stack-docs/elastic-stack/upgrading-elasticsearch.md - file: stack-docs/elastic-stack/upgrading-kibana.md - - file: stack-docs/machine-learning/index.md - children: - - file: stack-docs/machine-learning/ml-dfanalytics.md - file: tech-content/starting-with-the-elasticsearch-platform-and-its-solutions/index.md children: - file: tech-content/starting-with-the-elasticsearch-platform-and-its-solutions/get-elastic.md