12 changes: 6 additions & 6 deletions docset.yml
@@ -370,10 +370,10 @@ subs:
dataframe-transforms-cap: "Transforms"
dfanalytics-cap: "Data frame analytics"
dfanalytics: "data frame analytics"
dataframe-analytics-config: "'{dataframe} analytics config'"
dfanalytics-job: "'{dataframe} analytics job'"
dfanalytics-jobs: "'{dataframe} analytics jobs'"
dfanalytics-jobs-cap: "'{dataframe-cap} analytics jobs'"
dataframe-analytics-config: "data frame analytics analytics config"
dfanalytics-job: "data frame analytics analytics job"
dfanalytics-jobs: "data frame analytics analytics jobs"
dfanalytics-jobs-cap: "Data frame analytics analytics jobs"
cdataframe: "continuous data frame"
cdataframes: "continuous data frames"
cdataframe-cap: "Continuous data frame"
@@ -390,8 +390,8 @@ subs:
olscore: "outlier score"
olscores: "outlier scores"
fiscore: "feature influence score"
evaluatedf-api: "evaluate {dataframe} analytics API"
evaluatedf-api-cap: "Evaluate {dataframe} analytics API"
evaluatedf-api: "evaluate data frame analytics API"
evaluatedf-api-cap: "Evaluate data frame analytics API"
binarysc: "binary soft classification"
binarysc-cap: "Binary soft classification"
regression: "regression"
17 changes: 12 additions & 5 deletions explore-analyze/machine-learning/data-frame-analytics.md
@@ -4,11 +4,18 @@ mapped_urls:
- https://www.elastic.co/guide/en/kibana/current/xpack-ml-dfanalytics.html
---

# Data frame analytics
# Data frame analytics [ml-dfanalytics]

% What needs to be done: Lift-and-shift
::::{important}
Using {{dfanalytics}} requires source data to be structured as a two-dimensional "tabular" data structure, in other words, a {{dataframe}}. [{{transforms-cap}}](../transforms.md) enable you to create {{dataframes}} which can be used as the source for {{dfanalytics}}.
::::

% Use migrated content from existing pages that map to this page:
{{dfanalytics-cap}} enable you to perform different analyses of your data and annotate it with the results. Consult [Setup and security](setting-up-machine-learning.md) to learn more about the license and the security privileges that are required to use {{dfanalytics}}.

% - [ ] ./raw-migrated-files/stack-docs/machine-learning/ml-dfanalytics.md
% - [ ] ./raw-migrated-files/kibana/kibana/xpack-ml-dfanalytics.md
* [Overview](data-frame-analytics/ml-dfa-overview.md)
* [*Finding outliers*](data-frame-analytics/ml-dfa-finding-outliers.md)
* [*Predicting numerical values with {{regression}}*](data-frame-analytics/ml-dfa-regression.md)
* [*Predicting classes with {{classification}}*](data-frame-analytics/ml-dfa-classification.md)
* [*Advanced concepts*](data-frame-analytics/ml-dfa-concepts.md)
* [*API quick reference*](data-frame-analytics/ml-dfanalytics-apis.md)
* [*Resources*](data-frame-analytics/ml-dfa-resources.md)

@@ -4,16 +4,13 @@ mapped_pages:
- https://www.elastic.co/guide/en/machine-learning/current/ml-dfa-overview.html
---



# Overview [ml-dfa-overview]


{{dfanalytics-cap}} enable you to perform different analyses of your data and annotate it with the results. By doing this, they provide additional insights into the data. [{{oldetection-cap}}](ml-dfa-finding-outliers.md) identifies unusual data points in the data set. [{{regression-cap}}](ml-dfa-regression.md) makes predictions on your data after it determines certain relationships among your data points. [{{classification-cap}}](ml-dfa-classification.md) predicts the class or category of a given data point in a data set. {{infer-cap}} enables you to use trained {{ml}} models against incoming data in a continuous fashion.

The process leaves the source index intact, it creates a new index that contains a copy of the source data and the annotated data. You can slice and dice the data extended with the results as you normally do with any other data set. Read [How {{dfanalytics-jobs}} work](ml-dfa-phases.md) for more information.
The process leaves the source index intact; it creates a new index that contains a copy of the source data and the annotated data. You can slice and dice the data extended with the results as you normally do with any other data set. Read [How {{dfanalytics}} jobs work](ml-dfa-phases.md) for more information.
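
For example, a minimal {{dfanalytics}} job configuration names a source index and a destination index for the copied, annotated data. This is only a sketch; the index and job names and the outlier detection analysis are illustrative assumptions:

```console
# Hypothetical job: copies documents from "my-source-index" into
# "my-dest-index" and annotates each copy with outlier detection results.
PUT _ml/data_frame/analytics/my-outlier-job
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "analysis": { "outlier_detection": {} }
}
```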

You can evaluate the {{dfanalytics}} performance by using the {{evaluatedf-api}} against a marked up data set. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less trustworthily.
You can evaluate the {{dfanalytics}} performance by using the evaluate {{dfanalytics}} API against a marked-up data set. It helps you understand error distributions and identifies the points where the {{dfanalytics}} model performs well or less reliably.
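
As a minimal sketch, an evaluation request points the API at an index that contains both the ground truth and the predictions; the index and field names below are assumptions for illustration:

```console
# Hypothetical evaluation of outlier detection results: "is_outlier" holds the
# ground truth and "ml.outlier_score" holds the predicted probability.
POST _ml/data_frame/_evaluate
{
  "index": "my-dest-index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
```

The response returns metrics that depend on the chosen evaluation type.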

Consult [Introduction to supervised learning](#ml-supervised-workflow) to learn more about how to make predictions with supervised learning.

@@ -23,7 +20,6 @@ | {{regression}} | supervised |
| {{regression}} | supervised |
| {{classification}} | supervised |


## Introduction to supervised learning [ml-supervised-workflow]

Elastic supervised learning enables you to train a {{ml}} model based on training examples that you provide. You can then use your model to make predictions on new data. This page summarizes the end-to-end workflow for training, evaluating and deploying a model. It gives a high-level overview of the steps required to identify and implement a solution using supervised learning.
@@ -36,7 +32,6 @@ The workflow for supervised learning consists of the following stages:

These are iterative stages, meaning that after evaluating each step, you might need to make adjustments before you move further.


### Define the problem [define-problem]

It’s important to take a moment and think about where {{ml}} can be most impactful. Consider what type of data you have available and what value it holds. The better you know the data, the quicker you will be able to create {{ml}} models that generate useful insights. What kinds of patterns do you want to discover in your data? What type of value do you want to predict: a category, or a numerical value? The answers help you choose the type of analysis that fits your use case.
@@ -48,7 +43,6 @@ After you identify the problem, consider which of the {{ml-features}} are most l
* {{regression}}: predicts **continuous, numerical values** like the response time of a web request.
* {{classification}}: predicts **discrete, categorical values** like whether a [DNS request originates from a malicious or benign domain](https://www.elastic.co/blog/machine-learning-in-cybersecurity-training-supervised-models-to-detect-dga-activity).


### Prepare and transform data [prepare-transform-data]

You have defined the problem and selected an appropriate type of analysis. The next step is to produce a high-quality data set in {{es}} with a clear relationship to your training objectives. If your data is not already in {{es}}, this is the stage where you develop your data pipeline. If you want to learn more about how to ingest data into {{es}}, refer to the [Ingest node documentation](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md).
@@ -61,7 +55,6 @@ Before you train the model, consider preprocessing the data. In practice, the ty

{{regression-cap}} and {{classification}} require specifically structured source data: a two-dimensional tabular data structure. For this reason, you might need to [{{transform}}](../../transforms.md) your data to create a {{dataframe}} which can be used as the source for these types of {{dfanalytics}}.
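
For instance, a pivot {{transform}} can summarize a raw, event-based index into an entity-centric {{dataframe}} with one document per entity; the index, entity, and metric names here are placeholders:

```console
# Hypothetical pivot transform: one summary document per customer_id,
# written to a destination index that can feed a data frame analytics job.
PUT _transform/my-entity-transform
{
  "source": { "index": "my-raw-events" },
  "dest": { "index": "my-entity-index" },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "order_total" } }
    }
  }
}
```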


### Train, test, iterate [train-test-iterate]

After your data is prepared and transformed into the right format, it is time to train the model. Training is an iterative process — every iteration is followed by an evaluation to see how the model performs.
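
As a sketch, training begins when the configured job is started; the job identifier below is a placeholder for a job you have already created:

```console
# Hypothetical: start a previously created data frame analytics job.
POST _ml/data_frame/analytics/my-analytics-job/_start
```
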
@@ -74,14 +67,12 @@ During the training process, the training data is fed through the learning algor

Once the model is trained, you can evaluate how well it predicts previously unseen data with the model generalization error. There are further evaluation types for both {{regression}} and {{classification}} analysis which provide metrics about training performance. When you are satisfied with the results, you are ready to deploy the model. Otherwise, you may want to adjust the training configuration or consider alternative ways to preprocess and represent your data.


### Deploy model [deploy-model]

You have trained the model and are satisfied with the performance. The last step is to deploy your trained model and start using it on new data.

The Elastic {{ml}} feature called {{infer}} enables you to make predictions for new data by using it as a processor in an ingest pipeline, in a continuous {{transform}}, or as an aggregation at search time. When new data comes into your ingest pipeline or you run a search on your data with an {{infer}} aggregation, the model is used to infer against the data and make predictions on it.
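
As a minimal sketch, an ingest pipeline can apply a trained model through the {{infer}} processor; the pipeline and model identifiers are assumptions:

```console
# Hypothetical pipeline: documents indexed through it are enriched with
# predictions from the trained model "my-trained-model".
PUT _ingest/pipeline/my-inference-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-trained-model"
      }
    }
  ]
}
```

Documents indexed through such a pipeline then carry the model's predictions alongside their original fields.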


### Next steps [next-steps]

* Read more about how to [transform your data](../../transforms.md) into an entity-centric index.