@@ -113,7 +113,7 @@ Using the example text "Elastic is headquartered in Mountain View, California.",

## Add the NER model to an {{infer}} ingest pipeline [ex-ner-ingest]

-You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as the example text for {{infer}} below. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest.md#upload-data-kibana). Give the new index the name `les-miserables` when uploading the file.
+You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as the example text for {{infer}} below. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file.

Now create an ingest pipeline either in the [Stack management UI](ml-nlp-inference.md#ml-nlp-inference-processor) or by using the API:
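
For reference, here is a minimal sketch of such a pipeline definition created through the API. The model ID and the source field name `paragraph` are illustrative assumptions, not taken from the original page:

```console
PUT _ingest/pipeline/ner
{
  "description": "Sketch of an NER inference pipeline; model ID and field names are assumptions",
  "processors": [
    {
      "inference": {
        "model_id": "elastic__distilbert-base-cased-finetuned-conll03-english",
        "target_field": "ml.ner",
        "field_map": {
          "paragraph": "text_field"
        }
      }
    }
  ]
}
```

The `field_map` entry maps the document's own field to the `text_field` input that the model expects.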

@@ -103,7 +103,7 @@ In this step, you load the data that you later use in an ingest pipeline to get

The data set `msmarco-passagetest2019-top1000` is a subset of the MS MARCO Passage Ranking data set used in the testing stage of the 2019 TREC Deep Learning Track. It contains 200 queries and, for each query, a list of relevant text passages extracted by a simple information retrieval (IR) system. From that data set, all unique passages with their IDs have been extracted and put into a [tsv file](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv), totaling 182,469 passages. This file is used as the example data set in the following steps.

-Upload the file by using the [Data Visualizer](../../../manage-data/ingest.md#upload-data-kibana). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182,469 documents.
+Upload the file by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182,469 documents.
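
To sanity-check the upload, you can query the document count; this is a sketch assuming the index name `collection` used above:

```console
GET collection/_count
```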

:::{image} ../../../images/machine-learning-ml-nlp-text-emb-data.png
:alt: Importing the data
46 changes: 29 additions & 17 deletions manage-data/ingest.md
@@ -10,29 +10,41 @@ mapped_urls:

# Ingestion

-% What needs to be done: Finish draft
+Bring your data! Whether you call it *adding*, *indexing*, or *ingesting* data, you have to get the data into {{es}} before you can search it, visualize it, and use it for insights.

-% GitHub issue: docs-projects#326
+Our ingest tools are flexible and support a wide range of scenarios: everything from popular, straightforward use cases to advanced use cases that require additional processing to modify or reshape your data before it goes to {{es}}.

-% Scope notes: Brief introduction on use cases Importance of data ingestion theory / process how to frame these products as living independently from ES? Link to reference architectures
+You can ingest:

-% Use migrated content from existing pages that map to this page:
+* **General content** (data without timestamps), such as HTML pages, catalogs, and files
+* **Time series (timestamped) data**, such as logs, metrics, and traces for the Elastic Security, Observability, or Search solutions, or for your own custom solutions

-% - [ ] ./raw-migrated-files/cloud/cloud/ec-cloud-ingest-data.md
-% Notes: Use draft Overview from Karen's PR
-% - [ ] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md
-% Notes: Other existing pages might be used in the "Plan" section
-% - [ ] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-your-data.md
-% - [ ] https://www.elastic.co/customer-success/data-ingestion
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md
-% - [ ] ./raw-migrated-files/ingest-docs/ingest-overview/ingest-intro.md

-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+## Ingesting general content [ingest-general]

-$$$upload-data-kibana$$$
+Elastic offers tools designed to ingest specific types of general content. The content type determines the best ingest option.

-$$$_add_sample_data$$$
+* To index **documents** directly into {{es}}, use the {{es}} [document APIs](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html).
+* To send **application data** directly to {{es}}, use an [{{es}} language client](https://www.elastic.co/guide/en/elasticsearch/client/index.html).
+* To index **web page content**, use the Elastic [web crawler](https://www.elastic.co/web-crawler).
+* To sync **data from third-party sources**, use [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). A connector syncs content from an original data source to an {{es}} index. Using connectors, you can create *searchable*, read-only replicas of your data sources.
+* To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/tools/upload-data-files.md).
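
As an illustration of the first option in the list above, the following sketch indexes a single document with the document APIs; the index name `my-index` and its fields are hypothetical:

```console
POST my-index/_doc
{
  "title": "Sample catalog entry",
  "body": "A piece of general content without a timestamp."
}
```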

-$$$ec-ingest-methods$$$
+To try things out before you add your own data, use our [sample data](ingest/sample-data.md).

-$$$ec-data-ingest-pipeline$$$

+## Ingesting time series data [ingest-time-series]

+::::{admonition} What’s the best approach for ingesting time series data?
+The best approach for ingesting data is the *simplest option* that *meets your needs* and *satisfies your use case*.
+
+In most cases, the *simplest option* for ingesting time series data is {{agent}} paired with an Elastic integration.
+
+* Install [Elastic Agent](https://www.elastic.co/guide/en/fleet/current) on the computer(s) from which you want to collect data.
+* Add the [Elastic integration](https://docs.elastic.co/en/integrations) for the data source to your deployment.
+
+Integrations are available for many popular platforms and services, and are a good place to start for ingesting data into Elastic solutions (Observability, Security, and Search) or your own search application.
+
+Check out the [Integration quick reference](https://docs.elastic.co/en/integrations/all_integrations) to search for available integrations. If you don’t find an integration for your data source, or if you need additional processing to extend the integration, we still have you covered: refer to [Transform and enrich data](ingest/transform-enrich.md) to learn more.
+::::
@@ -16,7 +16,7 @@ You can host {{es}} on your own hardware or send your data to {{es}} on {{ecloud}}

**Decision tree**

-[Data ingestion pipeline with decision tree](https://www.elastic.co/guide/en/cloud/current/ec-cloud-ingest-data.html#ec-data-ingest-pipeline)
+[Data ingestion](../../ingest.md)

| **Ingest architecture** | **Use when** |
| --- | --- |
@@ -567,5 +567,5 @@ You can add titles to the visualizations, resize and position them as you like,

2. As your final step, remember to stop Filebeat, the Node.js web server, and the client. Enter *CTRL + C* in the terminal window for each application to stop them.

-You now know how to monitor log files from a Node.js web application, deliver the log event data securely into an {{ech}} or {{ece}} deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about ingesting data.
+You now know how to monitor log files from a Node.js web application, deliver the log event data securely into an {{ech}} or {{ece}} deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md) to learn all about ingesting data.

@@ -446,5 +446,5 @@ You can add titles to the visualizations, resize and position them as you like,

2. As your final step, remember to stop Filebeat and the Python script. Enter *CTRL + C* in both your Filebeat terminal and in your `elvis.py` terminal.

-You now know how to monitor log files from a Python application, deliver the log event data securely into an {{ech}} or {{ece}} deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about ingesting data.
+You now know how to monitor log files from a Python application, deliver the log event data securely into an {{ech}} or {{ece}} deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md) to learn all about ingesting data.

2 changes: 1 addition & 1 deletion manage-data/ingest/ingesting-timeseries-data.md
@@ -3,7 +3,7 @@ mapped_pages:
- https://www.elastic.co/guide/en/ingest-overview/current/ingest-tools.html
---

-# Ingesting timeseries data [ingest-tools]
+# Ingesting time series data [ingest-tools]

Elastic and others offer tools to help you get your data from the original data source into {{es}}. Some tools are designed for particular data sources, and others are multi-purpose.

9 changes: 7 additions & 2 deletions manage-data/ingest/tools/upload-data-files.md
@@ -4,11 +4,16 @@ mapped_urls:
- https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#upload-data-kibana
---

-# Upload data files
+# Upload data files [upload-data-kibana]

% What needs to be done: Align serverless/stateful

% Use migrated content from existing pages that map to this page:

% - [ ] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-file-upload.md
-% - [ ] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md
+% - [ ] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md



+% Note from David: I've removed the ID $$$upload-data-kibana$$$ from manage-data/ingest.md as those links should instead point to this page. So, please ensure that the following ID is included on this page. I've added it beside the title.

@@ -517,5 +517,5 @@ You can add titles to the visualizations, resize and position them as you like,

2. As your final step, remember to stop Filebeat, the Node.js web server, and the client. Enter *CTRL + C* in the terminal window for each application to stop them.

-You now know how to monitor log files from a Node.js web application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about working in Elasticsearch Service.
+You now know how to monitor log files from a Node.js web application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md) to learn all about working in Elasticsearch Service.

@@ -408,5 +408,5 @@ You can add titles to the visualizations, resize and position them as you like,

2. As your final step, remember to stop Filebeat and the Python script. Enter *CTRL + C* in both your Filebeat terminal and in your `elvis.py` terminal.

-You now know how to monitor log files from a Python application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about working in Elasticsearch Service.
+You now know how to monitor log files from a Python application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md) to learn all about working in Elasticsearch Service.

@@ -61,7 +61,7 @@ You can subscribe to Elastic Cloud at any time during your trial. Billing starts

## How do I get started with my trial? [ec_how_do_i_get_started_with_my_trial]

-Start by checking out some common approaches for [moving data into Elastic Cloud](../../../manage-data/ingest.md#ec-ingest-methods).
+Start by checking out some common approaches for [moving data into Elastic Cloud](../../../manage-data/ingest.md).


## How do I sign up through a marketplace? [ec_how_do_i_sign_up_through_a_marketplace]

This file was deleted.

@@ -64,7 +64,7 @@ You can subscribe to Elastic Cloud at any time during your trial. [Billing](../.

## Get started with your trial [general-sign-up-trial-how-do-i-get-started-with-my-trial]

-Start by checking out some common approaches for [moving data into Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-cloud-ingest-data.html#ec-ingest-methods).
+Start by checking out some common approaches for [moving data into Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-cloud-ingest-data.html).


## Maintain access to your trial projects and data [general-sign-up-trial-what-happens-at-the-end-of-the-trial]
@@ -824,7 +824,7 @@ In this step, you load the data that you later use in the {{infer}} ingest pipeline

Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a [tsv file](https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv).

-Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest.md#upload-data-kibana) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
+Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.


## Ingest the data through the {{infer}} ingest pipeline [reindexing-data-infer]
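
As a rough sketch of this step, you can reindex the uploaded documents through the {{infer}} pipeline; the pipeline name `text-embeddings` and the destination index `test-embeddings` are illustrative assumptions:

```console
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "test-embeddings",
    "pipeline": "text-embeddings"
  }
}
```

The small `size` keeps each reindex batch modest so every batch can run through the {{infer}} processor without timing out.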