explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md (1 addition, 1 deletion)
@@ -113,7 +113,7 @@ Using the example text "Elastic is headquartered in Mountain View, California.",
## Add the NER model to an {{infer}} ingest pipeline [ex-ner-ingest]
- You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as the example text for {{infer}} in the following steps. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file.
+ You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as the example text for {{infer}} in the following steps. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file.
Now create an ingest pipeline either in the [Stack management UI](ml-nlp-inference.md#ml-nlp-inference-processor) or by using the API:
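The API request that follows this line on the source page is outside the changed hunk, so it is not shown in this diff. As an illustrative sketch only, a pipeline with an {{infer}} processor could look like the request below; the pipeline name `ner-pipeline`, the `model_id`, and the `paragraph` source field are assumptions for this sketch, not values taken from the changed pages.

```console
PUT _ingest/pipeline/ner-pipeline
{
  "description": "Illustrative sketch: run a deployed NER model on each ingested paragraph",
  "processors": [
    {
      "inference": {
        "model_id": "elastic__distilbert-base-cased-finetuned-conll03-english",
        "target_field": "ml.ner",
        "field_map": {
          "paragraph": "text_field"
        }
      }
    }
  ]
}
```

The `field_map` entry maps the document's `paragraph` field to the `text_field` input that NLP models deployed in {{es}} expect by default; adjust both names to match your own data and deployment.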
explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md (1 addition, 1 deletion)
@@ -103,7 +103,7 @@ In this step, you load the data that you later use in an ingest pipeline to get
The data set `msmarco-passagetest2019-top1000` is a subset of the MS MARCO Passage Ranking data set used in the testing stage of the 2019 TREC Deep Learning Track. It contains 200 queries and for each query a list of relevant text passages extracted by a simple information retrieval (IR) system. From that data set, all unique passages with their IDs have been extracted and put into a [tsv file](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv), totaling 182469 passages. In the following, this file is used as the example data set.
- Upload the file by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182469 documents.
+ Upload the file by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182469 documents.
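As a quick sanity check after the upload (a suggestion, not part of this change), you can confirm that all passages were imported into the index created above:

```console
GET collection/_count
```

The `count` value in the response should match the 182469 unique passages mentioned in the text.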
manage-data/ingest.md (1 addition, 1 deletion)
@@ -28,7 +28,7 @@ Elastic offer tools designed to ingest specific types of general content. The co
* To send **application data** directly to {{es}}, use an [{{es}} language client](https://www.elastic.co/guide/en/elasticsearch/client/index.html).
* To index **web page content**, use the Elastic [web crawler](https://www.elastic.co/web-crawler).
* To sync **data from third-party sources**, use [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). A connector syncs content from an original data source to an {{es}} index. Using connectors you can create *searchable*, read-only replicas of your data sources.
- * To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/tools/upload-data-files.md).
+ * To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/upload-data-files.md).
If you would like to try things out before you add your own data, try using our [sample data](ingest/sample-data.md).
% Notes: These are resources to pull from, but this new "Ingest tools overview" page will not be a replacement for any of these old AsciiDoc pages.
% File upload: https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#upload-data-kibana https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-file-upload.html
% API: https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#_add_data_with_programming_languages https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-api.html
% OpenTelemetry: https://github.com/elastic/opentelemetry
% Fleet and Agent: https://www.elastic.co/guide/en/fleet/current/fleet-overview.html https://www.elastic.co/guide/en/serverless/current/fleet-and-elastic-agent.html
% Logstash: https://www.elastic.co/guide/en/logstash/current/introduction.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-logstash.html https://www.elastic.co/guide/en/serverless/current/logstash-pipelines.html
% Beats: https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-beats.html
% APM: /solutions/observability/apps/application-performance-monitoring-apm.md
% Application logging: https://www.elastic.co/guide/en/observability/current/application-logs.html
% ECS logging: https://www.elastic.co/guide/en/observability/current/logs-ecs-application.html
% Elastic serverless forwarder for AWS: https://www.elastic.co/guide/en/esf/current/aws-elastic-serverless-forwarder.html
% Integrations: https://www.elastic.co/guide/en/integrations/current/introduction.html
% Search connectors: https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-integrations-connector-client.html
% Web crawler: https://github.com/elastic/crawler/tree/main/docs
% - [This comparison page is being moved to the reference section, so I'm linking to that from the current page - Wajiha] ./raw-migrated-files/ingest-docs/fleet/beats-agent-comparison.md
% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
% These IDs are from content that I'm not including on this current page. I've resolved them by changing the internal links to anchor links where needed. - Wajiha

$$$supported-outputs-beats-and-agent$$$

$$$additional-capabilities-beats-and-agent$$$

Depending on the type of data you want to ingest, you have a number of methods and tools available for use in your ingestion process. The table below provides more information about the available tools. Refer to our [Ingestion](/manage-data/ingest.md) overview for some guidelines to help you select the optimal tool for your use case.

| Ingest tool | Description | Resources |
| --- | --- | --- |
| Integrations | Ingest data using a variety of Elastic integrations. |[Elastic Integrations](https://www.elastic.co/guide/en/integrations/current/index.html)|
| File upload | Upload data from a file and inspect it before importing it into {{es}}. |[Upload data files](/manage-data/ingest/upload-data-files.md)|
| APIs | Ingest data through code by using the APIs of one of the language clients or the {{es}} HTTP APIs (see the sketch after this table). |[Document APIs](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html)|
| OpenTelemetry | Collect and send your telemetry data to Elastic Observability. |[Elastic Distributions of OpenTelemetry](https://github.com/elastic/opentelemetry?tab=readme-ov-file#elastic-distributions-of-opentelemetry)|
| Fleet and Elastic Agent | Add monitoring for logs, metrics, and other types of data to a host using Elastic Agent, and centrally manage it using Fleet. |[Fleet and {{agent}} overview](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) <br> [{{fleet}} and {{agent}} restrictions (Serverless)](https://www.elastic.co/guide/en/fleet/current/fleet-agent-serverless-restrictions.html) <br> [{{beats}} and {{agent}} capabilities](https://www.elastic.co/guide/en/fleet/current/beats-agent-comparison.html)|
| {{elastic-defend}} | {{elastic-defend}} provides organizations with prevention, detection, and response capabilities with deep visibility for EPP, EDR, SIEM, and Security Analytics use cases across Windows, macOS, and Linux operating systems running on both traditional endpoints and public cloud environments. |[Configure endpoint protection with {{elastic-defend}}](/solutions/security/configure-elastic-defend.md)|
| {{ls}} | Dynamically unify data from a wide variety of data sources and normalize it into destinations of your choice with {{ls}}. |[Logstash (Serverless)](https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-logstash.html) <br> [Logstash pipelines](/manage-data/ingest/transform-enrich/logstash-pipelines.md)|
| {{beats}} | Use {{beats}} data shippers to send operational data to Elasticsearch directly or through Logstash. |[{{beats}} (Serverless)](https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-beats.html) <br> [What are {{beats}}?](https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html) <br> [{{beats}} and {{agent}} capabilities](https://www.elastic.co/guide/en/fleet/current/beats-agent-comparison.html)|
| APM | Collect detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, and more. |[Application performance monitoring (APM)](/solutions/observability/apps/application-performance-monitoring-apm.md)|
| Application logs | Ingest application logs using Filebeat, {{agent}}, or the APM agent, or reformat application logs into Elastic Common Schema (ECS) logs and then ingest them using Filebeat or {{agent}}. |[Stream application logs](/solutions/observability/logs/stream-application-logs.md) <br> [ECS formatted application logs](/solutions/observability/logs/ecs-formatted-application-logs.md)|
| Elastic Serverless forwarder for AWS | Ship logs from your AWS environment to cloud-hosted, self-managed Elastic environments, or {{ls}}. |[Elastic Serverless Forwarder](https://www.elastic.co/guide/en/esf/current/aws-elastic-serverless-forwarder.html)|
| Connectors | Use connectors to extract data from an original data source and sync it to an {{es}} index. |[Ingest content with Elastic connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html)|
| Web crawler | Discover, extract, and index searchable content from websites and knowledge bases using the web crawler. |[Elastic Open Web Crawler](https://github.com/elastic/crawler#readme)|
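To make the APIs row above concrete, here is a minimal sketch of sending a couple of documents through the {{es}} bulk endpoint; the index name `my-index` and the document fields are placeholders, not values from the table.

```console
POST my-index/_bulk
{ "index": { "_id": "1" } }
{ "title": "First test document", "ingested_by": "http-api" }
{ "index": { "_id": "2" } }
{ "title": "Second test document", "ingested_by": "http-api" }
```

The same operation is available from each language client, and most clients also ship a bulk helper that builds this NDJSON body for you.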
% Note from David: I've removed the ID $$$upload-data-kibana$$$ from manage-data/ingest.md as those links should instead point to this page. So, please ensure that the following ID is included on this page. I've added it beside the title.
You can upload files, view their fields and metrics, and optionally import them to {{es}} with the Data Visualizer.
To use the Data Visualizer, click **Upload a file** on the {{es}} **Getting Started** page or navigate to the **Integrations** view and search for **Upload a file**. Clicking **Upload a file** opens the Data Visualizer UI.
Drag a file into the upload area or click **Select or drag and drop a file** to choose a file from your computer.
You can upload different file formats for analysis with the Data Visualizer:
File formats supported up to 500 MB:
* CSV
* TSV
* NDJSON
* Log files
File formats supported up to 60 MB:
* PDF
* Microsoft Office files (Word, Excel, PowerPoint)
* Plain Text (TXT)
* Rich Text (RTF)
* Open Document Format (ODF)
The Data Visualizer displays the first 1000 rows of the file. You can inspect the data and make any necessary changes before importing it. Click **Import** to continue the process.
This process creates an index and imports the data into {{es}}. Once your data is in {{es}}, you can start exploring it; see [Explore and analyze](/explore-analyze/index.md) for more information.
::::{important}
The upload feature is not intended for use as part of a repeated production process, but rather for the initial exploration of your data.
::::
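To start exploring the imported data, a simple first step could be a match-all query against the new index; the index name below is a placeholder for whatever name you chose during import.

```console
GET my-uploaded-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5
}
```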
## Required privileges
The {{stack-security-features}} provide roles and privileges that control which users can upload files. To upload a file in {{kib}} and import it into an {{es}} index, you’ll need:
* `manage_pipeline` or `manage_ingest_pipelines` cluster privilege
* `create`, `create_index`, `manage`, and `read` index privileges for the index
* `all` {{kib}} privileges for **Discover** and **Data Views Management**
You can manage your roles, privileges, and spaces in **{{stack-manage-app}}**.
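As a sketch of granting the {{es}} side of these privileges with the role management API, assuming a hypothetical role name `file_uploader` and target index `my-uploaded-index`; the {{kib}} feature privileges for **Discover** and **Data Views Management** are granted separately, for example in the role's {{kib}} section of **{{stack-manage-app}}**.

```console
POST _security/role/file_uploader
{
  "cluster": [ "manage_ingest_pipelines" ],
  "indices": [
    {
      "names": [ "my-uploaded-index" ],
      "privileges": [ "create", "create_index", "manage", "read" ]
    }
  ]
}
```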
raw-migrated-files/elasticsearch/elasticsearch-reference/semantic-search-inference.md (1 addition, 1 deletion)
@@ -824,7 +824,7 @@ In this step, you load the data that you later use in the {{infer}} ingest pipel
Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a [tsv file](https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv).
- Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
+ Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
## Ingest the data through the {{infer}} ingest pipeline [reindexing-data-infer]
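The steps under this heading are outside the changed hunk. For orientation only, reindexing the uploaded `test-data` index through an {{infer}} pipeline typically takes a form like the sketch below; the destination index and pipeline names are assumptions, not values from this diff.

```console
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "test-data-embeddings",
    "pipeline": "my-inference-pipeline"
  }
}
```

Passing `wait_for_completion=false` returns a task ID that you can poll with the task management API while the 182,469 documents are processed in batches of 50.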