
Commit 2649c09

Merge branch 'main' into rip-serverless-files-pt3
2 parents: e52b663 + c913055


16 files changed (+107, -40 lines)


deploy-manage/deploy/kibana-reporting-configuration.md

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@ mapped_urls:

$$$reporting-chromium-sandbox$$$

+$$$grant-user-access$$$
+
⚠️ **This page is a work in progress.** ⚠️

The documentation team is working to combine content pulled from the following pages:

explore-analyze/machine-learning/nlp/ml-nlp-ner-example.md

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ Using the example text "Elastic is headquartered in Mountain View, California.",

## Add the NER model to an {{infer}} ingest pipeline [ex-ner-ingest]

-You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as an example for {{infer}} in the following example. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file.
+You can perform bulk {{infer}} on documents as they are ingested by using an [{{infer}} processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) in your ingest pipeline. The novel *Les Misérables* by Victor Hugo is used as an example for {{infer}} in the following example. [Download](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json) the novel text split by paragraph as a JSON file, then upload it by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Give the new index the name `les-miserables` when uploading the file.

Now create an ingest pipeline either in the [Stack management UI](ml-nlp-inference.md#ml-nlp-inference-processor) or by using the API:
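
The paragraph changed in this hunk describes bulk {{infer}} through an ingest pipeline, and the following line in the file points at creating that pipeline via the API. A minimal sketch of such a call with the Python client might look like the following; the pipeline ID, trained-model ID, field mapping, and connection details are illustrative assumptions, not values taken from this commit.

```python
from elasticsearch import Elasticsearch

# Placeholder connection details; adjust for your cluster.
es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

# Hypothetical pipeline with an inference processor. "<trained-model-id>" stands in for
# the NER model deployed earlier in that guide; mapping "paragraph" to "text_field" is an
# assumption based on the les-miserables data layout described above.
es.ingest.put_pipeline(
    id="ner-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "<trained-model-id>",
                "field_map": {"paragraph": "text_field"},
            }
        }
    ],
)
```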

explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ In this step, you load the data that you later use in an ingest pipeline to get

The data set `msmarco-passagetest2019-top1000` is a subset of the MS MARCO Passage Ranking data set used in the testing stage of the 2019 TREC Deep Learning Track. It contains 200 queries and for each query a list of relevant text passages extracted by a simple information retrieval (IR) system. From that data set, all unique passages with their IDs have been extracted and put into a [tsv file](https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv), totaling 182469 passages. In the following, this file is used as the example data set.

-Upload the file by using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182469 documents.
+Upload the file by using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md). Name the first column `id` and the second one `text`. The index name is `collection`. After the upload is done, you can see an index named `collection` with 182469 documents.

:::{image} ../../../images/machine-learning-ml-nlp-text-emb-data.png
:alt: Importing the data
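
The updated line keeps the Data Visualizer as the upload path. For readers who prefer to script the same step, a rough bulk-ingest sketch with the Python client follows; it assumes the tsv file sits locally, that the first column is the passage ID and the second the passage text (matching the `id` and `text` column names above), and uses placeholder connection details.

```python
import csv
from elasticsearch import Elasticsearch, helpers

# Placeholder connection details; adjust for your cluster.
es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

def passages(path="msmarco-passagetest2019-unique.tsv"):
    # Each tsv row is <passage id> <tab> <passage text>; index them into `collection`
    # with the same `id` and `text` field names the guide uses.
    with open(path, newline="", encoding="utf-8") as f:
        for pid, text in csv.reader(f, delimiter="\t"):
            yield {"_index": "collection", "_id": pid, "_source": {"id": pid, "text": text}}

helpers.bulk(es, passages())
```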

manage-data/ingest.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ Elastic offer tools designed to ingest specific types of general content. The co
* To send **application data** directly to {{es}}, use an [{{es}} language client](https://www.elastic.co/guide/en/elasticsearch/client/index.html).
* To index **web page content**, use the Elastic [web crawler](https://www.elastic.co/web-crawler).
* To sync **data from third-party sources**, use [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). A connector syncs content from an original data source to an {{es}} index. Using connectors you can create *searchable*, read-only replicas of your data sources.
-* To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/tools/upload-data-files.md).
+* To index **single files** for testing in a non-production environment, use the {{kib}} [file uploader](ingest/upload-data-files.md).

If you would like to try things out before you add your own data, try using our [sample data](ingest/sample-data.md).

manage-data/ingest/tools.md

Lines changed: 28 additions & 6 deletions
@@ -18,15 +18,37 @@ mapped_urls:

% Use migrated content from existing pages that map to this page:

-% - [ ] ./raw-migrated-files/cloud/cloud/ec-cloud-ingest-data.md
+% - [x] ./raw-migrated-files/cloud/cloud/ec-cloud-ingest-data.md
% Notes: These are resources to pull from, but this new "Ingest tools overiew" page will not be a replacement for any of these old AsciiDoc pages. File upload: https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#upload-data-kibana https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-file-upload.html API: https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#_add_data_with_programming_languages https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-api.html OpenTelemetry: https://github.com/elastic/opentelemetry Fleet and Agent: https://www.elastic.co/guide/en/fleet/current/fleet-overview.html https://www.elastic.co/guide/en/serverless/current/fleet-and-elastic-agent.html Logstash: https://www.elastic.co/guide/en/logstash/current/introduction.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-logstash.html https://www.elastic.co/guide/en/serverless/current/logstash-pipelines.html Beats: https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-beats.html APM: /solutions/observability/apps/application-performance-monitoring-apm.md Application logging: https://www.elastic.co/guide/en/observability/current/application-logs.html ECS logging: https://www.elastic.co/guide/en/observability/current/logs-ecs-application.html Elastic serverless forwarder for AWS: https://www.elastic.co/guide/en/esf/current/aws-elastic-serverless-forwarder.html Integrations: https://www.elastic.co/guide/en/integrations/current/introduction.html Search connectors: https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-integrations-connector-client.html Web crawler: https://github.com/elastic/crawler/tree/main/docs
-% - [ ] ./raw-migrated-files/ingest-docs/fleet/beats-agent-comparison.md
-% - [ ] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md
-% - [ ] https://www.elastic.co/customer-success/data-ingestion
-% - [ ] https://github.com/elastic/ingest-docs/pull/1373
+% - [This comparison page is being moved to the reference section, so I'm linking to that from the current page - Wajiha] ./raw-migrated-files/ingest-docs/fleet/beats-agent-comparison.md
+% - [x] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md
+% - [x] https://www.elastic.co/customer-success/data-ingestion
+% - [x] https://github.com/elastic/ingest-docs/pull/1373

-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+% These IDs are from content that I'm not including on this current page. I've resolved them by changing the internal links to anchor links where needed. - Wajiha

$$$supported-outputs-beats-and-agent$$$

$$$additional-capabilities-beats-and-agent$$$
+
+Depending on the type of data you want to ingest, you have a number of methods and tools available for use in your ingestion process. The table below provides more information about the available tools. Refer to our [Ingestion](/manage-data/ingest.md) overview for some guidelines to help you select the optimal tool for your use case.
+
+<br>
+
+| Tools | Usage | Links to more information |
+| ------- | --------------- | ------------------------- |
+| Integrations | Ingest data using a variety of Elastic integrations. | [Elastic Integrations](https://www.elastic.co/guide/en/integrations/current/index.html) |
+| File upload | Upload data from a file and inspect it before importing it into {{es}}. | [Upload data files](/manage-data/ingest/upload-data-files.md) |
+| APIs | Ingest data through code by using the APIs of one of the language clients or the {{es}} HTTP APIs. | [Document APIs](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html) |
+| OpenTelemetry | Collect and send your telemetry data to Elastic Observability | [Elastic Distributions of OpenTelemetry](https://github.com/elastic/opentelemetry?tab=readme-ov-file#elastic-distributions-of-opentelemetry) |
+| Fleet and Elastic Agent | Add monitoring for logs, metrics, and other types of data to a host using Elastic Agent, and centrally manage it using Fleet. | [Fleet and {{agent}} overview](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html) <br> [{{fleet}} and {{agent}} restrictions (Serverless)](https://www.elastic.co/guide/en/fleet/current/fleet-agent-serverless-restrictions.html) <br> [{{beats}} and {{agent}} capabilities](https://www.elastic.co/guide/en/fleet/current/beats-agent-comparison.html)||
+| {{elastic-defend}} | {{elastic-defend}} provides organizations with prevention, detection, and response capabilities with deep visibility for EPP, EDR, SIEM, and Security Analytics use cases across Windows, macOS, and Linux operating systems running on both traditional endpoints and public cloud environments. | [Configure endpoint protection with {{elastic-defend}}](/solutions/security/configure-elastic-defend.md) |
+| {{ls}} | Dynamically unify data from a wide variety of data sources and normalize it into destinations of your choice with {{ls}}. | [Logstash (Serverless)](https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-logstash.html) <br> [Logstash pipelines](/manage-data/ingest/transform-enrich/logstash-pipelines.md) |
+| {{beats}} | Use {{beats}} data shippers to send operational data to Elasticsearch directly or through Logstash. | [{{beats}} (Serverless)](https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-beats.html) <br> [What are {{beats}}?](https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html) <br> [{{beats}} and {{agent}} capabilities](https://www.elastic.co/guide/en/fleet/current/beats-agent-comparison.html)|
+| APM | Collect detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, and more. | [Application performance monitoring (APM)](/solutions/observability/apps/application-performance-monitoring-apm.md) |
+| Application logs | Ingest application logs using Filebeat, {{agent}}, or the APM agent, or reformat application logs into Elastic Common Schema (ECS) logs and then ingest them using Filebeat or {{agent}}. | [Stream application logs](/solutions/observability/logs/stream-application-logs.md) <br> [ECS formatted application logs](/solutions/observability/logs/ecs-formatted-application-logs.md) |
+| Elastic Serverless forwarder for AWS | Ship logs from your AWS environment to cloud-hosted, self-managed Elastic environments, or {{ls}}. | [Elastic Serverless Forwarder](https://www.elastic.co/guide/en/esf/current/aws-elastic-serverless-forwarder.html) |
+| Connectors | Use connectors to extract data from an original data source and sync it to an {{es}} index. | [Ingest content with Elastic connectors
+](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html) <br> [Connector clients](https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-through-integrations-connector-client.html) |
+| Web crawler | Discover, extract, and index searchable content from websites and knowledge bases using the web crawler. | [Elastic Open Web Crawler](https://github.com/elastic/crawler#readme) |
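
The APIs row added above points to the Document APIs for code-driven ingestion. As a minimal illustration of that route, a sketch with the Python client follows; the index name, document body, and connection details are assumptions for illustration, not values from this commit.

```python
from elasticsearch import Elasticsearch

# Placeholder connection details; adjust for your cluster.
es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

# Index a single document through the Document APIs, then read it back by ID.
es.index(index="my-index", id="1", document={"message": "hello", "ingested_by": "document-api"})
print(es.get(index="my-index", id="1")["_source"])
```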

manage-data/ingest/tools/upload-data-files.md

Lines changed: 0 additions & 19 deletions
This file was deleted.
manage-data/ingest/upload-data-files.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
mapped_urls:
  - https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-data-file-upload.html
  - https://www.elastic.co/guide/en/kibana/current/connect-to-elasticsearch.html#upload-data-kibana
---

# Upload data files [upload-data-kibana]

% What needs to be done: Align serverless/stateful

% Use migrated content from existing pages that map to this page:

% - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-file-upload.md
% - [x] ./raw-migrated-files/kibana/kibana/connect-to-elasticsearch.md

% Note from David: I've removed the ID $$$upload-data-kibana$$$ from manage-data/ingest.md as those links should instead point to this page. So, please ensure that the following ID is included on this page. I've added it beside the title.

You can upload files, view their fields and metrics, and optionally import them to {{es}} with the Data Visualizer.

To use the Data Visualizer, click **Upload a file** on the {{es}} **Getting Started** page or navigate to the **Integrations** view and search for **Upload a file**. Clicking **Upload a file** opens the Data Visualizer UI.

:::{image} /images/serverless-file-uploader-UI.png
:alt: File upload UI
:class: screenshot
:::

Drag a file into the upload area or click **Select or drag and drop a file** to choose a file from your computer.

You can upload different file formats for analysis with the Data Visualizer:

File formats supported up to 500 MB:

* CSV
* TSV
* NDJSON
* Log files

File formats supported up to 60 MB:

* PDF
* Microsoft Office files (Word, Excel, PowerPoint)
* Plain Text (TXT)
* Rich Text (RTF)
* Open Document Format (ODF)

The Data Visualizer displays the first 1000 rows of the file. You can inspect the data and make any necessary changes before importing it. Click **Import** to continue the process.

This process will create an index and import the data into {{es}}. Once your data is in {{es}}, you can start exploring it; see [Explore and analyze](/explore-analyze/index.md) for more information.

::::{important}
The upload feature is not intended for use as part of a repeated production process, but rather for the initial exploration of your data.

::::

## Required privileges

The {{stack-security-features}} provide roles and privileges that control which users can upload files. To upload a file in {{kib}} and import it into an {{es}} index, you’ll need:

* `manage_pipeline` or `manage_ingest_pipelines` cluster privilege
* `create`, `create_index`, `manage`, and `read` index privileges for the index
* `all` {{kib}} privileges for **Discover** and **Data Views Management**

You can manage your roles, privileges, and spaces in **{{stack-manage-app}}**.
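
For teams that manage access programmatically rather than through **{{stack-manage-app}}**, a hedged sketch of a role covering the cluster and index privileges listed above is shown below, using the Python client's security API. The role name, index pattern, and connection details are illustrative assumptions, and the `all` {{kib}} privileges for **Discover** and **Data Views Management** still have to be granted through a {{kib}} role.

```python
from elasticsearch import Elasticsearch

# Placeholder connection details; adjust for your cluster.
es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

# Hypothetical role for file-upload users: one of the listed cluster privileges plus
# the index privileges, scoped to an example index pattern for uploaded data.
es.security.put_role(
    name="file-upload-user",
    cluster=["manage_ingest_pipelines"],
    indices=[
        {
            "names": ["uploaded-*"],
            "privileges": ["create", "create_index", "manage", "read"],
        }
    ],
)
```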

manage-data/toc.yml

Lines changed: 1 addition & 2 deletions
@@ -91,6 +91,7 @@ toc:
- file: ingest/ingest-reference-architectures/agent-es-airgapped.md
- file: ingest/ingest-reference-architectures/agent-ls-airgapped.md
- file: ingest/sample-data.md
+- file: ingest/upload-data-files.md
- file: ingest/transform-enrich.md
  children:
- file: ingest/transform-enrich/ingest-pipelines-serverless.md
@@ -106,8 +107,6 @@ toc:
- file: ingest/transform-enrich/example-enrich-data-by-matching-value-to-range.md
- file: ingest/transform-enrich/index-mapping-text-analysis.md
- file: ingest/tools.md
-  children:
-  - file: ingest/tools/upload-data-files.md
- file: lifecycle.md
  children:
- file: lifecycle/data-tiers.md

raw-migrated-files/elasticsearch/elasticsearch-reference/semantic-search-inference.md

Lines changed: 1 addition & 1 deletion
@@ -824,7 +824,7 @@ In this step, you load the data that you later use in the {{infer}} ingest pipel

Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a [tsv file](https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv).

-Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/tools/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names***, assign `id` to the first column and `content` to the second. Click ***Apply***, then ***Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
+Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names***, assign `id` to the first column and `content` to the second. Click ***Apply***, then ***Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.


## Ingest the data through the {{infer}} ingest pipeline [reindexing-data-infer]
