Merged
2 changes: 1 addition & 1 deletion manage-data/ingest/ingest-reference-architectures.md
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ You can host {{es}} on your own hardware or send your data to {{es}} on {{ecloud
| --- | --- |
| [*{{agent}} to Elasticsearch*](./ingest-reference-architectures/agent-to-es.md)<br><br>![Image showing {{agent}} collecting data and sending to {{es}}](/manage-data/images/ingest-ea-es.png "") | An [{{agent}} integration](https://docs.elastic.co/en/integrations) is available for your data source:<br><br>* Software components with [{{agent}} installed](./ingest-reference-architectures/agent-installed.md)<br>* Software components using [APIs for data collection](./ingest-reference-architectures/agent-apis.md)<br> |
| [*{{agent}} to {{ls}} to Elasticsearch*](./ingest-reference-architectures/agent-ls.md)<br><br>![Image showing {{agent}} to {{ls}} to {{es}}](/manage-data/images/ingest-ea-ls-es.png "") | You need additional capabilities offered by {{ls}}:<br><br>* [**enrichment**](./ingest-reference-architectures/ls-enrich.md) between {{agent}} and {{es}}<br>* [**persistent queue (PQ) buffering**](./ingest-reference-architectures/lspq.md) to accommodate network issues and downstream unavailability<br>* [**proxying**](./ingest-reference-architectures/ls-networkbridge.md) in cases where {{agent}}s have network restrictions for connecting outside of the {{agent}} network<br>* data needs to be [**routed to multiple**](./ingest-reference-architectures/ls-multi.md) {{es}} clusters and other destinations depending on the content<br> |
| [*{{agent}} to proxy to Elasticsearch*](./ingest-reference-architectures/agent-proxy.md)<br><br>![Image showing connections between {{agent}} and {{es}} using a proxy](/manage-data/images/ingest-ea-proxy-es.png "") | Agents have [network restrictions](./ingest-reference-architectures/agent-proxy.md) that prevent connecting outside of the {{agent}} network Note that [{{ls}} as proxy](./ingest-reference-architectures/ls-networkbridge.md) is one option.<br> |
| [*{{agent}} to proxy to Elasticsearch*](./ingest-reference-architectures/agent-proxy.md)<br><br>![Image showing connections between {{agent}} and {{es}} using a proxy](/manage-data/images/ingest-ea-proxy-es.png "") | Agents have [network restrictions](./ingest-reference-architectures/agent-proxy.md) that prevent connecting outside of the {{agent}} network. [{{ls}} as proxy](./ingest-reference-architectures/ls-networkbridge.md) is one option.<br> |
| [*{{agent}} to {{es}} with Kafka as middleware message queue*](./ingest-reference-architectures/agent-kafka-es.md)<br><br>![Image showing {{agent}} collecting data and using Kafka as a message queue enroute to {{es}}](/manage-data/images/ingest-ea-kafka.png "") | Kafka is your [middleware message queue](./ingest-reference-architectures/agent-kafka-es.md):<br><br>* [Kafka ES sink connector](./ingest-reference-architectures/agent-kafka-essink.md) to write from Kafka to {{es}}<br>* [{{ls}} to read from Kafka and route to {{es}}](./ingest-reference-architectures/agent-kafka-ls.md)<br> |
| [*{{ls}} to Elasticsearch*](./ingest-reference-architectures/ls-for-input.md)<br><br>![Image showing {{ls}} collecting data and sending to {{es}}](/manage-data/images/ingest-ls-es.png "") | You need to collect data from a source that {{agent}} can’t read (such as databases, AWS Kinesis). Check out the [{{ls}} input plugins](logstash-docs-md://lsr/input-plugins.md).<br> |
| [*Elastic air-gapped architectures*](./ingest-reference-architectures/airgapped-env.md)<br><br>![Image showing {{stack}} in an air-gapped environment](/manage-data/images/ingest-ea-airgapped.png "") | You want to deploy {{agent}} and {{stack}} in an air-gapped environment (no access to outside networks)<br> |
@@ -12,7 +12,7 @@ products:
:::

Ingest models
: [{{agent}} to {{ls}} to Kafka to {{ls}} to {{es}}: Kafka as middleware message queue](agent-kafka-ls.md).<br> {{ls}} reads data from Kafka and routes it to {{es}} clusters (and/or other destinations).
: [{{agent}} to {{ls}} to Kafka to {{ls}} to {{es}}: Kafka as middleware message queue](agent-kafka-ls.md).<br> {{ls}} reads data from Kafka and routes it to {{es}} clusters and other destinations.

[{{agent}} to {{ls}} to Kafka to Kafka ES Sink to {{es}}: Kafka as middleware message queue](agent-kafka-essink.md).<br> Kafka ES sink connector reads from Kafka and writes to {{es}}.

@@ -12,7 +12,7 @@ products:
:::

Ingest model
: {{ls}} to collect data from sources not currently supported by {{agent}} and sending the data to {{es}}. Note that the data transformation still happens within the {{es}} ingest pipeline.
: {{ls}} to collect data from sources not currently supported by {{agent}} and send the data to {{es}}. The data transformation still happens within the {{es}} ingest pipeline.

Use when
: {{agent}} doesn’t currently support your data source.
@@ -13,15 +13,15 @@ products:
:::

Ingest model
: {{agent}} to {{ls}} to {{es}} clusters and/or additional destinations
: {{agent}} to {{ls}} to {{es}} clusters and additional destinations

Use when
: Data collected by {{agent}} needs to be routed to different {{es}} clusters or non-{{es}} destinations depending on the content

Example
: Let’s take an example of a Windows workstation, for which we are collecting different types of logs using the System and Windows integrations. These logs need to be sent to different {{es}} clusters, to S3 for backup, and to other destinations such as different SIEM solutions. In addition, the {{es}} destination is derived from the type of data stream and an organization identifier.

In such use cases, agents send the data to {{ls}} as a routing mechanism to different destinations. Note that the System and Windows integrations must be installed on all {{es}} clusters to which the data is routed.
In such use cases, agents send the data to {{ls}} as a routing mechanism to different destinations. The System and Windows integrations must be installed on all {{es}} clusters to which the data is routed.


Sample config
@@ -16,7 +16,7 @@ Ingest model
: {{agent}} to {{ls}} persistent queue to {{es}}

Use when
: Your data flow may encounter network issues, bursts of events, and/or downstream unavailability and you need the ability to buffer the data before ingestion.
: Your data flow may encounter network issues, bursts of events, or downstream unavailability, and you need the ability to buffer the data before ingestion.


## Resources [lspq-resources]
@@ -109,7 +109,7 @@ For this example, let’s create a new database *es_db* with table *es_table*, a

There are two possible ways to address this:

* You can use "soft deletes" in your source database. Essentially, a record is first marked for deletion through a boolean flag. Other programs that are currently using your source database would have to filter out "soft deletes" in their queries. The "soft deletes" are sent over to Elasticsearch, where they can be processed. After that, your source database and Elasticsearch must both remove these "soft deletes."
* You can use "soft deletes" in your source database. Essentially, a record is first marked for deletion through a boolean flag. Other programs that are currently using your source database would have to filter out "soft deletes" in their queries. The "soft deletes" are sent over to Elasticsearch, where they can be processed. After that, your source database and Elasticsearch must both remove these "soft deletes".
* You can periodically clear the Elasticsearch indices that are based off of the database, and then refresh Elasticsearch with a fresh ingest of the contents of the database.
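
The soft-delete approach described above can be sketched in a few lines. This is a minimal, illustrative model only — sqlite3 stands in for the MySQL source database, and the `deleted` flag column is an assumed name, not part of the tutorial's schema:

```python
import sqlite3

# An in-memory database stands in for the MySQL source database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE es_table (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO es_table (id, name) VALUES (?, ?)",
    [(1, "Snow"), (2, "Jones"), (3, "Stark")],
)

# Soft delete: mark the record for deletion instead of removing it.
conn.execute("UPDATE es_table SET deleted = 1 WHERE id = 2")

# Other consumers of the database filter out soft-deleted rows.
live = conn.execute(
    "SELECT id, name FROM es_table WHERE deleted = 0 ORDER BY id"
).fetchall()
print(live)  # [(1, 'Snow'), (3, 'Stark')]

# The ingest side reads the flagged rows, removes the matching documents
# in Elasticsearch, and only then hard-deletes them in the source.
to_remove = conn.execute("SELECT id FROM es_table WHERE deleted = 1").fetchall()
conn.execute("DELETE FROM es_table WHERE deleted = 1")
```

The key design point is ordering: the flagged rows must be processed on the Elasticsearch side before either system hard-deletes them, or the deletion is lost.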

3. Log in to your MySQL server and add three records to your new database:
@@ -122,7 +122,7 @@ For this example, let’s create a new database *es_db* with table *es_table*, a
(3,"Stark");
```

4. Verify your data with a SQL statement:
4. Verify your data with an SQL statement:

```txt
select * from es_table;
@@ -364,7 +364,7 @@ In this section, we configure Logstash to send the MySQL data to Elasticsearch.
}
```

4. At this point, if you simply restart Logstash as is with your new output, then no MySQL data is sent to our Elasticsearch index.
4. If you simply restart Logstash as is with your new output, then no MySQL data is sent to our Elasticsearch index.

Why? Logstash retains the previous `sql_last_value` timestamp and sees that no new changes have occurred in the MySQL database since that time. Therefore, based on the SQL query that we configured, there’s no new data to send to Logstash.
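
The `sql_last_value` bookkeeping can be illustrated with a short sketch. This is simplified, assumed logic to show the idea — not Logstash's actual implementation — and the `modification_time` column name is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE es_table (id INTEGER, name TEXT, modification_time INTEGER)")
conn.executemany(
    "INSERT INTO es_table VALUES (?, ?, ?)",
    [(1, "Snow", 100), (2, "Jones", 200), (3, "Stark", 300)],
)

sql_last_value = 0  # on the very first run there is no saved timestamp

def fetch_new_rows():
    """Mimic the jdbc input: select only rows modified since the last run."""
    global sql_last_value
    rows = conn.execute(
        "SELECT id, name, modification_time FROM es_table "
        "WHERE modification_time > ? ORDER BY modification_time",
        (sql_last_value,),
    ).fetchall()
    if rows:
        sql_last_value = rows[-1][2]  # persist the newest timestamp seen
    return rows

first = fetch_new_rows()   # returns all three rows
second = fetch_new_rows()  # nothing has changed since -> empty result
```

Because the saved timestamp survives restarts, rerunning the pipeline without new database changes yields an empty result set, which is exactly the behavior described above.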

@@ -42,7 +42,7 @@ For the three following packages, you can create a working directory to install
npm install @elastic/ecs-winston-format
```

* [Got](https://www.npmjs.com/package/got): Got is a "Human-friendly and powerful HTTP request library for Node.js." - this plugin can be used to query the sample web server used in the tutorial. To install the Got package, run the following command in your working directory:
* [Got](https://www.npmjs.com/package/got): Got is a "Human-friendly and powerful HTTP request library for Node.js". You can use this package to query the sample web server used in the tutorial. To install the Got package, run the following command in your working directory:

```sh
npm install got
@@ -100,7 +100,7 @@ In this step, you’ll create a Python script that generates logs in JSON format

This Python script continuously generates one of twelve random log messages at intervals of between 1 and 10 seconds. The log messages are written to an `elvis.json` file, each with a timestamp, a log level of _info_, _warning_, _error_, or _critical_, and other data. To add some variance to the log data, the _info_ message _Elvis has left the building_ is set to be the most probable log event.

For simplicity, there is just one log file (`elvis.json`), and it is written to the local directory where `elvis.py` is located. In a production environment, you may have multiple log files associated with different modules and loggers and likely stored in `/var/log` or similar. To learn more about configuring logging in Python, check [Logging facility for Python](https://docs.python.org/3/library/logging.html).
For simplicity, there is only one log file (`elvis.json`), and it is written to the local directory where `elvis.py` is located. In a production environment, you may have multiple log files associated with different modules and loggers and likely stored in `/var/log` or similar. To learn more about configuring logging in Python, check [Logging facility for Python](https://docs.python.org/3/library/logging.html).
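
A stripped-down sketch of such a generator might look like the following. The `elvis.json` file name and the _Elvis has left the building_ message come from the tutorial; the other field names follow ECS conventions, and the `ecs.version` value is an assumption:

```python
import json
import random
from datetime import datetime, timezone

LEVELS = ["info", "warning", "error", "critical"]
# The tutorial's most probable event; a real generator would hold twelve messages.
MESSAGES = ["Elvis has left the building"]

def make_log_record():
    """Build one ECS-style log record as a dict."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "log.level": random.choice(LEVELS),
        "message": random.choice(MESSAGES),
        "ecs.version": "8.0",  # assumed version string
    }

record = make_log_record()
line = json.dumps(record)

# In the tutorial, each line would be appended to elvis.json:
# with open("elvis.json", "a") as f:
#     f.write(line + "\n")
```

Emitting one JSON object per line is what lets Filebeat (or any shipper) parse each event without a custom grammar.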

Having your logs written in a JSON format with ECS fields allows for easy parsing and analysis, and for standardization with other applications. A standard, easily parsable format becomes increasingly important as the volume and type of data captured in your logs expands over time.

@@ -127,7 +127,7 @@ To connect to your {{ech}} deployment, stream data, and issue queries, you have

### Cloud ID

To find the [Cloud ID](/deploy-manage/deploy/elastic-cloud/find-cloud-id.md) of your deployment, go to the {{kib}} main menu, then select **Management** → **Integrations** → **Connection details**. Note that the Cloud ID value is in the format `deployment-name:hash`. Save this value to use it later.
To find the [Cloud ID](/deploy-manage/deploy/elastic-cloud/find-cloud-id.md) of your deployment, go to the {{kib}} main menu, then select **Management** → **Integrations** → **Connection details**. The Cloud ID value is in the format `deployment-name:hash`. Save this value to use it later.

### Basic authentication

@@ -203,7 +203,7 @@ cloud.id: deployment-name:hash <1>
cloud.auth: username:password <2>
```

1. Uncomment the `cloud.id` line, and add the deployments Cloud ID as the key's value. Note that the `cloud.id` value is in the format `deployment-name:hash`. Find your Cloud ID by going to the {{kib}} main menu, and selecting **Management** → **Integrations** → **Connection details**.
1. Uncomment the `cloud.id` line, and add the deployment's Cloud ID as the key's value. The `cloud.id` value is in the format `deployment-name:hash`. Find your Cloud ID by going to the {{kib}} main menu, and selecting **Management** → **Integrations** → **Connection details**.
2. Uncomment the `cloud.auth` line, and add the username and password for your deployment in the format `username:password`. For example, `cloud.auth: elastic:57ugj782kvkwmSKg8uVe`.

::::{note}
2 changes: 1 addition & 1 deletion manage-data/ingest/tools.md
@@ -30,7 +30,7 @@ products:
% - [x] https://www.elastic.co/customer-success/data-ingestion
% - [x] https://github.com/elastic/ingest-docs/pull/1373

% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
% Internal links rely on the following IDs being on this page (for example, as a heading ID, paragraph ID, and so on):
% These IDs are from content that I'm not including on this current page. I've resolved them by changing the internal links to anchor links where needed. - Wajiha

$$$supported-outputs-beats-and-agent$$$
4 changes: 2 additions & 2 deletions manage-data/ingest/transform-enrich.md
@@ -19,7 +19,7 @@ According to your use case, you may want to control the structure of your ingest

Finally, to help ensure optimal query results, you may want to customize how text is analyzed and how text fields are defined inside {{es}}.

Note that you can also perform transforms on existing {{es}} indices to pivot data into a summarized format, for example to break down web requests by geography or browser type. To learn more, refer to [Transforming data](../../explore-analyze/transforms.md).
You can also perform transforms on existing {{es}} indices to pivot data into a summarized format, for example to break down web requests by geography or browser type. To learn more, refer to [Transforming data](../../explore-analyze/transforms.md).

{{agent}} processors
: You can use [{{agent}} processors](/reference/fleet/agent-processors.md) to sanitize or enrich raw data at the source. Use {{agent}} processors if you need to control what data is sent across the wire, or if you need to enrich the raw data with information available on the host.
@@ -49,7 +49,7 @@ Index mapping

: Ingested data can be mapped dynamically, where {{es}} adds all fields automatically based on the detected data types, or explicitly, where {{es}} maps the incoming data to fields based on your custom rules.

: You can use {{es}} [runtime fields](../data-store/mapping/runtime-fields.md) to define or alter the schema at query time. You can start working with your data without needing to understand how it is structured, add fields to existing documents without reindexing your data, override the value returned from an indexed field, and/or define fields for a specific use without modifying the underlying schema.
: You can use {{es}} [runtime fields](../data-store/mapping/runtime-fields.md) to define or alter the schema at query time. You can start working with your data without needing to understand how it is structured, add fields to existing documents without reindexing your data, override the value returned from an indexed field, and define fields for a specific use without modifying the underlying schema.

: Refer to the [Index mapping](../data-store/mapping.md) pages to learn about the dynamic mapping rules that {{es}} runs by default, which ones you can customize, and how to configure your own explicit data to field mappings.

4 changes: 2 additions & 2 deletions manage-data/ingest/transform-enrich/error-handling.md
@@ -9,7 +9,7 @@ applies_to:
Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively.

:::{important}
Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle the errors occurring while processing the document (i.e. transforming the json object) but not the errors triggered while indexing like mapping conflict. For this is the Elasticsearch Failure Store.
Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle errors that occur while processing the document (that is, while transforming the JSON object), but not errors triggered during indexing, such as mapping conflicts. For those, use the Elasticsearch failure store.
:::

Errors in ingest pipelines typically fall into the following categories:
@@ -23,7 +23,7 @@ Create an `error-handling-pipeline` that sets `event.kind` to `pipeline_error` a

The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline.

## Global vs. processor-specific
## Global versus processor-specific

The following example demonstrates how to use the `on_failure` handler at the pipeline level rather than within individual processors. While this approach ensures the pipeline exits gracefully on failure, it also means that processing stops at the point of error.

2 changes: 1 addition & 1 deletion manage-data/ingest/transform-enrich/ingest-pipelines.md
@@ -463,7 +463,7 @@ PUT _ingest/pipeline/my-pipeline

### Classic field access pattern [access-source-pattern-classic]

The `classic` access pattern is the default access pattern that has been around since ingest node first released. Field paths given to processors (e.g. `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields.
The `classic` access pattern is the default access pattern and has been around since ingest node was first released. Field paths given to processors (for example, `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields.
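
The dot-splitting behavior described above can be approximated in a few lines. This is a deliberately simplified model of the classic pattern, not the actual Elasticsearch code:

```python
def write_classic(doc, path, value):
    """Split the path on '.', creating nested objects for missing parents."""
    *parents, leaf = path.split(".")
    node = doc
    for name in parents:
        node = node.setdefault(name, {})  # create missing parent objects
    node[leaf] = value

def read_classic(doc, path):
    """Traverse nested objects, following the dotted path name by name."""
    node = doc
    for name in path.split("."):
        node = node[name]
    return node

doc = {}
write_classic(doc, "event.tags.ingest.processed_by", "my-pipeline")
# doc == {"event": {"tags": {"ingest": {"processed_by": "my-pipeline"}}}}
```

Note how the write path materializes every missing parent as an object — the same behavior the classic pattern shows in the simulate example that follows.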

```console
POST /_ingest/pipeline/_simulate
@@ -366,7 +366,7 @@ POST _ingest/pipeline/_simulate
```

:::{tip}
After storing values as bytes, you can use Kibana's field formatting to display them in a human-friendly format (KB, MB, GB, etc.) without changing the underlying data.
After storing values as bytes, you can use Kibana's field formatting to display them in a human-friendly format (KB, MB, GB, and so on) without changing the underlying data.
:::
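
The kind of formatting Kibana applies can be sketched as follows. This is illustrative only — Kibana performs this conversion in the UI at display time, not in your stored data — and the function name is hypothetical:

```python
def human_readable(num_bytes):
    """Format a raw byte count with binary (1024-based) units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop once the value drops below one step, or we run out of units.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_readable(5242880))  # 5.0 MB
```

Keeping the indexed value as a plain byte count means aggregations and range queries stay numeric, while presentation remains a display-layer concern.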

## Rename fields