Merged
16 changes: 13 additions & 3 deletions docs/reference/enrich-processor/date-processor.md
@@ -15,7 +15,7 @@ $$$date-options$$$
| `field` | yes | - | The field to get the date from. |
| `target_field` | no | @timestamp | The field that will hold the parsed date. |
| `formats` | yes | - | An array of the expected date formats. Can be a [java time pattern](/reference/elasticsearch/mapping-reference/mapping-date-format.md) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. |
-| `timezone` | no | UTC | The default timezone used by the processor (see below). Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
+| `timezone` | no | UTC | The default timezone used by the processor (see [below](#date-processor-timezones)). Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
| `locale` | no | ENGLISH | The locale to use when parsing the date, relevant when parsing month names or week days. Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). |
| `output_format` | no | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | The format to use when writing the date to `target_field`. Must be a valid [java time pattern](/reference/elasticsearch/mapping-reference/mapping-date-format.md). |
| `description` | no | - | Description of the processor. Useful for describing the purpose of the processor or its configuration. |
@@ -24,14 +24,20 @@ $$$date-options$$$
| `on_failure` | no | - | Handle failures for the processor. See [Handling pipeline failures](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures). |
| `tag` | no | - | Identifier for the processor. Useful for debugging and metrics. |

## Timezones [date-processor-timezones]

The `timezone` option may have two effects on the behavior of the processor:
Contributor

Idea: If you gave this section a heading you could link to it from the table instead of writing (see below)

Looking at the URL preview, there are no subheadings at all on this page which makes it hard to scan :)

Member Author

Oh, that's a good idea. I've just added a bunch of section headings, and turned both the belows into links. (I previewed the docs locally and the cross-references work as expected, so hopefully I got the format right.)

I've still left it saying 'see below', because I needed something as an anchor text. If there's a preference for a different style, please let me know.

Contributor

Nice! Made a suggestion to incorporate link into the text more naturally :)

Member Author

Thanks, yeah, that's better. Done. (I somehow mashed the wrong button in GitHub and failed to apply your suggestion, but I've pushed a commit with the exact same change!)

- If the string being parsed matches a format representing a local date-time, such as `yyyy-MM-dd HH:mm:ss`, it will be assumed to be in the timezone specified by this option. This is not applicable if the string matches a format representing a zoned date-time, such as `yyyy-MM-dd HH:mm:ss zzz`: in that case, the timezone parsed from the string will be used. It is also not applicable if the string matches an absolute time format, such as `epoch_millis`.
- The date-time will be converted into the timezone given by this option before it is formatted and written into the target field. This is not applicable if the `output_format` is an absolute time format such as `epoch_millis`.
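
As a rough illustration of the two effects, here is a minimal sketch of a pipeline whose only format is a local date-time pattern. The field names `local_time` and `parsed_time` and the choice of `Europe/Amsterdam` are hypothetical and do not come from this change.

```js
{
  "description" : "Hypothetical sketch: parse a local date-time and write it out in the same timezone",
  "processors" : [
    {
      "date" : {
        "field" : "local_time",
        "target_field" : "parsed_time",
        "formats" : ["yyyy-MM-dd HH:mm:ss"],
        "timezone" : "Europe/Amsterdam"
      }
    }
  ]
}
```

Under these assumptions, an input of `2025-01-02 12:34:56` should be read as Amsterdam local time and, with the default `output_format`, written as `2025-01-02T12:34:56.000+01:00`.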

::::{warning}
-We recommend avoiding the use of short abbreviations for timezone names, since they can be ambiguous. For example, one JDK might interpret `PST` as `America/Tijuana`, i.e. Pacific (Standard) Time, while another JDK might interpret it as `Asia/Manila`, i.e. Philippine Standard Time. If your input data contains such abbreviations, you should convert them into either standard full names or UTC offsets using your own knowledge of what each abbreviation means in your data before parsing them. See below for an example. (This does not apply to `UTC`, which is safe.)
+We recommend avoiding the use of short abbreviations for timezone names, since they can be ambiguous. For example, one JDK might interpret `PST` as `America/Tijuana`, i.e. Pacific (Standard) Time, while another JDK might interpret it as `Asia/Manila`, i.e. Philippine Standard Time. If your input data contains such abbreviations, you should convert them into either standard full names or UTC offsets using your own knowledge of what each abbreviation means in your data before parsing them. See [below](#date-processor-short-timezone-example) for an example. (This does not apply to `UTC`, which is safe.)
::::

## Examples [date-processor-examples]

### Simple example [date-processor-simple-example]

Here is an example that adds the parsed date to the `timestamp` field based on the `initial_date` field:

```js
@@ -50,6 +56,8 @@ Here is an example that adds the parsed date to the `timestamp` field based on t
}
```

### Example using templated parameters [date-processor-templated-example]

The `timezone` and `locale` processor parameters are templated. This means that their values can be extracted from fields within documents. The example below shows how to take the timezone and locale from the existing `my_timezone` and `my_locale` fields of the ingested document.

```js
@@ -69,6 +77,8 @@ The `timezone` and `locale` processor parameters are templated. This means that
}
```
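
Since the diff view collapses most of the block above, here is a minimal sketch of what a templated `date` processor can look like. The `my_timezone` and `my_locale` fields are the ones named in the text. The `initial_date` source field, the `timestamp` target field, and the `ISO8601` format are assumptions made for illustration, not necessarily what the file contains.

```js
{
  "description" : "Hypothetical sketch: take timezone and locale from fields of the document",
  "processors" : [
    {
      "date" : {
        "field" : "initial_date",
        "target_field" : "timestamp",
        "formats" : ["ISO8601"],
        "timezone" : "{{{my_timezone}}}",
        "locale" : "{{{my_locale}}}"
      }
    }
  ]
}
```

Each document is then expected to carry its own values, for example a `my_timezone` of `Europe/Amsterdam` and a `my_locale` of `nl`.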

### Example dealing with short timezone abbreviations safely [date-processor-short-timezone-example]

In the example below, the `message` field in the input is expected to be a string formed of a local date-time in `yyyyMMddHHmmss` format, a timezone abbreviated to one of `PST`, `CET`, or `JST` representing Pacific, Central European, or Japan time, and a payload. This field is split up using a `grok` processor, then the timezones are converted into full names using a `script` processor, then the date-time is parsed using a `date` processor, and finally the unwanted fields are discarded using a `remove` processor.

```js
@@ -112,4 +122,4 @@ In the example below, the `message` field in the input is expected to be a strin
}
```

-With that pipeline, a `message` field with the value `20250102123456 PST Hello world` will result in a `@timestamp` field with the value `2025-01-02T12:34:56.000-08:00` and a `payload` field with the value `Hello world`. (Note: A `@timestamp` field will normally be mapped to a `date` type, and therefore it will be indexed as an integer representing milliseconds since the epoch, although the original format and timezone may be preserved in the `_source`.)
+With this pipeline, a `message` field with the value `20250102123456 PST Hello world` will result in a `@timestamp` field with the value `2025-01-02T12:34:56.000-08:00` and a `payload` field with the value `Hello world`. (Note: A `@timestamp` field will normally be mapped to a `date` type, and therefore it will be indexed as an integer representing milliseconds since the epoch, although the original format and timezone may be preserved in the `_source`.)
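
One way to check this behavior is to send a request body like the following to the simulate pipeline API (`POST /_ingest/pipeline/_simulate`). The pipeline here is a hypothetical reconstruction from the prose description, since the diff view collapses the file's own example. The grok pattern, the intermediate field names (`local_date_time`, `short_tz`, `full_tz`), and the specific zone IDs are all assumptions.

```js
{
  "pipeline" : {
    "description" : "Hypothetical sketch: map short timezone abbreviations to full names before parsing",
    "processors" : [
      {
        "grok" : {
          "field" : "message",
          "patterns" : ["%{NUMBER:local_date_time} %{WORD:short_tz} %{GREEDYDATA:payload}"]
        }
      },
      {
        "script" : {
          "source" : "ctx['full_tz'] = params['tz_map'][ctx['short_tz']]",
          "params" : {
            "tz_map" : {
              "PST" : "America/Los_Angeles",
              "CET" : "Europe/Amsterdam",
              "JST" : "Asia/Tokyo"
            }
          }
        }
      },
      {
        "date" : {
          "field" : "local_date_time",
          "formats" : ["yyyyMMddHHmmss"],
          "timezone" : "{{{full_tz}}}"
        }
      },
      {
        "remove" : {
          "field" : ["message", "local_date_time", "short_tz", "full_tz"]
        }
      }
    ]
  },
  "docs" : [
    { "_source" : { "message" : "20250102123456 PST Hello world" } }
  ]
}
```

The mapping in `tz_map` encodes what each abbreviation means in this particular data set, which is the knowledge the warning above asks you to supply yourself. If the sketch matches the documented pipeline closely enough, the simulated document should contain the `@timestamp` and `payload` values described above.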