Commit 6f7c288

remove content
1 parent 609696c commit 6f7c288

1 file changed: +1 -272 lines changed

manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md

Lines changed: 1 addition & 272 deletions
Original file line number | Diff line number | Diff line change
@@ -23,16 +23,14 @@ When creating ingest pipelines, there are are few options for accessing fields i
2323
| Dot notation | `ctx.event.action` | Supported in conditionals and painless scripts. |
2424
| Square bracket notation | `ctx['event']['action']` | Supported in conditionals and painless scripts. |
2525
| Mixed dot and bracket notation | `ctx.event['action']` | Supported in conditionals and painless scripts. |
26-
| Getter | `$('event.action', null);` | Only supported in painless scripts. |
2726

2827
Below are some general guidelines for choosing the right option in a situation.
2928

3029
### Dot notation [dot-notation]
3130

3231
**Benefits:**
3332
* Clean and easy to read.
34-
* Supports null safety operations `?`.
35-
For example, ...
33+
* Supports null safety operations `?`. Read more in [Use null safe operators (`?.`)](#null-safe-operators).
3634

3735
**Limitations**
3836
* Does not support field names that contain a `.` or any special characters such as `@`.
@@ -60,64 +58,6 @@ Below are some general guidelines for choosing the right option in a situation.
6058
**Limitations:**
6159
* Slightly more difficult to read.
6260

63-
### Getter
64-
65-
Within a script there are the same two possibilities to access fields as above. As well as the new `getter`. This only works in the painless scripts in an ingest pipeline.
66-
67-
% For example, take the following input:
68-
%
69-
% ```json
70-
% {
71-
% "_source": {
72-
% "user_name": "philipp"
73-
% }
74-
% }
75-
% ```
76-
%
77-
% When you want to set the `user.name` field with a script:
78-
%
79-
% - `ctx.user.name = ctx.user_name`
80-
%
81-
% This works as long as `user_name` is populated. If it is null you will get `null` as value. Additionally, when the `user` object does not exist, it will error because Java needs you to define the `user` object first before adding a key `name` into it. We cover the `new HashMap()` further down.
82-
%
83-
% This is one of the alternatives to get it working when you only want to set it, if it is not null
84-
%
85-
% ```painless
86-
% if (ctx.user_name != null) {
87-
% ctx.user = new HashMap();
88-
% ctx.user.name = ctx.user_name;
89-
% }
90-
% ```
91-
%
92-
% This works fine, as you now check for null.
93-
%
94-
% However there is also an easier to write and maintain alternative available:
95-
%
96-
% - `ctx.user.name = $('user_name', null);`
97-
%
98-
% This $('field', 'fallback') allows you to specify the field without the `CTX` for walking. You can even supply % `$('this.very.nested.field.is.super.far.away', null)` when you need to. The fallback is in case the field is % null. This comes in very handy when needing to do certain manipulation of data. Let's say you want to lowercase all the field names, you can simply write this now:
99-
%
100-
% - `ctx.user.name = $('user_name','').toLowerCase();`
101-
%
102-
% You see that I switched up the null value to an empty String. Since the String has the `toLowerCase()` function. This of course works with all types. Bit of a silly thing, since you could simply write `object.abc` as the field value. As an example you can see that we can even create a map, list, array, whatever you want.
103-
%
104-
% - `if ($('object', {}).containsKey('abc')){}`
105-
%
106-
% One common thing I use it for is when dealing with numbers and casting. The field specifies the usage in `%`, however Elasticsearch doesn't like this, or better to say Kibana renders % as `0-1` for `0%-100%` and not `0-100`. `100` is equal to `10.000%`
107-
%
108-
% - field: `cpu_usage = 100.00`
109-
% - `ctx.cpu.usage = $('cpu_usage',0.0)/100`
110-
%
111-
% This allows me to always set the `cpu.usage` field and not to worry about it, have an always working division. One other way to leverage this, in a simpler script is like this, but most scripts are rather complex so this is not that often applicable.
112-
%
113-
% ```json
114-
% {
115-
% "script": {
116-
% "source": "ctx.abc = ctx.def",
117-
% "if": "ctx.def != null"
118-
% }
119-
% }
120-
% ```
12161

12262
## Write concise conditionals (`if` statements) [conditionals]
12363

@@ -151,73 +91,6 @@ This example only checks for exact matches. Do not use this approach if you need
15191

15292
Anticipate potential problems with the data, and use the [null safe operator](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) (`?.`) to prevent data from being processed incorrectly.
15393

154-
In simplest case the `ignore_missing` parameter is available in most processors to handle fields without values. Or the `ignore_failure` parameter to let the processor fail without impacting the pipeline you but sometime you will need to use the [null safe operator `?.`](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) to check if a field exists and is not `null`.
155-
156-
```json
157-
POST _ingest/pipeline/_simulate
158-
{
159-
"docs": [
160-
{
161-
"_source": {
162-
"host": {
163-
"hostname": "test"
164-
},
165-
"ip": "127.0.0.1"
166-
}
167-
},
168-
{
169-
"_source": {
170-
"ip": "127.0.0.1"
171-
}
172-
}
173-
],
174-
"pipeline": {
175-
"processors": [
176-
{
177-
"set": {
178-
"field": "a",
179-
"value": "b",
180-
"if": "ctx.host?.hostname == 'test'"
181-
}
182-
}
183-
]
184-
}
185-
}
186-
```
187-
188-
This pipeline will work in both cases because `host?` checks if `host` exists and if not returns `null`. Removing `?` from the `if` condition will fail the second document with an error message: `cannot access method/field [hostname] from a null def reference`
189-
190-
The null operator `?` is actually doing this behind the scene:
191-
192-
Imagine you write this:
193-
194-
- `ctx.windows?.event?.data?.user?.name == "philipp"`
195-
196-
Then the ? will transform this simple if statement to this:
197-
198-
```painless
199-
ctx.windows != null &&
200-
ctx.windows.event != null &&
201-
ctx.windows.event.data != null &&
202-
ctx.windows.event.data.user != null &&
203-
ctx.windows.event.data.user.name == "philipp"
204-
```
205-
206-
You can use the null safe operator with function too:
207-
208-
- `ctx.message?.startsWith('shoe')`
209-
210-
An [elvis](https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-operators-reference.html#elvis-operator) might be useful in your script to handle these maybe null value:
211-
212-
- `ctx.message?.startsWith('shoe') ?: false`
213-
214-
Most safest and secure option is to write:
215-
216-
- `ctx.message instanceof String && ctx.message.startsWith('shoe')`
217-
- `ctx.event?.category instanceof String && ctx.event.category.startsWith('shoe')`
218-
219-
The reason for that is, if `event.category` is a number, object or anything other than a `String` then it does not have the `startsWith` function and therefore will error with function `startsWith` not available on type object.
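The type-check-then-call pattern in the removed lines above can be sketched in plain Java, whose syntax Painless closely follows. This is only an illustration under stated assumptions: `ctx` is modeled as a `Map<String, Object>`, and the class and method names (`SafeStartsWithSketch`, `categoryStartsWith`) are hypothetical, not part of any Elasticsearch API. The sketch is also slightly stricter than the Painless one-liner, because it returns `false` when `event` is not a map.

```java
import java.util.Map;

public class SafeStartsWithSketch {
    // Hypothetical helper: mirrors the intent of
    // ctx.event?.category instanceof String && ctx.event.category.startsWith(prefix)
    static boolean categoryStartsWith(Map<String, Object> ctx, String prefix) {
        Object event = ctx.get("event");
        // A missing or non-map "event" yields false instead of an error.
        if (!(event instanceof Map)) {
            return false;
        }
        Object category = ((Map<?, ?>) event).get("category");
        // instanceof is false for null and for non-String values,
        // so startsWith is only ever called on a real String.
        return category instanceof String && ((String) category).startsWith(prefix);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("event", Map.of("category", "shoelace"));
        System.out.println(categoryStartsWith(doc, "shoe"));
    }
}
```

Because `instanceof` already rejects `null`, no separate null check on `category` is needed before calling `startsWith`.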
220-
22194
:::{tip}
22295
It is not necessary to use a null safe operator for first level objects
22396
(for example, use `ctx.openshift` instead of `ctx?.openshift`).
@@ -244,43 +117,6 @@ ctx.openshift?.origin?.threadId instanceof String <1>
244117

245118
1. Only if there's a `ctx.openshift` and a `ctx.openshift.origin` will it check for a `ctx.openshift.origin.threadId` and make sure it is a string.
246119

247-
#### Use the `containsKey`
248-
249-
The `containsKey` can be used to check if a map contains a specific key.
250-
251-
```json
252-
POST _ingest/pipeline/_simulate
253-
{
254-
"docs": [
255-
{
256-
"_source": {
257-
"ip": "127.0.0.1"
258-
}
259-
},
260-
{
261-
"_source": {
262-
"test": "asd"
263-
}
264-
}
265-
],
266-
"pipeline": {
267-
"processors": [
268-
{
269-
"set": {
270-
"field": "a",
271-
"value": "b",
272-
"if": "ctx.containsKey('test')"
273-
}
274-
}
275-
]
276-
}
277-
}
278-
```
279-
280-
:::{warn}
281-
This is more complex then it seems, since you will end up writing `ctx.kubernetes.containsKey('namespace')`. If `kubernetes` is null, or comes in as a String it will break processing. Stick to the null safe operator `?` for most work.
282-
{warn}
283-
284120
### Use null safe operators when checking type
285121

286122
If you're using a null safe operator, it will return the value if it is not `null` so there is no reason to check whether a value is not `null` before checking the type of that value.
@@ -513,54 +349,6 @@ The [rename processor](elasticsearch://reference/enrich-processor/rename-process
513349

514350
If no built-in processor can achieve your goal, you may need to use a [script processor](elasticsearch://reference/enrich-processor/script-processor.md) in your ingest pipeline. Be sure to write scripts that are clear, concise, and maintainable.
515351

516-
### Setting the value of a field
517-
518-
Sometimes it is needed to write to a field and this field does not exist yet. Whenever the object above it exists, this can be done immediately.
519-
520-
`ctx.abc = “cool”` works without any issue as we are adding a root field called `abc`.
521-
522-
Creating something like `ctx.abc.def = “cool”` does not work unless you create the `abc` object beforehand or it already exists. There are multiple ways to do it. What we always or usually want to create is a Map. We can do it in a couple of ways:
523-
524-
```painless
525-
ctx.abc = new HashMap();
526-
ctx.abc = [:];
527-
```
528-
529-
Both options are valid and do the same thing. However there is a big caveat and that is, that if `abc` already exists, it will be overwritten and empty. Validating if `abc` already exists can be done by:
530-
531-
```painless
532-
if(ctx.abc == null) {
533-
ctx.abc = [:];
534-
}
535-
```
536-
537-
With a simple `if ctx.abc == null` we know that `abc` does not exist and we can create it. Alternatively you can use the shorthand which is super helpful when you need to go 2,3,4 levels deep. You can use either version with the `HashMap()` or with the `[:]`.
538-
539-
```painless
540-
ctx.putIfAbsent("abc", new HashMap());
541-
ctx.putIfAbsent("abc", [:]);
542-
```
543-
544-
Now assuming you want to create this structure:
545-
546-
```json
547-
{
548-
"user": {
549-
"geo": {
550-
"city": "Amsterdam"
551-
  }
552-
}
553-
}
554-
```
555-
556-
The `putIfAbsent` will help a ton here:
557-
558-
```painless
559-
ctx.putIfAbsent("user", [:]);
560-
ctx.user.putIfAbsent("geo", [:]);
561-
ctx.user.geo = "Amsterdam"
562-
```
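As a rough sketch of the same `putIfAbsent` idea in plain Java (Painless exposes the same `Map` methods), the nested structure shown earlier can be built without overwriting keys that already exist. The assumptions here are that `ctx` behaves like a `Map<String, Object>` and that `user` and `geo`, when present, are maps. `NestedFieldSketch` and `setCity` are hypothetical names used only for illustration. Note that the final write goes to the `city` key inside `geo`, which is what the target JSON structure requires.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedFieldSketch {
    // Hypothetical helper: models the ingest document (ctx) as a plain map.
    @SuppressWarnings("unchecked")
    static Map<String, Object> setCity(Map<String, Object> ctx, String city) {
        // Create "user" only when it is missing, so existing keys survive.
        ctx.putIfAbsent("user", new HashMap<String, Object>());
        Map<String, Object> user = (Map<String, Object>) ctx.get("user");

        // Same pattern one level deeper for "geo".
        user.putIfAbsent("geo", new HashMap<String, Object>());
        Map<String, Object> geo = (Map<String, Object>) user.get("geo");

        // Write the leaf value under the "city" key.
        geo.put("city", city);
        return ctx;
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        Map<String, Object> user = new HashMap<>();
        user.put("name", "philipp");
        ctx.put("user", user);
        System.out.println(setCity(ctx, "Amsterdam"));
    }
}
```

In this sketch the existing `user.name` value is preserved, which is the advantage of `putIfAbsent` over assigning a fresh map.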
563-
564352
### Calculate `event.duration` in a complex manner
565353

566354
#### ![ ](../../images/icon-cross.svg) **Don't**: Use verbose and error-prone scripting patterns
@@ -622,37 +410,6 @@ POST _ingest/pipeline/_simulate
622410
3. Store the duration in nanoseconds, as expected by ECS.
623411
4. Use the null safe operator to check for field existence.
624412

625-
#### Calculate time in other timezone
626-
627-
When you cannot use the date and its timezone parameter, you can use `datetime` in Painless
628-
629-
```json
630-
POST _ingest/pipeline/_simulate
631-
{
632-
"docs": [
633-
{
634-
"_source": {
635-
"@timestamp": "2021-08-13T09:06:00.000Z"
636-
}
637-
}
638-
],
639-
"pipeline": {
640-
"processors": [
641-
{
642-
"script": {
643-
"source": """
644-
ZonedDateTime zdt = ZonedDateTime.parse(ctx['@timestamp']);
645-
ZonedDateTime zdt_local = zdt.withZoneSameInstant(ZoneId.of('Europe/Berlin'));
646-
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd.MM.yyyy - HH:mm:ss Z");
647-
ctx.localtime = zdt_local.format(formatter);
648-
"""
649-
}
650-
}
651-
]
652-
}
653-
}
654-
```
655-
656413
### Stitch together IP addresses in a script processor
657414

658415
When reconstructing or normalizing IP addresses in ingest pipelines, avoid unnecessary complexity and redundant operations.
@@ -789,31 +546,3 @@ POST _ingest/pipeline/_simulate
789546
```
790547

791548
In this example, `{{tags.0}}` retrieves the first element of the `tags` array (`"cool-host"`) and assigns it to the `host.alias` field. This approach is necessary when you want to extract a specific value from an array for use elsewhere in your document. Using the correct index ensures you get the intended value, and this pattern works for any array field in your source data.
792-
793-
#### Work with JSON as value of fields
794-
795-
It is possible to work with json string as value of a field for example to set the `original` field value with the json of `_source`: We are leveraging a `mustache` function here.
796-
797-
```json
798-
POST _ingest/pipeline/_simulate
799-
{
800-
"docs": [
801-
{
802-
"_source": {
803-
"foo": "bar",
804-
"key": 123
805-
}
806-
}
807-
],
808-
"pipeline": {
809-
"processors": [
810-
{
811-
"set": {
812-
"field": "original",
813-
"value": "{{#toJson}}_source{{/toJson}}"
814-
}
815-
}
816-
]
817-
}
818-
}
819-
```
