manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md
Lines changed: 1 addition & 272 deletions
Original file line number
Diff line number
Diff line change
@@ -23,16 +23,14 @@ When creating ingest pipelines, there are a few options for accessing fields i
23
23
| Dot notation |`ctx.event.action`| Supported in conditionals and painless scripts. |
24
24
| Square bracket notation |`ctx['event']['action']`| Supported in conditionals and painless scripts. |
25
25
| Mixed dot and bracket notation |`ctx.event['action']`| Supported in conditionals and painless scripts. |
26
-
| Getter |`$('event.action', null);`| Only supported in painless scripts. |
27
26
28
27
Below are some general guidelines for choosing the right option in a situation.
29
28
30
29
### Dot notation [dot-notation]
31
30
32
31
**Benefits:**
33
32
* Clean and easy to read.
34
-
* Supports null safety operations `?`.
35
-
For example, ...
33
+
* Supports null safety operations `?`. Read more in [Use null safe operators (`?.`)](#null-safe-operators).
36
34
37
35
**Limitations**
38
36
* Does not support field names that contain a `.` or any special characters such as `@`.
@@ -60,64 +58,6 @@ Below are some general guidelines for choosing the right option in a situation.
60
58
**Limitations:**
61
59
* Slightly more difficult to read.
62
60
63
-
### Getter
64
-
65
-
Within a script, you can access fields with the same two notations described above, as well as with the newer getter, `$('field', fallback)`. The getter only works in Painless scripts in an ingest pipeline.
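
A minimal sketch of the getter inside a script processor, assuming a hypothetical incoming `user_name` field (the field names are illustrative only, not taken from a real integration):

```painless
// Make sure the parent object exists before writing a nested key.
ctx.putIfAbsent('user', [:]);
// $('user_name', null) reads the field and returns the fallback (null here) when it is missing.
ctx.user.name = $('user_name', null);
```

The fallback can be any value that suits the operation that follows, for example an empty string when the next step calls a `String` method.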
66
-
67
-
% For example, take the following input:
68
-
%
69
-
% ```json
70
-
% {
71
-
% "_source": {
72
-
% "user_name": "philipp"
73
-
% }
74
-
% }
75
-
% ```
76
-
%
77
-
% When you want to set the `user.name` field with a script:
78
-
%
79
-
% - `ctx.user.name = ctx.user_name`
80
-
%
81
-
% This works as long as `user_name` is populated. If it is null you will get `null` as value. Additionally, when the `user` object does not exist, it will error because Java needs you to define the `user` object first before adding a key `name` into it. We cover the `new HashMap()` further down.
82
-
%
83
-
% This is one of the alternatives to get it working when you only want to set it, if it is not null
84
-
%
85
-
% ```painless
86
-
% if (ctx.user_name != null) {
87
-
% ctx.user = new HashMap();
88
-
% ctx.user.name = ctx.user_name;
89
-
% }
90
-
% ```
91
-
%
92
-
% This works fine, as you now check for null.
93
-
%
94
-
% However there is also an easier to write and maintain alternative available:
95
-
%
96
-
% - `ctx.user.name = $('user_name', null);`
97
-
%
98
-
% The `$('field', 'fallback')` getter lets you specify the field without walking through `ctx`. You can even supply `$('this.very.nested.field.is.super.far.away', null)` when you need to. The fallback is used when the field is null. This comes in very handy when you need to manipulate data. Let's say you want to lowercase all the field names, you can simply write this now:
% You see that I switched up the null value to an empty String. Since the String has the `toLowerCase()` function. This of course works with all types. Bit of a silly thing, since you could simply write `object.abc` as the field value. As an example you can see that we can even create a map, list, array, whatever you want.
103
-
%
104
-
% - `if ($('object', {}).containsKey('abc')){}`
105
-
%
106
-
% One common thing I use it for is when dealing with numbers and casting. The field specifies the usage in `%`, however Elasticsearch doesn't like this, or better to say Kibana renders % as `0-1` for `0%-100%` and not `0-100`. `100` is equal to `10.000%`
107
-
%
108
-
% - field: `cpu_usage = 100.00`
109
-
% - `ctx.cpu.usage = $('cpu_usage',0.0)/100`
110
-
%
111
-
% This allows me to always set the `cpu.usage` field and not to worry about it, have an always working division. One other way to leverage this, in a simpler script is like this, but most scripts are rather complex so this is not that often applicable.
@@ -151,73 +91,6 @@ This example only checks for exact matches. Do not use this approach if you need
151
91
152
92
Anticipate potential problems with the data, and use the [null safe operator](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) (`?.`) to prevent data from being processed incorrectly.
153
93
154
-
In the simplest case, most processors provide the `ignore_missing` parameter to handle fields without values, or the `ignore_failure` parameter to let a processor fail without affecting the rest of the pipeline. Sometimes, however, you need the [null safe operator `?.`](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) to check that a field exists and is not `null`.
155
-
156
-
```json
157
-
POST _ingest/pipeline/_simulate
158
-
{
159
-
"docs": [
160
-
{
161
-
"_source": {
162
-
"host": {
163
-
"hostname": "test"
164
-
},
165
-
"ip": "127.0.0.1"
166
-
}
167
-
},
168
-
{
169
-
"_source": {
170
-
"ip": "127.0.0.1"
171
-
}
172
-
}
173
-
],
174
-
"pipeline": {
175
-
"processors": [
176
-
{
177
-
"set": {
178
-
"field": "a",
179
-
"value": "b",
180
-
"if": "ctx.host?.hostname == 'test'"
181
-
}
182
-
}
183
-
]
184
-
}
185
-
}
186
-
```
187
-
188
-
This pipeline will work in both cases because `host?` checks if `host` exists and if not returns `null`. Removing `?` from the `if` condition will fail the second document with an error message: `cannot access method/field [hostname] from a null def reference`
189
-
190
-
The null safe operator `?.` is actually doing this behind the scenes:
Then the ? will transform this simple if statement to this:
197
-
198
-
```painless
199
-
ctx.windows != null &&
200
-
ctx.windows.event != null &&
201
-
ctx.windows.event.data != null &&
202
-
ctx.windows.event.data.user != null &&
203
-
ctx.windows.event.data.user.name == "philipp"
204
-
```
205
-
206
-
You can use the null safe operator with method calls too:
207
-
208
-
-`ctx.message?.startsWith('shoe')`
209
-
210
-
The [Elvis operator](https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-operators-reference.html#elvis-operator) (`?:`) can be useful in your script to handle values that might be `null`:
The reason is that if `event.category` is a number, an object, or anything other than a `String`, it does not have the `startsWith` method, so the script fails with an error stating that `startsWith` is not available on that type.
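
One way to guard against this is to check the type before calling the method. The condition below is a sketch under assumptions: the prefix `'net'` is a made-up value, and `event.category` is treated as a single value rather than an array.

```painless
// instanceof evaluates to false for null, so no separate null check is needed.
// startsWith is only reached when the value really is a String.
ctx.event?.category instanceof String && ctx.event.category.startsWith('net')
```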
220
-
221
94
:::{tip}
222
95
It is not necessary to use a null safe operator for first level objects
223
96
(for example, use `ctx.openshift` instead of `ctx?.openshift`).
1. Only if there's a `ctx.openshift` and a `ctx.openshift.origin` will it check for a `ctx.openshift.origin.threadId` and make sure it is a string.
246
119
247
-
#### Use the `containsKey`
248
-
249
-
The `containsKey` method can be used to check whether a map contains a specific key.
250
-
251
-
```json
252
-
POST _ingest/pipeline/_simulate
253
-
{
254
-
"docs": [
255
-
{
256
-
"_source": {
257
-
"ip": "127.0.0.1"
258
-
}
259
-
},
260
-
{
261
-
"_source": {
262
-
"test": "asd"
263
-
}
264
-
}
265
-
],
266
-
"pipeline": {
267
-
"processors": [
268
-
{
269
-
"set": {
270
-
"field": "a",
271
-
"value": "b",
272
-
"if": "ctx.containsKey('test')"
273
-
}
274
-
}
275
-
]
276
-
}
277
-
}
278
-
```
279
-
280
-
:::{warn}
281
-
This is more complex than it seems, because you will end up writing `ctx.kubernetes.containsKey('namespace')`. If `kubernetes` is null, or comes in as a String, it will break processing. Stick to the null safe operator `?.` for most work.
282
-
{warn}
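
If `containsKey` is still needed, a defensive variant might first confirm that the parent really is a map. This is a sketch that reuses the `kubernetes.namespace` example from the warning; the guard is an assumption about how you may want to handle malformed input, not a pattern prescribed by this guide.

```painless
// instanceof is false when kubernetes is null or arrives as a String,
// so containsKey is only called on an actual Map.
ctx.kubernetes instanceof Map && ctx.kubernetes.containsKey('namespace')
```

Note that `containsKey` returns `true` even when the stored value is `null`, so a further null check may be needed before using the value.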
283
-
284
120
### Use null safe operators when checking type
285
121
286
122
If you're using a null safe operator, it will return the value if it is not `null` so there is no reason to check whether a value is not `null` before checking the type of that value.
@@ -513,54 +349,6 @@ The [rename processor](elasticsearch://reference/enrich-processor/rename-process
513
349
514
350
If no built-in processor can achieve your goal, you may need to use a [script processor](elasticsearch://reference/enrich-processor/script-processor.md) in your ingest pipeline. Be sure to write scripts that are clear, concise, and maintainable.
515
351
516
-
### Setting the value of a field
517
-
518
-
Sometimes you need to write to a field that does not exist yet. As long as the parent object exists, you can set the field directly.
519
-
520
-
`ctx.abc = "cool"` works without any issue because it adds a root field called `abc`.
521
-
522
-
Something like `ctx.abc.def = "cool"` does not work unless the `abc` object already exists or you create it beforehand. In most cases, the object you want to create is a map, and there are a couple of ways to create one:
523
-
524
-
```painless
525
-
ctx.abc = new HashMap();
526
-
ctx.abc = [:];
527
-
```
528
-
529
-
Both options are valid and do the same thing. However, there is a big caveat: if `abc` already exists, it is overwritten with an empty map. You can check whether `abc` already exists first:
530
-
531
-
```painless
532
-
if(ctx.abc == null) {
533
-
ctx.abc = [:];
534
-
}
535
-
```
536
-
537
-
A simple `if (ctx.abc == null)` check tells you that `abc` does not exist, so you can create it. Alternatively, you can use the `putIfAbsent` shorthand, which is especially helpful when you need to go two, three, or four levels deep. It works with either `new HashMap()` or `[:]`.
538
-
539
-
```painless
540
-
ctx.putIfAbsent("abc", new HashMap());
541
-
ctx.putIfAbsent("abc", [:]);
542
-
```
543
-
544
-
Now assuming you want to create this structure:
545
-
546
-
```json
547
-
{
548
-
"user": {
549
-
"geo": {
550
-
"city": "Amsterdam"
551
-
}
552
-
}
553
-
}
554
-
```
555
-
556
-
The `putIfAbsent` method helps a lot here:
557
-
558
-
```painless
559
-
ctx.putIfAbsent("user", [:]);
560
-
ctx.user.putIfAbsent("geo", [:]);
561
-
ctx.user.geo.city = "Amsterdam";
562
-
```
563
-
564
352
### Calculate `event.duration` in a complex manner
565
353
566
354
#### **Don't**: Use verbose and error-prone scripting patterns
@@ -622,37 +410,6 @@ POST _ingest/pipeline/_simulate
622
410
3. Store the duration in nanoseconds, as expected by ECS.
623
411
4. Use the null safe operator to check for field existence.
624
412
625
-
#### Calculate time in other timezone
626
-
627
-
When you cannot use the date processor and its `timezone` parameter, you can use `datetime` in Painless.
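
As a rough sketch of the idea, the script below converts a UTC timestamp to another time zone with the `java.time` classes available in Painless. It assumes `@timestamp` holds an ISO 8601 string; the zone ID `Europe/Amsterdam` and the target field `local_time` are hypothetical choices for illustration.

```painless
// Assumption: ctx['@timestamp'] is an ISO 8601 string such as 2025-01-01T12:00:00Z.
ZonedDateTime utc = ZonedDateTime.parse(ctx['@timestamp']);
// Keep the same instant, but express it in the other zone (illustrative zone ID).
ZonedDateTime local = utc.withZoneSameInstant(ZoneId.of('Europe/Amsterdam'));
// Hypothetical target field; the value keeps its UTC offset.
ctx.local_time = local.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
```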
### Stitch together IP addresses in a script processor
657
414
658
415
When reconstructing or normalizing IP addresses in ingest pipelines, avoid unnecessary complexity and redundant operations.
@@ -789,31 +546,3 @@ POST _ingest/pipeline/_simulate
789
546
```
790
547
791
548
In this example, `{{tags.0}}` retrieves the first element of the `tags` array (`"cool-host"`) and assigns it to the `host.alias` field. This approach is necessary when you want to extract a specific value from an array for use elsewhere in your document. Using the correct index ensures you get the intended value, and this pattern works for any array field in your source data.
792
-
793
-
#### Work with JSON as value of fields
794
-
795
-
You can work with a JSON string as the value of a field, for example to set the `original` field to the JSON representation of `_source`. This leverages a `mustache` function.