Improve the docs around `remove_binary` in `attachment`

Since we are living with this for a while, it seems worth improving
the documentation. This now encourages explicitly setting the option
one way or the other, since you get a warning if you omit it. It also
changes the existing examples to use true rather than false, as that's
our recommendation. And it adds a new section with an example where
it's true, and moves the content previously in a note into that
section.

(cherry picked from commit bc25a73)

# Conflicts:
# modules/ingest-attachment/src/main/java/org/elasticsearch/ingest/attachment/AttachmentProcessor.java
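
For reviewers who want to see what "explicitly setting the option" looks like from a client, here is a minimal sketch in Python. It only builds the request body that would be sent to `PUT _ingest/pipeline/attachment`, mirroring the updated docs example. The helper name `attachment_pipeline` and its defaults are hypothetical and not part of this commit or of any Elasticsearch client.

```python
import json


def attachment_pipeline(field="data", remove_binary=True):
    """Build an ingest pipeline body for the attachment processor.

    Hypothetical helper for illustration only. `remove_binary` is always
    written into the body so the processor never sees it omitted.
    """
    return {
        "processors": [
            {
                "attachment": {
                    "field": field,
                    # Set explicitly, as the updated docs encourage;
                    # True drops the base64 source field after extraction.
                    "remove_binary": remove_binary,
                }
            }
        ]
    }


body = attachment_pipeline()
print(json.dumps(body, indent=2))
```

Passing `remove_binary=False` keeps the binary field in the document while still setting the option explicitly, which is the case the commit message says avoids the warning.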
docs/reference/ingest/processors/attachment.asciidoc
+68 -23 (68 additions & 23 deletions)
@@ -19,15 +19,15 @@ representation. The processor will skip the base64 decoding then.
 .Attachment options
 [options="header"]
 |======
-| Name | Required | Default | Description
-| `field` | yes | - | The field to get the base64 encoded field from
-| `target_field` | no | attachment | The field that will hold the attachment information
-| `indexed_chars` | no | 100000 | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit.
-| `indexed_chars_field` | no | `null` | Field name from which you can overwrite the number of chars being used for extraction. See `indexed_chars`.
-| `properties` | no | all properties | Array of properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
-| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
-| `remove_binary` | no | `false` | If `true`, the binary `field` will be removed from the document
-| `resource_name` | no | | Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable https://tika.apache.org/1.24.1/detection.html#Resource_Name_Based_Detection[Resource Name Based Detection].
+| Name | Required | Default | Description
+| `field` | yes | - | The field to get the base64 encoded field from
+| `target_field` | no | attachment | The field that will hold the attachment information
+| `indexed_chars` | no | 100000 | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit.
+| `indexed_chars_field` | no | `null` | Field name from which you can overwrite the number of chars being used for extraction. See `indexed_chars`.
+| `properties` | no | all properties | Array of properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
+| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
+| `remove_binary` | encouraged | `false` | If `true`, the binary `field` will be removed from the document. This option is not required, but setting it explicitly is encouraged, and omitting it will result in a warning.
+| `resource_name` | no | | Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable https://tika.apache.org/1.24.1/detection.html#Resource_Name_Based_Detection[Resource Name Based Detection].
 |======
 
 [discrete]
@@ -58,7 +58,7 @@ PUT _ingest/pipeline/attachment
     {
       "attachment" : {
         "field" : "data",
-        "remove_binary": false
+        "remove_binary": true
       }
     }
   ]
@@ -82,7 +82,6 @@ The document's `attachment` object contains extracted properties for the file: