Commit 3e70f4c

cosmo0920 authored and alexakreizinger committed
in_tail: Add a description and note for Unicode.Encoding parameter (fluent#1471)
* in_tail: Add a description and note for Unicode.Encoding parameter

Signed-off-by: Hiroshi Hatake <[email protected]>

* Update pipeline/inputs/tail.md

Co-authored-by: Alexa Kreizinger <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>

---------

Signed-off-by: Hiroshi Hatake <[email protected]>
Co-authored-by: Alexa Kreizinger <[email protected]>
Signed-off-by: Tom <[email protected]>
1 parent 9393bf7 commit 3e70f4c

1 file changed: +7 -0 lines changed

pipeline/inputs/tail.md

Lines changed: 7 additions & 0 deletions
@@ -37,6 +37,7 @@ The plugin supports the following configuration parameters:
 | `static_batch_size` | Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start). | `50M` |
 | `file_cache_advise` | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. | `on` |
 | `threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
+| `Unicode.Encoding` | Set the Unicode character encoding of the file data. This parameter requests two-byte aligned chunk and buffer sizes. If data is not aligned for two bytes, Fluent Bit will use two-byte alignment automatically to avoid character breakages on consuming boundaries. Supported values: `UTF-16LE`, `UTF-16BE`, and `auto`. | `none` |

 ## Buffers and memory management

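Reviewer note: the added `Unicode.Encoding` row says chunk and buffer sizes are two-byte aligned to avoid breaking characters at consuming boundaries. The Python sketch below illustrates that idea only. It is not Fluent Bit's implementation: the helper names `decode_chunks` and `align_down` are hypothetical, and the sample text uses only characters that occupy a single two-byte UTF-16 code unit.

```python
# Illustration only: a naive chunked reader, not Fluent Bit's actual code.
text = "héllo, 世界\n"
data = text.encode("utf-16-le")  # every character here is one 2-byte code unit


def decode_chunks(data: bytes, chunk_size: int) -> str:
    """Decode each chunk independently, as a naive reader might."""
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # errors="replace" turns a dangling byte into U+FFFD instead of raising
        parts.append(chunk.decode("utf-16-le", errors="replace"))
    return "".join(parts)


def align_down(size: int, unit: int = 2) -> int:
    """Round size down to a multiple of unit, but never below one unit."""
    return max(unit, size - size % unit)


misaligned = decode_chunks(data, 5)            # odd size splits code units
aligned = decode_chunks(data, align_down(5))   # 4-byte chunks keep units whole
```

With an odd chunk size, every chunk boundary cuts a code unit in half, so `misaligned` contains replacement characters and shifted bytes, while `aligned` round-trips to the original text.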
@@ -77,6 +78,12 @@ If no database file is present, positioning behavior depends on the value of `re

 The database file essentially stores `inode=offset` so it should be unique per instance of the plugin, for example if you have two tail inputs then use two separate `db` files for each. That way each tail input can independently track its own state.

+{% hint style="info" %}
+The `Unicode.Encoding` parameter is dependent on the simdutf library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
+
+Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
+{% endhint %}
+
 ## Monitor a large number of files

 To monitor a large number of files, you can increase the `inotify` settings in your Linux environment by modifying the following `sysctl` parameters:
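
Reviewer note: the hint added in this commit warns that `auto` can guess the wrong encoding. The commit doesn't describe how Fluent Bit or simdutf performs that guess, so the Python sketch below only shows one general reason such guesses are hard. It assumes a hypothetical detector, `guess_utf16`, that relies solely on a byte order mark (BOM).

```python
import codecs
from typing import Optional


def guess_utf16(data: bytes) -> Optional[str]:
    """Hypothetical BOM-only detector; returns None when no BOM is present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


payload = "hi".encode("utf-16-le")    # written without a BOM
as_le = payload.decode("utf-16-le")   # the intended text
as_be = payload.decode("utf-16-be")   # also decodes cleanly, but to other characters
```

The same four bytes are valid under both byte orders, so without a BOM nothing in the data itself rules out the wrong one. That is consistent with the hint's advice to set `UTF-16LE` or `UTF-16BE` explicitly when the encoding is known.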
