in_tail: Add descriptions for encoding parameters on in tail (#1870)
* in_tail: Add a description and note for Unicode.Encoding parameter
Signed-off-by: Hiroshi Hatake <[email protected]>
* Update pipeline/inputs/tail.md
Co-authored-by: Alexa Kreizinger <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
* in_tail: Add generic.encoding parameter descriptions
Also I added the reason why we need to support these parameters and how
to use them.
Signed-off-by: Hiroshi Hatake <[email protected]>
* Suppress lint warnings
Signed-off-by: Hiroshi Hatake <[email protected]>
* Apply suggestions from code review
This should correct the severe vale errors and most of the suggestions, as well as matching current style.
Signed-off-by: Lynette Miles <[email protected]>
---------
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Lynette Miles <[email protected]>
Co-authored-by: Alexa Kreizinger <[email protected]>
Co-authored-by: Lynette Miles <[email protected]>
pipeline/inputs/tail.md
94 lines changed: 94 additions & 0 deletions
@@ -39,6 +39,7 @@ The plugin supports the following configuration parameters:
 |`file_cache_advise`| Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. |`on`|
 |`threaded`| Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). |`false`|
 |`Unicode.Encoding`| Set the Unicode character encoding of the file data. This parameter requests two-byte aligned chunk and buffer sizes. If data is not aligned for two bytes, Fluent Bit will use two-byte alignment automatically to avoid character breakages on consuming boundaries. Supported values: `UTF-16LE`, `UTF-16BE`, and `auto`. |`none`|
+|`Generic.Encoding`| Set the non-Unicode encoding of the file data. Supported values: `ShiftJIS`, `UHC`, `GBK`, `GB18030`, `Big5`, `Win866`, `Win874`, `Win1250`, `Win1251`, `Win1252`, `Win1253`, `Win1254`, `Win1255`, and `Win1256`. |`none`|

 ## Buffers and memory management

@@ -85,6 +86,13 @@ The `Unicode.Encoding` parameter is dependent on the simdutf library, which is i
 Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
 {% endhint %}

+{% hint style="info" %}
+The `Unicode.Encoding` parameter is dependent on the `simdutf` library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
+
+Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
+{% endhint %}
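Editor's illustration (not part of the commit): the reason an explicit byte order is safer than `auto` can be seen outside Fluent Bit. The following Python sketch uses only Python's built-in codecs and isn't Fluent Bit's or `simdutf`'s detection logic. It shows that the same bytes decode without error under either byte order, so without a byte order mark nothing in the data says which reading is correct.

```python
# Illustration only: Python's built-in codecs, not Fluent Bit or simdutf.
line = "error: disk full"

# Encode as UTF-16 little-endian without a byte order mark (BOM).
raw_le = line.encode("utf-16-le")

# Decoding with the matching byte order recovers the original text.
as_le = raw_le.decode("utf-16-le")

# Decoding the same bytes as big-endian also succeeds,
# but every character comes out as a different code point.
as_be = raw_le.decode("utf-16-be")

# When a BOM is present, the generic "utf-16" codec can pick the byte order.
BOM_LE = b"\xff\xfe"
with_bom = (BOM_LE + raw_le).decode("utf-16")
```

Because both decodes succeed, a guesser has to rely on heuristics whenever the BOM is missing, which is consistent with the advice above to set `UTF-16LE` or `UTF-16BE` when the encoding is known.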
+
+
 ## Monitor a large number of files

 To monitor a large number of files, you can increase the `inotify` settings in your Linux environment by modifying the following `sysctl` parameters:
@@ -465,3 +473,89 @@ While file rotation is handled, there are risks of potential log loss when using
 - Final note: the `Path` patterns can't match the rotated files. Otherwise, the rotated file would be read again and lead to duplicate records.

 {% endhint %}
+
+## Character encoding conversion
+
+This feature allows Fluent Bit to convert logs from various character encodings into the standard UTF-8 format.
+This is crucial for processing logs from systems, especially Windows, that use legacy or non-UTF-8 encodings.
+Proper conversion ensures that your log data is correctly parsed, indexed, and searchable.
+
+### When to use this feature
+
+You should use this feature if your log files or messages aren't in UTF-8 and you are seeing garbled or incorrectly rendered characters.
+This is common in environments that use:
+
+- Modern Windows applications that log in UTF-16.
+
+- Legacy Windows systems with applications that use traditional code pages (for example, ShiftJIS, GBK, Win1252).
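Editor's illustration (not part of the commit): one rough way to tell whether a log file falls into these categories is to test its bytes directly. The Python sketch below uses a hypothetical helper, `is_valid_utf8`, which isn't part of Fluent Bit. The sample strings are made up for the example, and the NUL-byte check is only a heuristic.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same short strings in three encodings (sample data for illustration).
utf8_line = "ログ".encode("utf-8")
sjis_line = "ログ".encode("shift_jis")
utf16_line = "log".encode("utf-16-le")

# Reading ShiftJIS bytes as UTF-8 yields replacement characters (garbled text).
garbled = sjis_line.decode("utf-8", errors="replace")

# ASCII-only UTF-16 text is technically valid UTF-8, so a validity check
# alone misses it. Embedded NUL bytes are a heuristic hint of UTF-16.
utf16_passes_utf8_check = is_valid_utf8(utf16_line)
utf16_has_nul_bytes = b"\x00" in utf16_line
```

A failed UTF-8 check suggests a legacy code page, while NUL bytes in otherwise readable text suggest UTF-16. Neither check identifies the exact encoding, which still has to come from knowledge of the application that writes the log.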
+
+### Configuration parameters
+
+To enable encoding conversion, you will use one of the following two parameters within an input plugin configuration.
+
+1. `Unicode.Encoding`
+
+Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method utilizes modern processor features (SIMD instructions) to accelerate the conversion process, making it highly efficient.
+
+- Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
+- Supported Values:
+- UTF-16LE (Little-Endian)
+- UTF-16BE (Big-Endian)
+
+1. `Generic.Encoding`
+
+Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.
+
+- Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
+- Supported Values: You can use any of the names or aliases listed below.
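Editor's illustration (not part of the commit, and not the list of names the line above refers to): the effect of a code page conversion can be sketched in Python. This uses Python's built-in codecs rather than Fluent Bit's converter. The `PYTHON_CODECS` mapping from a few of the documented names to Python codec names is an assumption made for this example, and `to_utf8` is a hypothetical helper.

```python
# Assumed mapping from a few documented names to Python codec names.
# This is for illustration only and isn't Fluent Bit's alias table.
PYTHON_CODECS = {
    "ShiftJIS": "shift_jis",
    "GBK": "gbk",
    "Big5": "big5",
    "Win1251": "cp1251",
    "Win1252": "cp1252",
}


def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Decode raw from the named legacy encoding and re-encode it as UTF-8."""
    text = raw.decode(PYTHON_CODECS[encoding])
    return text.encode("utf-8")


# 0xE9 is "é" in Windows-1252; in UTF-8 the same character takes two bytes.
converted = to_utf8(b"caf\xe9", "Win1252")
```

The byte values change during conversion while the text stays the same, so the source encoding has to be named correctly: decoding with the wrong code page produces different characters without raising an error in many cases.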