Commit 0f11c48

Merge pull request #1465 from fluent/lynettemiles/sc-108482/update-pipeline-parsers-regular-expression
Fluent docs: regex: Updating regex doc for style and clarity
2 parents 173c92c + 6410511 commit 0f11c48

1 file changed (+27, -21 lines)

pipeline/parsers/regular-expression.md

Lines changed: 27 additions & 21 deletions

````diff
@@ -1,28 +1,38 @@
 # Regular Expression
 
-The **regex** parser allows to define a custom Ruby Regular Expression that will use a named capture feature to define which content belongs to which key name.
+The **Regex** parser lets you define a custom Ruby regular expression that uses
+a named capture feature to define which content belongs to which key name.
 
-Fluent Bit uses [Onigmo](https://github.com/k-takata/Onigmo) regular expression library on Ruby mode, for testing purposes you can use the following web editor to test your expressions:
+Use [Tail Multiline](../inputs/tail.md#multiline) when you need to support regexes
+across multiple lines from a `tail`. The [Tail](../inputs/tail.md) input plugin
+treats each line as a separate entity.
 
-[http://rubular.com/](http://rubular.com/)
+{% hint style="warning" %}
+Security Warning: Onigmo is a backtracking regex engine. When using expensive
+regex patterns Onigmo can take a long time to perform pattern matching. Read
+["ReDoS"](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)
+on OWASP for additional information.
+{% end hint %}
 
-Important: do not attempt to add multiline support in your regular expressions if you are using [Tail](../inputs/tail.md) input plugin since each line is handled as a separated entity. Instead use Tail [Multiline](../inputs/tail.md#multiline) support configuration feature.
+Setting the format to **regex** requires a `regex` configuration key.
 
-Security Warning: Onigmo is a _backtracking_ regex engine. You need to be careful not to use expensive regex patterns, or Onigmo can take very long time to perform pattern matching. For details, please read the article ["ReDoS"](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS) on OWASP.
-
-> Note: understanding how regular expressions works is out of the scope of this content.
+## Configuration Parameters
 
-From a configuration perspective, when the format is set to **regex**, is mandatory and expected that a _Regex_ configuration key exists.
+The regex parser supports the following configuration parameters:
 
-## Configuration Parameters
+| Key | Description | Default Value |
+| --- | ----------- | ------------- |
+| `Skip_Empty_Values` | If enabled, the parser ignores empty value of the record. | `True` |
 
-The regex parser supports the following configuration parameters.
+Fluent Bit uses the [Onigmo](https://github.com/k-takata/Onigmo) regular expression
+library on Ruby mode.
 
-|Key|Description|Default Value|
-|-------|------------|--------|
-|`Skip_Empty_Values`|If enabled, the parser ignores empty value of the record.| True|
+You can use only alphanumeric characters and underscore in group names. For example,
+a group name like `(?<user-name>.*)` causes an error due to the invalid dash (`-`)
+character. Use the [Rubular](http://rubular.com/) web editor to test your expressions.
 
-The following parser configuration example aims to provide rules that can be applied to an Apache HTTP Server log entry:
+The following parser configuration example provides rules that can be applied to an
+Apache HTTP Server log entry:
 
 ```python
 [PARSER]
````
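The security warning added in this hunk concerns backtracking. As an illustration that is not part of the Fluent Bit docs, the OWASP ReDoS article lists nested quantifiers such as `(a+)+` among its "evil" patterns: on a non-matching input, a backtracking engine like Onigmo can try exponentially many ways to split a run of `a` characters between the inner and outer quantifier. The Ruby sketch below (assuming Ruby 2.4 or later for `Regexp#match?`; the names `EVIL`, `SAFE`, and `SAMPLES` are hypothetical) checks on a few short inputs that a single-quantifier rewrite accepts the same strings. It deliberately uses short inputs and does not measure matching time.

```ruby
# Illustration only: compares a nested-quantifier pattern with a rewrite that
# uses a single quantifier. Short inputs keep this check fast; nothing is timed.
EVIL = /\A(a+)+\z/ # nested quantifier, the shape OWASP flags as "evil"
SAFE = /\Aa+\z/    # one quantifier, so there is only one way to consume the "a"s

SAMPLES = ["", "a", "aaaa", "aaab", "b"].freeze

SAMPLES.each do |s|
  nested = EVIL.match?(s)
  single = SAFE.match?(s)
  puts "#{s.inspect}: nested=#{nested} single=#{single} same=#{nested == single}"
end
```

Both patterns accept exactly the non-empty runs of `a`, so the rewrite keeps the meaning while removing the ambiguity that backtracking would otherwise explore.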

````diff
@@ -34,13 +44,14 @@ The following parser configuration example aims to provide rules that can be app
 Types code:integer size:integer
 ```
 
-As an example, takes the following Apache HTTP Server log entry:
+As an example, review the following Apache HTTP Server log entry:
 
 ```text
 192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
 ```
 
-The above content do not provide a defined structure for Fluent Bit, but enabling the proper parser we can help to make a structured representation of it:
+This log entry doesn't provide a defined structure for Fluent Bit. Enabling the
+proper parser can help to make a structured representation of the entry:
 
 ```text
 [1154104030, {"host"=>"192.168.2.20",
````
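To see how named captures turn the Apache log line in this hunk into keyed fields, here is a minimal Ruby sketch, followed by the integer conversion that `Types code:integer size:integer` requests. The pattern `APACHE` and the helper `parse_apache` are hypothetical and simplified; they are not the `Regex` value from the `[PARSER]` section, whose lines fall outside this diff. `MatchData#named_captures` requires Ruby 2.4 or later.

```ruby
# Hypothetical, simplified pattern for the Apache log line above. It is NOT the
# Regex value from the [PARSER] section (those lines are not part of this diff).
APACHE = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d+) (?<size>\d+)\z/

# Fields converted to integers, mirroring `Types code:integer size:integer`.
INTEGER_FIELDS = %w[code size].freeze

# Hypothetical helper: returns a Hash keyed by group name, or nil on no match.
def parse_apache(line)
  m = APACHE.match(line)
  return nil unless m

  record = m.named_captures # each named group becomes a key with its captured text
  INTEGER_FIELDS.each { |key| record[key] = Integer(record[key]) }
  record
end

line = '192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
p parse_apache(line)
```

Each group name becomes a key, which is the same `"host"=>"192.168.2.20"` shape as the structured output in the diff. This sketch leaves `time` as a plain string and does not attempt any time parsing.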

````diff
@@ -54,8 +65,3 @@ The above content do not provide a defined structure for Fluent Bit, but enablin
 }
 ]
 ```
-
-A common pitfall is that you cannot use characters other than alphabets, numbers and underscore in group names. For example, a group name like `(?<user-name>.*)` will cause an error due to containing an invalid character \(`-`\).
-
-In order to understand, learn and test regular expressions like the example above, we suggest you try the following Ruby Regular Expression Editor: [http://rubular.com/r/X7BH0M4Ivm](http://rubular.com/r/X7BH0M4Ivm)
-
````

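The `Skip_Empty_Values` row in the new table says that when the option is enabled, the parser ignores empty values in the record. The Ruby sketch below shows that idea conceptually and is not Fluent Bit's C implementation: after a match, keys whose captured value is empty are dropped. The pattern `PATTERN` and the helper `parse` are hypothetical. The `skip_empty_values: true` default mirrors the table's default of `True`.

```ruby
# Hypothetical pattern: "user:comment", where the comment part may be empty.
PATTERN = /\A(?<user>[^:]*):(?<comment>.*)\z/

# Hypothetical helper sketching the idea behind Skip_Empty_Values: when the flag
# is on, keys whose captured value is empty are left out of the record.
def parse(line, skip_empty_values: true)
  m = PATTERN.match(line)
  return nil unless m

  record = m.named_captures
  # Assumption: nil (a group that did not participate) also counts as empty.
  record.reject! { |_key, value| value.nil? || value.empty? } if skip_empty_values
  record
end

p parse("alice:")                           # the empty "comment" key is dropped
p parse("alice:", skip_empty_values: false) # "comment" is kept as an empty string
```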