36 changes: 34 additions & 2 deletions pipeline/parsers/configuring-parser.md
@@ -18,7 +18,9 @@ By default, Fluent Bit provides a set of pre-configured parsers that can be used
Parsers are defined in one or more configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

{% hint style="info" %}

Fluent Bit uses Ruby-based regular expressions. You can use [Rubular](http://www.rubular.com) to test your regular expressions for Ruby compatibility.

{% endhint %}

## Configuration parameters
@@ -43,7 +45,30 @@ Multiple parsers can be defined and each section has it own properties. The foll

## Parsers configuration file

All parsers must be defined in a parsers file (see below for examples), not in the Fluent Bit global configuration file. The parsers file makes its parsers available to any input plugin that supports this feature. A parsers file can have multiple entries, like so:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: docker
    format: json
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S.%L'
    time_keep: on

  - name: syslog-rfc5424
    format: regex
    regex: '^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$'
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S.%L'
    time_keep: on
    types: pid:integer
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
@@ -63,6 +88,9 @@ All parsers must be defined in a `parsers.conf` file, not in the Fluent Bit glob
    Types pid:integer
```

{% endtab %}
{% endtabs %}

For more information about the parsers available, refer to the [default parsers file](https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf) distributed with Fluent Bit source code.

## Time resolution and fractional seconds
@@ -72,7 +100,9 @@ Time resolution and its format supported are handled by using the [strftime\(3\)
In addition, Fluent Bit extends its time resolution to support fractional seconds like `2017-05-17T15:44:31.187512963Z`. The `%L` format option for `Time_Format` indicates that the content must be interpreted as fractional seconds.

{% hint style="info" %}

The option `%L` is only valid when used after seconds (`%S`) or seconds since the epoch (`%s`). For example, `%S.%L` and `%s.%L` are valid strings.

{% endhint %}
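
As a rough illustration of what `%S.%L` captures, the following Python sketch splits a timestamp into whole seconds and a nanosecond remainder. Python's `strptime` has no `%L` option, so the fractional digits are handled by hand. The helper name `split_fractional` is hypothetical, and this isn't how Fluent Bit implements the feature.

```python
import calendar
import time


def split_fractional(stamp: str) -> tuple[int, int]:
    """Return (epoch_seconds, nanoseconds) for a UTC timestamp such as
    2017-05-17T15:44:31.187512963Z."""
    whole, _, fraction = stamp.rstrip("Z").partition(".")
    # Parse the part that strptime understands (everything up to %S).
    parsed = time.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    seconds = calendar.timegm(parsed)  # interpret the fields as UTC
    # Pad or truncate the fraction to nine digits (nanoseconds).
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return seconds, nanos


seconds, nanos = split_fractional("2017-05-17T15:44:31.187512963Z")
print(seconds, nanos)
```

The fraction is only meaningful directly after the seconds field, which is why the sketch splits on the first `.` following `%S`.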

## Supported time zone abbreviations
@@ -172,7 +202,9 @@ The following time zone abbreviations are supported.
### Military time zones

{% hint style="info" %}

These are single-letter UTC offset designators. `J` (Juliet) represents local time and is not included. `Z` represents Zulu Time, as listed in the [Universal time zones](#universal-time-zones) list.

{% endhint %}

| Abbreviation | UTC Offset (`HH:MM`) | Offset (seconds) | Is DST | Description |
@@ -200,4 +232,4 @@ These are single-letter UTC offset designators. `J` (Juliet) represents local ti
| `V` | `-09:00` | `-32400` | no | Victor Time Zone |
| `W` | `-10:00` | `-36000` | no | Whiskey Time Zone |
| `X` | `-11:00` | `-39600` | no | X-ray Time Zone |
| `Y` | `-12:00` | `-43200` | no | Yankee Time Zone |
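
The "Offset (seconds)" column is the `HH:MM` offset expressed in seconds. A minimal Python sketch of that conversion follows; the function name `offset_to_seconds` is hypothetical and is only meant for checking table rows by hand.

```python
def offset_to_seconds(offset: str) -> int:
    """Convert a UTC offset written as +HH:MM or -HH:MM to seconds."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return sign * (int(hours) * 3600 + int(minutes) * 60)


# Offsets taken from the V and W rows of the table above.
print(offset_to_seconds("-09:00"))
print(offset_to_seconds("-10:00"))
```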
66 changes: 63 additions & 3 deletions pipeline/parsers/decoders.md
@@ -31,6 +31,24 @@ definition can optionally set one or more decoders. There are two types of decod

Our pre-defined Docker parser has the following definition:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: docker
    format: json
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S.%L'
    time_keep: on
    # Command | Decoder | Field | Optional Action |
    # ==========|==========|=======|===================|
    decode_field_as: escaped log
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name docker
@@ -95,11 +113,32 @@ Example output:
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]
```

Decoder example Fluent Bit configuration files:

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
service:
  parsers_file: parsers.yaml

pipeline:
  inputs:
    - name: tail
      parser: docker
      path: /path/to/log.log

  outputs:
    - name: stdout
      match: '*'
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[SERVICE]
    Parsers_File parsers.conf

[INPUT]
    Name tail
@@ -111,7 +150,25 @@ Decoder configuration file:
    Match *
```

{% endtab %}
{% endtabs %}

The example parsers file:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: docker
    format: json
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S %z'
    decode_field_as: escaped_utf8 log
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
@@ -121,3 +178,6 @@ The `fluent-bit-parsers.conf` file:
    Time_Format %Y-%m-%dT%H:%M:%S %z
    Decode_Field_as escaped_utf8 log
```

{% endtab %}
{% endtabs %}
23 changes: 20 additions & 3 deletions pipeline/parsers/json.md
@@ -4,17 +4,34 @@ The _JSON_ parser transforms JSON logs by converting them to internal binary rep

For example, the default parsers configuration file includes a parser for parsing Docker logs (when the Tail input plugin is used):

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: docker
    format: json
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S %z'
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name docker
    Format json
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S %z
```

{% endtab %}
{% endtabs %}

The following log entry is valid content for the previously defined parser:

```text
{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}
```

@@ -24,4 +41,4 @@ After processing, its internal representation will be:
[1154103724, {"key1"=>12345, "key2"=>"abc"}]
```

The time was converted to a UTC timestamp and the map was reduced to each component of the original message.
37 changes: 34 additions & 3 deletions pipeline/parsers/logfmt.md
@@ -2,14 +2,29 @@

The **logfmt** parser parses the logfmt format described at [https://brandur.org/logfmt](https://brandur.org/logfmt). A more formal description is available at [https://godoc.org/github.com/kr/logfmt](https://godoc.org/github.com/kr/logfmt).

Here is an example parsers configuration:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: logfmt
    format: logfmt
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name logfmt
    Format logfmt
```

{% endtab %}
{% endtabs %}

The following log entry is valid content for the parser defined above:

```text
@@ -27,9 +42,25 @@ After processing, it internal representation will be:
If you want to be stricter than the logfmt standard and not parse lines where some attributes do
not have values (such as `key3` in the example above), you can configure the parser as follows:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: logfmt
    format: logfmt
    logfmt_no_bare_keys: true
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name logfmt
    Format logfmt
    Logfmt_No_Bare_Keys true
```

{% endtab %}
{% endtabs %}
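
The difference between the default and the strict behavior can be sketched in a few lines of Python. This is a simplified illustration, not Fluent Bit's implementation: quoted values aren't handled, the function name `parse_logfmt` is hypothetical, and storing a bare key as boolean `True` is an assumption of this sketch.

```python
from typing import Optional


def parse_logfmt(line: str, no_bare_keys: bool = False) -> Optional[dict]:
    """Parse space-separated key=value pairs (quoted values aren't handled)."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            # Bare key: reject the whole line in strict mode.
            if no_bare_keys:
                return None
            record[key] = True  # assumption: bare keys become boolean true
        else:
            record[key] = value
    return record


print(parse_logfmt("key1=val1 key2=val2 key3"))
print(parse_logfmt("key1=val1 key2=val2 key3", no_bare_keys=True))
```

With `no_bare_keys=True` the sketch returns `None` for the whole line, mirroring the idea that a line containing a bare key isn't parsed at all.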
25 changes: 21 additions & 4 deletions pipeline/parsers/ltsv.md
@@ -13,9 +13,24 @@ LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\tref
CustomLog "logs/access_log" combined_ltsv
```

The following is an example parsers configuration file:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: access_log_ltsv
    format: ltsv
    time_key: time
    time_format: '[%d/%b/%Y:%H:%M:%S %z]'
    types: status:integer size:integer
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name access_log_ltsv
    Format ltsv
@@ -24,6 +39,9 @@ The parser.conf:
    Types status:integer size:integer
```

{% endtab %}
{% endtabs %}
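
To make the roles of `time_key`, `time_format`, and `types` concrete, here is a minimal Python sketch of LTSV parsing. It only approximates the parser's behavior: the function name `parse_ltsv` and the shortened sample line are hypothetical, and Python's `strptime` stands in for Fluent Bit's own time handling.

```python
from datetime import datetime


def parse_ltsv(line: str, time_key: str, time_format: str, types: dict) -> list:
    """Split tab-separated label:value fields, then apply time and type rules."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        label, _, value = field.partition(":")  # split on the first colon only
        record[label] = value
    # Remove the time field and convert it to a Unix timestamp.
    timestamp = datetime.strptime(record.pop(time_key), time_format).timestamp()
    for label, cast in types.items():
        if label in record:
            record[label] = cast(record[label])
    return [timestamp, record]


# Hypothetical, shortened input line for illustration.
sample = "host:127.0.0.1\tident:-\ttime:[10/Jul/2018:13:27:05 +0200]\tstatus:200\tsize:1279"
result = parse_ltsv(sample, "time", "[%d/%b/%Y:%H:%M:%S %z]", {"status": int, "size": int})
print(result)
```

Splitting each field on the first colon only matters here because the time value itself contains colons.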

The following log entry is valid content for the parser defined above:

```text
@@ -42,5 +60,4 @@ After processing, it internal representation will be:
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/style.css HTTP/1.1", "status"=>200, "size"=>1279, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
```

The time has been converted to a Unix timestamp \(UTC\).
28 changes: 24 additions & 4 deletions pipeline/parsers/regular-expression.md
@@ -8,10 +8,11 @@ across multiple lines from a `tail`. The [Tail](../inputs/tail.md) input plugin
treats each line as a separate entity.

{% hint style="warning" %}

Security Warning: Onigmo is a backtracking regex engine. When using expensive
regex patterns, Onigmo can take a long time to perform pattern matching. Read
["ReDoS"](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS) on OWASP for additional information.

{% endhint %}

Setting the format to **regex** requires a `regex` configuration key.
@@ -34,7 +35,23 @@ character. Use the [Rubular](http://rubular.com/) web editor to test your expres
The following parser configuration example provides rules that can be applied to an
Apache HTTP Server log entry:

{% tabs %}
{% tab title="parsers.yaml" %}

```yaml
parsers:
  - name: apache
    format: regex
    regex: '^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'
    time_key: time
    time_format: '%d/%b/%Y:%H:%M:%S %z'
    types: code:integer size:integer
```

{% endtab %}
{% tab title="parsers.conf" %}

```text
[PARSER]
    Name apache
    Format regex
@@ -44,6 +61,9 @@ Apache HTTP Server log entry:
    Types code:integer size:integer
```

{% endtab %}
{% endtabs %}
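
To experiment with this pattern outside Fluent Bit, the Python sketch below applies it to a log line. It's an approximation only: Python's `re` is a different engine from Onigmo, so the Ruby-style `(?<name>...)` groups are rewritten as `(?P<name>...)`. The helper `parse_apache` and the sample line are hypothetical, and the integer conversion mimics the `code:integer size:integer` types.

```python
import re

# The pattern from the parser above, with Ruby-style (?<name>...) groups
# rewritten as Python's (?P<name>...).
APACHE = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^\"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)


def parse_apache(line: str):
    """Return the named captures as a dict, or None if the line doesn't match."""
    match = APACHE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("code", "size"):
        # Apache logs "-" for an unknown size, so only convert digit strings.
        if record[key].isdigit():
            record[key] = int(record[key])
    return record


# Hypothetical sample line in Apache access log format.
sample = '192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
record = parse_apache(sample)
print(record)
```

Because the referer and agent group is optional, those keys are `None` in Python when the log line omits them.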

As an example, review the following Apache HTTP Server log entry:

```text
@@ -64,4 +84,4 @@ proper parser can help to make a structured representation of the entry:
"agent"=>""
}
]
```