
Commit 59b0598

Merge pull request #1457 from fluent/lynettemiles/sc-108080/update-local-testing-validating-your-data
Fluent bit: docs: Updating validating data for style
2 parents d00a9dd + fd90ca4 commit 59b0598

1 file changed: +73 -34 lines changed


local-testing/validating-your-data-and-structure.md

# Validating your data and structure

Fluent Bit is a powerful log processing tool that supports multiple sources and
formats. In addition, it provides filters that can be used to perform custom
modifications. As your pipeline grows, it's important to validate your data and
structure.

Fluent Bit users are encouraged to integrate data validation in their continuous
integration (CI) systems.

In a normal production environment, inputs, filters, and outputs are defined in the
configuration. Fluent Bit provides the [Expect](../pipeline/filters/expect.md) filter,
which can be used to validate `keys` and `values` from your records and take action
when an exception is found.

A simplified view of the data processing pipeline is as follows:

```mermaid
flowchart LR
  IS[Inputs / Sources]
  Fil[Filters]
  OD[Outputs / Destinations]
  IS --> Fil --> OD
```

## Understand structure and configuration

Consider the following pipeline, where your source of data is a file with JSON
content and two filters:

- [grep](../pipeline/filters/grep.md) to exclude certain records.
- [record_modifier](../pipeline/filters/record-modifier.md) to alter the record
  content by adding and removing specific keys.

```mermaid
flowchart LR
  tail["tail (input)"]
  grep["grep (filter)"]
  record["record_modifier (filter)"]
  stdout["stdout (output)"]

  tail --> grep
  grep --> record
  record --> stdout
```

Add data validation between each step to ensure your data structure is correct.
This example uses the `expect` filter:

```mermaid
flowchart LR
  tail["tail (input)"]
  grep["grep (filter)"]
  record["record_modifier (filter)"]
  stdout["stdout (output)"]
  E1["expect (filter)"]
  E2["expect (filter)"]
  E3["expect (filter)"]

  tail --> E1 --> grep
  grep --> E2 --> record --> E3 --> stdout
```

`Expect` filters set rules aiming to validate criteria like:

- Does the record contain a key `A`?
- Does the record not contain key `A`?
- Does the record key `A` value equal `NULL`?
- Is the record key `A` value not `NULL`?
- Does the record key `A` value equal `B`?

Every `expect` filter configuration exposes rules to validate the content of your
records using [configuration properties](../pipeline/filters/expect.md#configuration-parameters).
When a rule doesn't match, the configured `action` determines the response: `warn`
sends a warning message to the logging layer, while `exit` makes Fluent Bit abort
with status code `255`.

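For example, a single `expect` entry that requires a `color` key and checks its
value might look like the following sketch (the property names come from the
filter's documented parameters; the key and value are illustrative):

```python
[FILTER]
    name       expect
    match      *
    # rule: the record must contain a key named 'color'
    key_exists color
    # rule: the value of 'color' must equal 'blue'
    key_val_eq color blue
    # on the first mismatch, abort Fluent Bit with status code 255
    action     exit
```
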
## Test the configuration

Consider a JSON file `data.log` with the following content:

```javascript
{"color": "blue", "label": {"name": null}}
{"color": "red", "label": {"name": "abc"}, "meta": "data"}
{"color": "green", "label": {"name": "abc"}, "meta": null}
```

The following Fluent Bit configuration file configures a pipeline to consume the
log, applying an `expect` filter to validate that the keys `color` and `label`
exist:

```python
[SERVICE]
    ...
    match *
```

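Most of this block is elided above (`...`). A complete pipeline matching the
description plausibly looks like the following sketch (the `[SERVICE]` values,
file path, and `exit_on_eof` setting are assumptions for illustration):

```python
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name        tail
    path        ./data.log
    # parse each line as JSON; removing this triggers the expect 'exit' action
    parser      json
    # stop after reading the whole file, which suits CI-style test runs
    exit_on_eof on

[FILTER]
    name       expect
    match      *
    key_exists color
    key_exists label
    action     exit

[OUTPUT]
    name  stdout
    match *
```
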
If the JSON parser fails or is missing in the `tail` input (`parser json`), the
`expect` filter triggers the `exit` action. As a test, comment out or remove the
`parser json` line and watch the pipeline abort.

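A sketch of that failure case, using the `[INPUT]` section assumed above with the
parser disabled:

```python
[INPUT]
    name        tail
    path        ./data.log
    # parser    json
    # with no parser, each line arrives as a raw string instead of structured
    # JSON, so the expect rules for 'color' and 'label' fail and Fluent Bit exits
    exit_on_eof on
```
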
To extend the pipeline, add a `grep` filter to match records whose `label` map
contains a key called `name` with the value `abc`, then an `expect` filter to
re-validate that condition:

```python
[SERVICE]
    ...
    match *
```

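The elided middle of this block adds the two new filter entries between the input
and the output. They plausibly look like this sketch (the regex form is
illustrative; `key_val_eq` and the `$label['name']` record accessor are
assumptions based on the filter docs):

```python
[FILTER]
    name       grep
    match      *
    # keep only records whose label.name value matches 'abc'
    regex      $label['name'] ^abc$

[FILTER]
    name       expect
    match      *
    # re-validate what grep should have guaranteed
    key_val_eq $label['name'] abc
    action     exit
```
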
## Production deployment

When deploying in production, consider removing the `expect` filters from your
configuration. These filters are unnecessary unless you need 100% coverage of
checks at runtime.
