How to handle Elastic Common Schema (ECS) JSON format logs including when they are split/truncated #2163
autotitan started this conversation in Show and tell
TL;DR: you need to add two filters to your `Flow` or `ClusterFlow` filters.

In our environment, we run Kubernetes with the containerd runtime. All logs are harvested by the logging operator (fluentbit/fluentd) and shipped to an Elasticsearch/ELK cluster for viewing in Kibana.
There were multiple problems we encountered when harvesting JSON logs in the Elastic Common Schema (ECS) standard from our Kubernetes containers.

The first problem was that the logging operator refused to parse the logs, even though they were JSON and standardized with ECS.

After some trial and error, this is the basic configuration that seems to work for parsing ECS JSON logs:
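A minimal sketch of such a `Flow`, assuming the logging-operator `v1beta1` CRD; the name, namespace, label selector, and `elasticsearch-output` Output reference are placeholders for your own environment:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: ecs-json-logs        # placeholder name
  namespace: my-app          # placeholder namespace
spec:
  match:
    - select:
        labels:
          app: my-app        # placeholder label selector
  filters:
    # Parse the container log body as JSON so ECS fields
    # arrive in Elasticsearch as structured fields.
    - parser:
        remove_key_name_field: true   # drop the raw text field after parsing
        reserve_data: true            # keep the other record fields intact
        parse:
          type: json
  localOutputRefs:
    - elasticsearch-output   # placeholder: an Output defined elsewhere
```

`reserve_data: true` matters here: without it, the parser replaces the whole record with the parsed JSON and you lose the Kubernetes metadata the operator attached.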
The second problem was that a large percentage of the ECS logs were reported missing or dropped entirely. Investigating, we discovered that very large/long logs (which were written as single JSON lines) were being split into multiple logs by fluentd. Since a split log is no longer a single JSON line, it is no longer valid JSON, so the split fragments were dropped and a great many ParserError entries appeared in the fluentd logs, looking like this:

error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data

The fix for the log-split issue was to add a
`concat` filter that re-joins the split logs. In the example below, take special note of `stream_identity_key`: in our case, `tag` was the appropriate identifier that linked the split logs together. Check the documentation for more details.

I wanted to share this because I know plenty of other engineers are hitting the exact same issues making the logging operator behave properly with JSON logs in the Elastic Common Schema (ECS) standard, and are probably trying to figure out why there is a parsing error even though their JSON logs really are written as single log lines, yet are still getting split by the fluentd processor.
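For reference, the filters section of the `Flow` might look roughly like this. This is a hedged sketch: the `key` name (`message`) and the `flush_interval` value are assumptions you should adapt to your own records, while `stream_identity_key: tag` is what worked in our case:

```yaml
spec:
  filters:
    # Re-join fluentd's split fragments BEFORE attempting JSON parsing.
    - concat:
        key: message             # assumption: the field holding the log body
        separator: ""            # join fragments without inserting anything
        stream_identity_key: tag # identifier that links split fragments together
        flush_interval: 5        # assumption: seconds to wait for more fragments
    # Only then parse the re-assembled line as JSON.
    - parser:
        remove_key_name_field: true
        reserve_data: true
        parse:
          type: json
```

Order matters: the `concat` filter must run before the `parser` filter, so that the fragments are re-assembled into a single valid JSON line before parsing is attempted.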