
Missing container_name, container_id and source after adding multiline parser. #959

@karoskor

Hi team, we are using the AWS FireLens container to process our logs and send them to CloudWatch and Kinesis outputs. We have a custom logger that writes several lines per log entry, so we need a multiline parser to capture the full content of each message. After adding the multiline parser, the logs are parsed correctly, but we lose three metadata parameters: container_name, container_id and source.

Without multiline parsing (only the simple filtering we used in the previous version), the parameters are added correctly. They are lost only after the multiline parser is added.

We build our Dockerfile from the aws-for-fluent-bit:stable image. We run on ECS and set enable-ecs-log-metadata to true in the container definition. There are no relevant errors in the Fluent Bit logs.

I am adding the snippets of our config files with relevant information.

Multiline parser:

[MULTILINE_PARSER]
    name          multiline-regex-test
    type          regex
    flush_timeout 1000
    #
    # rules |   state name  | regex pattern              | next state
    # ------|---------------|----------------------------|-----------
    rule      "start_state"   "^\d{4}-\d{2}-\d{2}"         "cont"
    rule      "cont"          "^(?!\d{4}-\d{2}-\d{2})"     "cont"
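
To make the intent of the two rules explicit, here is a rough Python sketch of the grouping I expect from them: a line starting with a date opens a new record, and every following line that does not start with a date is appended to it. This is only an approximation for illustration, not how Fluent Bit implements it; `group_lines` is a hypothetical helper and `flush_timeout` is not modeled.

```python
import re

# Same two patterns as the rules in the [MULTILINE_PARSER] section.
START = re.compile(r"^\d{4}-\d{2}-\d{2}")
CONT = re.compile(r"^(?!\d{4}-\d{2}-\d{2})")


def group_lines(lines):
    """Group raw lines into multiline records (illustrative approximation)."""
    records = []
    current = None  # lines of the record currently being assembled
    for line in lines:
        if START.search(line):
            # A date at the start of the line opens a new record.
            if current is not None:
                records.append("\n".join(current))
            current = [line]
        elif current is not None and CONT.search(line):
            # Continuation line: append to the open record.
            current.append(line)
        else:
            # No open record yet: pass the line through on its own.
            records.append(line)
    if current is not None:
        records.append("\n".join(current))
    return records


if __name__ == "__main__":
    sample = [
        "2024-01-01 ERROR something failed",
        "    at step one",
        "    at step two",
        "2024-01-02 INFO recovered",
    ]
    for record in group_lines(sample):
        print(repr(record))
```

With input like the sample above, the first three lines end up in one record and the fourth line starts a second one, which matches the parsing result we see in our output.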

And our fluentbit.conf file with the TCP input:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020
    Health_Check On
    HC_Errors_Count 5
    HC_Retry_Failure_Count 5
    HC_Period 5
    Flush        1
    Grace        30
    Log_Level    debug
    Parsers_File /config/parsers.conf

# TCP Input configuration
[INPUT]
    Name          tcp
    Tag           ApplicationLogs
    Listen        0.0.0.0
    Port          5170
    Format        none

[FILTER]
    Name          multiline
    Match         *
    multiline.parser    multiline-regex-test
    multiline.key_content  log

Example of the log:

Example of a parsed log record that is missing the ECS metadata parameters:

{
    "thread": "xxx",
    "level": "ERROR",
    "destination": "xxx",
    "logger": "xxx",
    "error_code": "xxx",
    "message": "xxx",
    "action": "xxx\n        xxx\n        xxx\n",
    "ec2_instance_id": "xxx",
    "ecs_cluster": "xxx",
    "ecs_task_arn": "xxx",
    "ecs_task_definition": "xxx"
}

I tried different regex patterns in the multiline parser, but some records are always missing metadata, and only the three parameters mentioned above. Other metadata parameters such as ecs_cluster are always present, so I think this is a bug.
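
For anyone trying to reproduce this, a small check like the following could be run over the JSON records from the output to list which of the three parameters are absent. It is a minimal sketch; `missing_metadata` and `EXPECTED_KEYS` are hypothetical names, and the record is assumed to be already decoded into a dict.

```python
# The three parameters that disappear after adding the multiline filter.
EXPECTED_KEYS = ("container_name", "container_id", "source")


def missing_metadata(record):
    """Return the expected metadata keys that are absent from a record."""
    return [key for key in EXPECTED_KEYS if key not in record]


if __name__ == "__main__":
    # Shape taken from the parsed example above (values redacted).
    record = {
        "level": "ERROR",
        "message": "xxx",
        "ecs_cluster": "xxx",
        "ecs_task_arn": "xxx",
    }
    print(missing_metadata(record))
```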
