Fluent-bit crashes when handling envoy empty spans #9391

@carlos4ndre

Bug Report

Describe the bug

I’m having an issue using fluent-bit (3.1.7) when processing spans sent by Envoy.

The flow is like this: Envoy -> fluent-bit -> OTel collector

From what I can see in the logs, Envoy periodically (every 10s) sends data with empty spans, probably some sort of health check, and this causes fluent-bit to crash.

I'm not sure whether this is a well-known issue. Envoy is configured to send traces over gRPC.
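To make "empty spans" concrete: the stdout dump further down in this report shows a resource span carrying `service.name: 'front-envoy'` whose `[spans]` list is empty. The following is a minimal Python sketch of how such a payload can be recognized. It assumes the OTLP/JSON field names (`resourceSpans`, `scopeSpans`, `spans`); the helper `count_spans` and the sample payload are hypothetical illustrations, not part of Fluent Bit or Envoy, and the payload was not captured from the wire.

```python
def count_spans(export_request: dict) -> int:
    """Count spans in a dict shaped like an OTLP/JSON trace export request."""
    total = 0
    for resource_span in export_request.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            total += len(scope_span.get("spans", []))
    return total


# Hypothetical payload modelled on the dump in this report:
# the resource is present, but the scope span holds no spans.
empty_payload = {
    "resourceSpans": [
        {
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "front-envoy"}}
                ]
            },
            "scopeSpans": [{"spans": []}],
        }
    ]
}

if __name__ == "__main__":
    print("spans in payload:", count_spans(empty_payload))
```

A count of zero for a payload that still has a resource and a scope span matches the shape of the message that precedes the crash in the logs.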

To Reproduce

  1. Create config files for fluent-bit, envoy and OTel collector services

fluent-bit.conf

[SERVICE]
  Flush        1
  Log_Level    debug
  Daemon       off

[INPUT]
  Name       opentelemetry
  Listen     0.0.0.0
  Port       4318
  Tag        otel

#[INPUT]
#  Name  event_type
#  Type  traces
#  Tag   otel

[OUTPUT]
  Name  stdout
  Match otel

[OUTPUT]
  Name       opentelemetry
  Match      otel
  Host       otel-collector
  Port       4318

otel-collector.yaml

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [logging]

envoy.yaml

admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    traffic_direction: OUTBOUND
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          tracing:
            spawn_upstream_span: true
            verbose: false
            provider:
              name: envoy.tracers.opentelemetry
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
                grpc_service:
                  envoy_grpc:
                    cluster_name: fluentbit_agent
                  timeout: 2s
                service_name: front-envoy
            client_sampling:
              value: 100
            random_sampling:
              value: 100
            overall_sampling:
              value: 100
          codec_type: AUTO
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: proxy_routes
            virtual_hosts:
            - name: proxy
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/echo"
                direct_response:
                  status: 200
                  body:
                    inline_string: "OK"
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/dev/stdout"
              log_format:
                json_format:
                  traceparent: "%REQ(TRACEPARENT)%"
                  tracestate: "%REQ(TRACESTATE)%"
  clusters:
  - name: fluentbit_agent
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: fluentbit_agent
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: fluent-bit
                port_value: 4318
  2. Create docker-compose.yaml file
services:
  envoy:
    image: envoyproxy/envoy:distroless-v1.31.1
    volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
    - "10000:10000"
    - "9901:9901"

  fluent-bit:
    image: fluent/fluent-bit:3.1.7
    container_name: fluent-bit
    volumes:
    - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    ports:
    - "4318:4318"

  otel-collector:
    image: otel/opentelemetry-collector:0.109.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
    - ./otel-collector.yaml:/etc/otel-collector-config.yaml
  3. Run the following command to bring everything up:
~/demo $ tree
.
├── docker-compose.yaml
├── envoy.yaml
├── fluent-bit.conf
└── otel-collector.yaml

0 directories, 4 files


$ docker compose up --build -d
[+] Running 3/3
 ✔ Container demo-otel-collector-1  Started                                                                                                            0.2s
 ✔ Container demo-envoy-1           Started                                                                                                            0.2s
 ✔ Container fluent-bit             Started
  4. Fluent-bit crashes when receiving the payload sent by Envoy
$ docker ps -a
CONTAINER ID   IMAGE                                  COMMAND                  CREATED              STATUS                            PORTS                                                                   NAMES
c61911c888d6   otel/opentelemetry-collector:0.109.0   "/otelcol --config=/…"   About a minute ago   Up About a minute                 4317/tcp, 0.0.0.0:8888->8888/tcp, 55678/tcp, 0.0.0.0:55679->55679/tcp   demo-otel-collector-1
5154ff9bc29b   envoyproxy/envoy:distroless-v1.31.1    "/usr/local/bin/envo…"   About a minute ago   Up About a minute                 0.0.0.0:9901->9901/tcp, 0.0.0.0:10000->10000/tcp                        demo-envoy-1
b82479cb74c9   fluent/fluent-bit:3.1.7                "/fluent-bit/bin/flu…"   About a minute ago   Exited (133) About a minute ago                                                                           fluent-bit

$ docker logs b82479cb74c9
Fluent Bit v3.1.7
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/09/16 18:37:12] [ info] Configuration:
[2024/09/16 18:37:12] [ info]  flush time     | 1.000000 seconds
[2024/09/16 18:37:12] [ info]  grace          | 5 seconds
[2024/09/16 18:37:12] [ info]  daemon         | 0
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  inputs:
[2024/09/16 18:37:12] [ info]      opentelemetry
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  filters:
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  outputs:
[2024/09/16 18:37:12] [ info]      stdout.0
[2024/09/16 18:37:12] [ info]      opentelemetry.1
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  collectors:
[2024/09/16 18:37:12] [ info] [fluent bit] version=3.1.7, commit=c6e902a43a, pid=1
[2024/09/16 18:37:12] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2024/09/16 18:37:12] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/09/16 18:37:12] [ info] [cmetrics] version=0.9.5
[2024/09/16 18:37:12] [ info] [ctraces ] version=0.5.5
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/09/16 18:37:12] [debug] [downstream] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.1] created event channels: read=31 write=32
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:stdout.0
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:opentelemetry.1
[2024/09/16 18:37:12] [ info] [sp] stream processor started
[2024/09/16 18:37:12] [ info] [output:stdout:stdout.0] worker #0 started
[2024/09/16 18:37:17] [debug] [task] created task=0xffffa68396e0 id=0 OK
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/09/16 18:37:17] [engine] caught signal (SIGSEGV)
...
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'front-envoy'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    schema_url:
    [spans]
[2024/09/16 18:37:17] [debug] [output:opentelemetry:opentelemetry.1] ctraces msgpack size: 1562
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] ctr decode msgpack returned : 6
[2024/09/16 18:37:17] [debug] [out flush] cb_destroy coro_id=0
#0  0xaaaac47bcc03      in  process_traces() at plugins/out_opentelemetry/opentelemetry.c:389
#1  0xaaaac47bcc03      in  cb_opentelemetry_flush() at plugins/out_opentelemetry/opentelemetry.c:485
#2  0xaaaac4b765a7      in  co_switch() at lib/monkey/deps/flb_libco/aarch64.c:133
#3  0xffffffffffffffff  in  ???() at ???:0
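The `docker ps -a` output above reports `Exited (133)` for the fluent-bit container. By the common shell convention, an exit status above 128 means the process was terminated by signal number `status - 128`. The sketch below decodes a status that way; `describe_exit_status` is a hypothetical helper, and signal names are platform-dependent, so the result should be read on the platform where the container ran. It only decodes the number and does not explain why the log itself reports SIGSEGV.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Decode an exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not every number maps to a named signal on every platform.
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    # 133 is the status shown for the fluent-bit container in this report.
    print(describe_exit_status(133))
```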
  5. Now use the event_type input to show that the opentelemetry output works fine with normal span payloads:
#[INPUT]
#  Name       opentelemetry
#  Listen     0.0.0.0
#  Port       4318
#  Tag        otel

[INPUT]
  Name  event_type
  Type  traces
  Tag   otel
...
  6. Send a request to generate a span
$ curl http://localhost:10000/echo
OK
  7. Check that the span is correctly forwarded from fluent-bit to otel-collector by looking at their logs
$ docker ps
$ docker logs <container_id>
...
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'Fluent Bit Test Service'
     - dropped_attributes_count: 5
  schema_url: https://ctraces/resource_span_schema_url
  [scope_span]
    instrumentation scope:
        - name                    : ctrace
        - version                 : a.b.c
        - dropped_attributes_count: 3
        - attributes: undefined
    schema_url: https://ctraces/scope_span_schema_url
    [spans]
         [span 'main']
             - trace_id                : 526274528d3beab4f98a82f459bdc77c
             - span_id                 : 1442ef1b690cd8b0
             - parent_span_id          : undefined
             - kind                    : 1 (internal)
             - start_time              : 1726512123955380387
             - end_time                : 1726512123955380387
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes:
                 - agent: 'Fluent Bit'
                 - year: 2022
                 - open_source: true
                 - temperature: 25.5
                 - my_array: [
                     'first',
                     2,
                     false,
                     [
                         3.1000000000000001,
                         5.2000000000000002,
                         6.2999999999999998
                     ]
                 ]
                 - my-list:
                     - language: 'c'

             - events:
                 - name: connect to remote server
                     - timestamp               : 1726512123955392720
                     - dropped_attributes_count: 0
                     - attributes:
                         - syscall 1: 'open()'
                         - syscall 2: 'connect()'
                         - syscall 3: 'write()'
             - [links]
         [span 'do-work']
             - trace_id                : 526274528d3beab4f98a82f459bdc77c
             - span_id                 : a58aca2253f4b054
             - parent_span_id          : 1442ef1b690cd8b0
             - kind                    : 3 (client)
             - start_time              : 1726512123955397137
             - end_time                : 1726512123955397137
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
                 - link:
                     - trace_id             : 41354bad670f86e7a9fe6077a7ae3a4c
                     - span_id              : 820d8bab3a51c548
                     - trace_state          : aaabbbccc
                     - dropped_events_count : 2
                     - attributes           : none
...

Expected behaviour

Fluent-bit should not crash when receiving empty spans from Envoy, or it should provide a way to filter them out.
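As an illustration of the filtering behaviour requested here, the sketch below drops scope spans that contain no spans, and resource spans that end up with no scope spans, from an OTLP/JSON-shaped dict. This is not an existing Fluent Bit filter or option; `drop_empty_spans` is a hypothetical name, and the field names (`resourceSpans`, `scopeSpans`, `spans`) are assumed from the OTLP/JSON encoding.

```python
import copy


def drop_empty_spans(export_request: dict) -> dict:
    """Return a copy of the request without span-less scope/resource spans."""
    cleaned = copy.deepcopy(export_request)
    kept_resources = []
    for resource_span in cleaned.get("resourceSpans", []):
        # Keep only scope spans that actually carry at least one span.
        scopes = [s for s in resource_span.get("scopeSpans", []) if s.get("spans")]
        if scopes:
            resource_span["scopeSpans"] = scopes
            kept_resources.append(resource_span)
    cleaned["resourceSpans"] = kept_resources
    return cleaned


if __name__ == "__main__":
    request = {
        "resourceSpans": [
            {"scopeSpans": [{"spans": []}]},                  # empty, dropped
            {"scopeSpans": [{"spans": [{"name": "main"}]}]},  # kept
        ]
    }
    print(drop_empty_spans(request))
```

With a guard of this kind in front of the output, a request whose `resourceSpans` list comes back empty could simply be skipped instead of being forwarded.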

Your Environment

  • Version used:
    • fluent-bit: 3.1.7
    • envoy: v1.31.1
    • otel-collector: 0.109.0
  • Environment name and version: Docker (27.0.3)
  • Operating System and version: macOS (14.6.1)
  • Filters and plugins: No additional
