Bug Report
Describe the bug
I’m having an issue using fluent-bit (3.1.7) when processing spans sent by Envoy.
The flow is like this: Envoy -> fluent-bit -> OTel collector
From what I can see in the logs, Envoy sends some periodic data (every 10s) with empty spans, probably some sort of health check, and this causes fluent-bit to crash.
Not sure if this is a well-known issue. Envoy is configured to send traces over gRPC.
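To be clear about what I mean by "empty spans": judging from the stdout dump further down (a resource carrying only service.name, one scope span, and no spans at all), the periodic export presumably has the shape sketched below. This is a hand-written assumption using OTLP/JSON-style keys, not something captured from the wire:

# Assumed shape of the periodic export from Envoy (OTLP/JSON-style keys),
# reconstructed from the stdout dump below -- not captured from the wire.
suspected_payload = {
    "resourceSpans": [
        {
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "front-envoy"}}
                ]
            },
            "scopeSpans": [
                {"spans": []}  # a scope span is present, but it contains no spans
            ],
        }
    ]
}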
To Reproduce
- Create config files for fluent-bit, envoy and OTel collector services
fluent-bit.conf
[SERVICE]
    Flush     1
    Log_Level debug
    Daemon    off

[INPUT]
    Name      opentelemetry
    Listen    0.0.0.0
    Port      4318
    Tag       otel

#[INPUT]
#    Name      event_type
#    Type      traces
#    Tag       otel

[OUTPUT]
    Name      stdout
    Match     otel

[OUTPUT]
    Name      opentelemetry
    Match     otel
    Host      otel-collector
    Port      4318
otel-collector.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [logging]
envoy.yaml
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    traffic_direction: OUTBOUND
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          tracing:
            spawn_upstream_span: true
            verbose: false
            provider:
              name: envoy.tracers.opentelemetry
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
                grpc_service:
                  envoy_grpc:
                    cluster_name: fluentbit_agent
                  timeout: 2s
                service_name: front-envoy
            client_sampling:
              value: 100
            random_sampling:
              value: 100
            overall_sampling:
              value: 100
          codec_type: AUTO
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: proxy_routes
            virtual_hosts:
            - name: proxy
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/echo"
                direct_response:
                  status: 200
                  body:
                    inline_string: "OK"
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/dev/stdout"
              log_format:
                json_format:
                  traceparent: "%REQ(TRACEPARENT)%"
                  tracestate: "%REQ(TRACESTATE)%"
  clusters:
  - name: fluentbit_agent
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: fluentbit_agent
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: fluent-bit
                port_value: 4318
- Create a docker-compose.yaml file
services:
  envoy:
    image: envoyproxy/envoy:distroless-v1.31.1
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "10000:10000"
      - "9901:9901"

  fluent-bit:
    image: fluent/fluent-bit:3.1.7
    container_name: fluent-bit
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    ports:
      - "4318:4318"

  otel-collector:
    image: otel/opentelemetry-collector:0.109.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector-config.yaml
- Run the following command to bring everything up:
~/demo $ tree
.
├── docker-compose.yaml
├── envoy.yaml
├── fluent-bit.conf
└── otel-collector.yaml
0 directories, 4 files
$ docker compose up --build -d
[+] Running 3/3
✔ Container demo-otel-collector-1 Started 0.2s
✔ Container demo-envoy-1 Started 0.2s
✔ Container fluent-bit Started
- Fluent-bit crashes when receiving the payload sent by Envoy
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c61911c888d6 otel/opentelemetry-collector:0.109.0 "/otelcol --config=/…" About a minute ago Up About a minute 4317/tcp, 0.0.0.0:8888->8888/tcp, 55678/tcp, 0.0.0.0:55679->55679/tcp demo-otel-collector-1
5154ff9bc29b envoyproxy/envoy:distroless-v1.31.1 "/usr/local/bin/envo…" About a minute ago Up About a minute 0.0.0.0:9901->9901/tcp, 0.0.0.0:10000->10000/tcp demo-envoy-1
b82479cb74c9 fluent/fluent-bit:3.1.7 "/fluent-bit/bin/flu…" About a minute ago Exited (133) About a minute ago fluent-bit
$ docker logs b82479cb74c9
Fluent Bit v3.1.7
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ __
| ___| | | | | ___ (_) | |____ |/ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /_| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)___/
[2024/09/16 18:37:12] [ info] Configuration:
[2024/09/16 18:37:12] [ info] flush time | 1.000000 seconds
[2024/09/16 18:37:12] [ info] grace | 5 seconds
[2024/09/16 18:37:12] [ info] daemon | 0
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info] inputs:
[2024/09/16 18:37:12] [ info] opentelemetry
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info] filters:
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info] outputs:
[2024/09/16 18:37:12] [ info] stdout.0
[2024/09/16 18:37:12] [ info] opentelemetry.1
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info] collectors:
[2024/09/16 18:37:12] [ info] [fluent bit] version=3.1.7, commit=c6e902a43a, pid=1
[2024/09/16 18:37:12] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2024/09/16 18:37:12] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/09/16 18:37:12] [ info] [cmetrics] version=0.9.5
[2024/09/16 18:37:12] [ info] [ctraces ] version=0.5.5
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/09/16 18:37:12] [debug] [downstream] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.1] created event channels: read=31 write=32
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:stdout.0
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:opentelemetry.1
[2024/09/16 18:37:12] [ info] [sp] stream processor started
[2024/09/16 18:37:12] [ info] [output:stdout:stdout.0] worker #0 started
[2024/09/16 18:37:17] [debug] [task] created task=0xffffa68396e0 id=0 OK
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/09/16 18:37:17] [engine] caught signal (SIGSEGV)
...
|-------------------- RESOURCE SPAN --------------------|
resource:
- attributes:
- service.name: 'front-envoy'
- dropped_attributes_count: 0
schema_url:
[scope_span]
schema_url:
[spans]
[2024/09/16 18:37:17] [debug] [output:opentelemetry:opentelemetry.1] ctraces msgpack size: 1562
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] ctr decode msgpack returned : 6
[2024/09/16 18:37:17] [debug] [out flush] cb_destroy coro_id=0
#0 0xaaaac47bcc03 in process_traces() at plugins/out_opentelemetry/opentelemetry.c:389
#1 0xaaaac47bcc03 in cb_opentelemetry_flush() at plugins/out_opentelemetry/opentelemetry.c:485
#2 0xaaaac4b765a7 in co_switch() at lib/monkey/deps/flb_libco/aarch64.c:133
#3 0xffffffffffffffff in ???() at ???:0
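In case it helps with triage: here is an untested sketch (Python, using the requests and opentelemetry-proto packages) that tries to reproduce the same payload shape without Envoy, by POSTing a resource span whose scope span contains no spans to the opentelemetry input over OTLP/HTTP. It assumes the input accepts protobuf on /v1/traces and that the empty scope span is what triggers the crash; Envoy exports over gRPC, so the code path may not be identical.

# Untested reproduction sketch: send a trace export whose scope span has no spans.
# Requires: pip install requests opentelemetry-proto
import requests
from opentelemetry.proto.collector.trace.v1.trace_service_pb2 import (
    ExportTraceServiceRequest,
)
from opentelemetry.proto.common.v1.common_pb2 import AnyValue, KeyValue
from opentelemetry.proto.resource.v1.resource_pb2 import Resource
from opentelemetry.proto.trace.v1.trace_pb2 import ResourceSpans, ScopeSpans

# Mirror the shape seen in the stdout dump above: a resource with only
# service.name, one scope span, and an empty span list.
request = ExportTraceServiceRequest(
    resource_spans=[
        ResourceSpans(
            resource=Resource(
                attributes=[
                    KeyValue(
                        key="service.name",
                        value=AnyValue(string_value="front-envoy"),
                    )
                ]
            ),
            scope_spans=[ScopeSpans()],  # scope span present, no spans inside
        )
    ]
)

# POST the protobuf payload to the fluent-bit opentelemetry input (OTLP/HTTP).
resp = requests.post(
    "http://localhost:4318/v1/traces",
    data=request.SerializeToString(),
    headers={"Content-Type": "application/x-protobuf"},
    timeout=5,
)
print(resp.status_code, resp.text)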
- Now use the event_type input to show that the opentelemetry output works fine with normal span payloads:
#[INPUT]
#    Name      opentelemetry
#    Listen    0.0.0.0
#    Port      4318
#    Tag       otel

[INPUT]
    Name      event_type
    Type      traces
    Tag       otel
...
- Send a request to generate a span
$ curl http://localhost:10000/echo
OK
- Check that the span is correctly forwarded from fluent-bit to the otel-collector by looking at their logs
$ docker ps
$ docker logs <container_id>
...
|-------------------- RESOURCE SPAN --------------------|
resource:
- attributes:
- service.name: 'Fluent Bit Test Service'
- dropped_attributes_count: 5
schema_url: https://ctraces/resource_span_schema_url
[scope_span]
instrumentation scope:
- name : ctrace
- version : a.b.c
- dropped_attributes_count: 3
- attributes: undefined
schema_url: https://ctraces/scope_span_schema_url
[spans]
[span 'main']
- trace_id : 526274528d3beab4f98a82f459bdc77c
- span_id : 1442ef1b690cd8b0
- parent_span_id : undefined
- kind : 1 (internal)
- start_time : 1726512123955380387
- end_time : 1726512123955380387
- dropped_attributes_count: 0
- dropped_events_count : 0
- status:
- code : 0
- attributes:
- agent: 'Fluent Bit'
- year: 2022
- open_source: true
- temperature: 25.5
- my_array: [
'first',
2,
false,
[
3.1000000000000001,
5.2000000000000002,
6.2999999999999998
]
]
- my-list:
- language: 'c'
- events:
- name: connect to remote server
- timestamp : 1726512123955392720
- dropped_attributes_count: 0
- attributes:
- syscall 1: 'open()'
- syscall 2: 'connect()'
- syscall 3: 'write()'
- [links]
[span 'do-work']
- trace_id : 526274528d3beab4f98a82f459bdc77c
- span_id : a58aca2253f4b054
- parent_span_id : 1442ef1b690cd8b0
- kind : 3 (client)
- start_time : 1726512123955397137
- end_time : 1726512123955397137
- dropped_attributes_count: 0
- dropped_events_count : 0
- status:
- code : 0
- attributes: none
- events: none
- [links]
- link:
- trace_id : 41354bad670f86e7a9fe6077a7ae3a4c
- span_id : 820d8bab3a51c548
- trace_state : aaabbbccc
- dropped_events_count : 2
- attributes : none
...
Expected behaviour
Fluent-bit does not crash when receiving empty spans from Envoy, or it provides a way to filter them out.
Your Environment
- Version used:
- fluent-bit: 3.1.7
- envoy: v1.31.1
- otel-collector: 0.109.0
- Environment name and version: Docker (27.0.3)
- Operating System and version: MacOS (14.6.1)
- Filters and plugins: No additional