Bug Report
Describe the bug
I'm trying to set up Fluent Bit on an IPv6 EKS cluster using EKS Pod Identity. The pod is failing to authenticate with AWS.
It seems the HTTP provider is not being added to the credential chain, so the pod never attempts to use the pod identity approach. I would expect to see this debug log if the provider was being added to the chain, but it's not present. Strangely, I'm also not seeing any error logs from the provider creation logic - the entire code path appears to be silently skipped / dropped.
While debugging, I noticed that my cluster is injecting an ipv6 address for AWS_CONTAINER_CREDENTIALS_FULL_URI:
- name: AWS_CONTAINER_CREDENTIALS_FULL_URI
  value: http://[fd00:ec2::23]/v1/credentials
The pod identity tests added in #10114 only cover ipv4 addresses, so I wondered if this might be the cause. If flb_utils_url_split_sds returns a -1 due to a parsing failure during provider creation, the function returns NULL without logging any errors, which is consistent with the behavior I'm seeing:
fluent-bit/src/aws/flb_aws_credentials_http.c
Lines 353 to 356 in 1d4c4dc
    ret = flb_utils_url_split_sds(full_uri, &protocol, &host, &port_sds, &path);
    if (ret < 0) {
        return NULL;
    }
I see there are no ipv6 addresses in the URL test cases in tests/internal/utils.c:
fluent-bit/tests/internal/utils.c
Lines 27 to 39 in 1d4c4dc
    struct url_check url_checks[] = {
        {0, "https://fluentbit.io/something",
         "https", "fluentbit.io", "443", "/something"},
        {0, "http://fluentbit.io/something",
         "http", "fluentbit.io", "80", "/something"},
        {0, "https://fluentbit.io", "https", "fluentbit.io", "443", "/"},
        {0, "https://fluentbit.io:1234/something",
         "https", "fluentbit.io", "1234", "/something"},
        {0, "https://fluentbit.io:1234", "https", "fluentbit.io", "1234", "/"},
        {0, "https://fluentbit.io:1234/", "https", "fluentbit.io", "1234", "/"},
        {0, "https://fluentbit.io:1234/v", "https", "fluentbit.io", "1234", "/v"},
        {-1, "://", NULL, NULL, NULL, NULL},
    };
I took a stab at adding a test case locally:
{0, "http://[fd00:ec2::23]/v1/credentials", "http", "[fd00:ec2::23]", "80", "/v1/credentials"}
After building and running ./bin/flb-it-utils I see a test failure that (I think) supports my hypothesis:
Test url_split... [2025/08/06 13:41:05] [error] [/Users/dan/fluent-bit/src/flb_utils.c:1207 errno=22] Invalid argument
[ FAILED ]
utils.c:145: Check ret == u->ret... failed
Test url_split_sds... [2025/08/06 13:41:05] [error] [/Users/dan/fluent-bit/src/flb_utils.c:1303 errno=22] Invalid argument
[ FAILED ]
utils.c:63: Check ret == u->ret... failed
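For illustration, here is a minimal sketch of how the host/port part of a URL could be split so that a bracketed IPv6 literal is accepted. This is not Fluent Bit's code: `split_authority` and its parameters are hypothetical names, it only looks at the authority portion (what sits between `://` and the first `/`), and it keeps the brackets in the host to match the test case above. It does not validate that the port is numeric.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Fluent Bit: split "host", "host:port",
 * "[v6]" or "[v6]:port" into host and port. The brackets stay in the host.
 * Returns 0 on success, -1 on malformed input or if a buffer is too small.
 */
int split_authority(const char *authority, const char *default_port,
                    char *host, size_t host_size,
                    char *port, size_t port_size)
{
    const char *host_end;
    const char *port_start = NULL;
    size_t host_len;

    assert(authority != NULL && default_port != NULL);

    if (authority[0] == '[') {
        /* Bracketed IPv6 literal: colons inside the brackets belong to the host */
        const char *close = strchr(authority, ']');
        if (close == NULL || close == authority + 1) {
            return -1; /* missing closing bracket or empty "[]" */
        }
        host_end = close + 1;
        if (*host_end == ':') {
            port_start = host_end + 1;
        }
        else if (*host_end != '\0') {
            return -1; /* unexpected characters after the closing bracket */
        }
    }
    else {
        /* Hostname or IPv4 address: the first colon separates host and port */
        const char *colon = strchr(authority, ':');
        host_end = (colon != NULL) ? colon : authority + strlen(authority);
        if (colon != NULL) {
            port_start = colon + 1;
        }
    }

    host_len = (size_t) (host_end - authority);
    if (host_len == 0 || host_len >= host_size) {
        return -1;
    }
    memcpy(host, authority, host_len);
    host[host_len] = '\0';

    /* Fall back to the scheme's default port when none was given */
    if (port_start == NULL) {
        port_start = default_port;
    }
    if (*port_start == '\0' || strlen(port_start) >= port_size) {
        return -1;
    }
    strcpy(port, port_start);
    return 0;
}
```

With this sketch, the authority `[fd00:ec2::23]` and a default port of `"80"` yield the host `[fd00:ec2::23]` and the port `80`, which is what the added test case expects.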
To Reproduce
Run fluent-bit with:
- An AWS output
- A non-empty AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE in the env
- AWS_CONTAINER_CREDENTIALS_FULL_URI=http://[fd00:ec2::23]/v1/credentials
Here is my full pod spec, for reference:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: a5da2e8e7b5844c21d9b8d62ac0ca0d423a686f18b230132d312ccdd95e10d80
  creationTimestamp: "2025-08-06T20:15:43Z"
  generateName: fluent-bit-
  labels:
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/name: fluent-bit
    controller-revision-hash: 7cf5c795d6
    pod-template-generation: "3"
  name: fluent-bit-jbn5z
  namespace: fluent-bit
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: fluent-bit
    uid: 498432c5-69c3-4e33-b018-eedfabf35453
  resourceVersion: "979969"
  uid: ea5e6c10-7f09-468b-935e-d204d7754e60
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - i-065e4915f521a4316
  containers:
  - args:
    - --workdir=/fluent-bit/etc
    - --config=/fluent-bit/etc/conf/fluent-bit.conf
    command:
    - /fluent-bit/bin/fluent-bit
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
      value: http://[fd00:ec2::23]/v1/credentials
    - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
      value: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
    image: fluent/fluent-bit:4.0.6
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: fluent-bit
    ports:
    - containerPort: 2020
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v1/health
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /fluent-bit/etc/conf
      name: config
    - mountPath: /var/log
      name: varlog
    - mountPath: /var/lib/docker/containers
      name: varlibdockercontainers
      readOnly: true
    - mountPath: /etc/machine-id
      name: etcmachineid
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-62248
      readOnly: true
    - mountPath: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount
      name: eks-pod-identity-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: i-065e4915f521a4316
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: fluent-bit
  serviceAccountName: fluent-bit
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - name: eks-pod-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: pods.eks.amazonaws.com
          expirationSeconds: 83169
          path: eks-pod-identity-token
  - configMap:
      defaultMode: 420
      name: fluent-bit
    name: config
  - hostPath:
      path: /var/log
      type: ""
    name: varlog
  - hostPath:
      path: /var/lib/docker/containers
      type: ""
    name: varlibdockercontainers
  - hostPath:
      path: /etc/machine-id
      type: File
    name: etcmachineid
  - name: kube-api-access-62248
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
Expected behavior
I expect fluent-bit to support ipv6 values for AWS_CONTAINER_CREDENTIALS_FULL_URI.
If for some reason it can't, I'd expect it to log an error in the case when parsing the URI fails.
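As a rough sketch of the kind of error reporting I have in mind, the snippet below reports the offending variable and value before bailing out, instead of returning silently. Everything here is hypothetical and stands apart from the Fluent Bit code base: `parse_uri_stub` is a stand-in for the real URL splitter, `http_provider_check_uri` is an invented name, and the message text is only a suggestion. In Fluent Bit itself the message would presumably go through its own logging macros rather than `fprintf`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real URL splitter: accepts anything that
 * starts with "http://" or "https://" and fails (-1) otherwise. This is
 * only enough to exercise the error path below.
 */
int parse_uri_stub(const char *uri)
{
    if (strncmp(uri, "http://", 7) == 0 || strncmp(uri, "https://", 8) == 0) {
        return 0;
    }
    return -1;
}

/*
 * Hypothetical provider setup step. Returns 0 on success and -1 on failure.
 * On failure it writes a message to `log` so the user can see why the HTTP
 * provider was not added to the credential chain.
 */
int http_provider_check_uri(const char *full_uri, FILE *log)
{
    if (parse_uri_stub(full_uri) < 0) {
        fprintf(log,
                "[error] could not parse "
                "AWS_CONTAINER_CREDENTIALS_FULL_URI='%s'\n",
                full_uri);
        return -1;
    }
    return 0;
}
```

Calling `http_provider_check_uri("://", stderr)` returns -1 and leaves one line in the log naming the variable and the value that failed to parse.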
Your Environment
- Version used: 4.0.6 (also tested 3.2.10)
- Environment name and version: EKS, Auto Mode, Kubernetes version 1.32, ipv6 networking enabled
- Configuration:
[SERVICE]
    Daemon Off
    Flush 1
    Log_Level debug
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    DB /var/log/flb_kube.db
    Parser cri
    Docker_Mode Off
    multiline.parser docker, cri
    Mem_Buf_Limit 5MB
    Skip_Long_Lines On
    Refresh_Interval 10
    multiline.parser docker, cri
    Path_Key filename
    Read_from_Head true

[INPUT]
    Name systemd
    Tag journald.*
    Path /var/log/journal
    DB /var/log/flb_journald.db

[FILTER]
    Name kubernetes
    Match kube.*
    Kube_URL https://kubernetes.default.svc.cluster.local:443
    Merge_Log On
    Merge_Log_Key data
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On
    Buffer_Size 512k

[FILTER]
    Name modify
    Match *
    Add hostname ${HOSTNAME}
    Add cluster <redacted>

[OUTPUT]
    Name kinesis_firehose
    Match *
    region us-east-1
    delivery_stream <redacted>
    endpoint <redacted>
Additional context
We are exploring a migration from IRSA to Pod Identity for our EKS workloads. This is blocking the migration.