Creation of AWS HTTP provider silently fails with ipv6 AWS_CONTAINER_CREDENTIALS_FULL_URI #10699

@danxmoran

Bug Report

Describe the bug

I'm trying to set up Fluent Bit on an IPv6 EKS cluster using EKS Pod Identity. The pod is failing to authenticate with AWS.

It seems the HTTP provider is not being added to the credential chain, so the pod never attempts to use the Pod Identity approach. I would expect to see this debug log if the provider were being added to the chain, but it's not present. Strangely, I'm also not seeing any error logs from the provider creation logic; the entire code path appears to be silently skipped.

While debugging, I noticed that my cluster is injecting an IPv6 address for AWS_CONTAINER_CREDENTIALS_FULL_URI:

- name: AWS_CONTAINER_CREDENTIALS_FULL_URI
  value: http://[fd00:ec2::23]/v1/credentials

The Pod Identity tests added in #10114 only cover IPv4 addresses, so I wondered if this might be the cause. If flb_utils_url_split_sds returns -1 due to a parsing failure during provider creation, the creation function returns NULL without logging any errors, which is consistent with the behavior I'm seeing:

ret = flb_utils_url_split_sds(full_uri, &protocol, &host, &port_sds, &path);
if (ret < 0) {
    return NULL;
}

I see there are no IPv6 addresses in the URL test cases in tests/internal/utils.c:

struct url_check url_checks[] = {
    {0, "https://fluentbit.io/something",
     "https", "fluentbit.io", "443", "/something"},
    {0, "http://fluentbit.io/something",
     "http", "fluentbit.io", "80", "/something"},
    {0, "https://fluentbit.io", "https", "fluentbit.io", "443", "/"},
    {0, "https://fluentbit.io:1234/something",
     "https", "fluentbit.io", "1234", "/something"},
    {0, "https://fluentbit.io:1234", "https", "fluentbit.io", "1234", "/"},
    {0, "https://fluentbit.io:1234/", "https", "fluentbit.io", "1234", "/"},
    {0, "https://fluentbit.io:1234/v", "https", "fluentbit.io", "1234", "/v"},
    {-1, "://", NULL, NULL, NULL, NULL},
};

I took a stab at adding a test case locally:

{0, "http://[fd00:ec2::23]/v1/credentials", "http", "[fd00:ec2::23]", "80", "/v1/credentials"}

After building and running ./bin/flb-it-utils I see a test failure that (I think) supports my hypothesis:

Test url_split...                               [2025/08/06 13:41:05] [error] [/Users/dan/fluent-bit/src/flb_utils.c:1207 errno=22] Invalid argument
[ FAILED ]
  utils.c:145: Check ret == u->ret... failed
Test url_split_sds...                           [2025/08/06 13:41:05] [error] [/Users/dan/fluent-bit/src/flb_utils.c:1303 errno=22] Invalid argument
[ FAILED ]
  utils.c:63: Check ret == u->ret... failed
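
For illustration, here is a minimal standalone sketch of how a URL splitter could treat a bracketed IPv6 literal as a single host token. It is not the code in src/flb_utils.c: `struct url_parts`, `split_url_ipv6_aware`, and `url_parts_destroy` are hypothetical names, it uses plain `malloc`ed strings rather than SDS, and it keeps the brackets in the host only to match the test case I added above. Default ports follow the existing test table (443 for https, 80 otherwise).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a Fluent Bit structure. */
struct url_parts {
    char *protocol;
    char *host;
    char *port;
    char *path;
};

/* Copy the first n bytes of s into a new NUL-terminated heap string. */
static char *dup_range(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* Free all fields and reset the struct to zeros. */
void url_parts_destroy(struct url_parts *u)
{
    free(u->protocol);
    free(u->host);
    free(u->port);
    free(u->path);
    memset(u, 0, sizeof(*u));
}

/* Split url into protocol, host, port, and path. Returns 0 or -1. */
int split_url_ipv6_aware(const char *url, struct url_parts *out)
{
    const char *sep, *authority, *auth_end, *host_end, *p;
    const char *port_start = NULL;
    const char *default_port;

    assert(url != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    sep = strstr(url, "://");
    if (sep == NULL || sep == url) {
        return -1;                       /* missing or empty protocol */
    }
    authority = sep + 3;
    auth_end = authority + strcspn(authority, "/");

    if (*authority == '[') {
        /* IPv6 literal: colons inside the brackets belong to the host. */
        p = memchr(authority, ']', (size_t) (auth_end - authority));
        if (p == NULL) {
            return -1;                   /* unterminated bracket */
        }
        host_end = p + 1;
        if (host_end < auth_end) {
            if (*host_end != ':') {
                return -1;               /* junk after the closing bracket */
            }
            port_start = host_end + 1;
        }
    }
    else {
        p = memchr(authority, ':', (size_t) (auth_end - authority));
        host_end = p ? p : auth_end;
        port_start = p ? p + 1 : NULL;
    }
    if (host_end == authority || (port_start && port_start == auth_end)) {
        return -1;                       /* empty host or empty port */
    }

    default_port = (sep - url == 5 && strncmp(url, "https", 5) == 0)
                   ? "443" : "80";
    out->protocol = dup_range(url, (size_t) (sep - url));
    out->host = dup_range(authority, (size_t) (host_end - authority));
    out->port = port_start
                ? dup_range(port_start, (size_t) (auth_end - port_start))
                : dup_range(default_port, strlen(default_port));
    out->path = (*auth_end == '/') ? dup_range(auth_end, strlen(auth_end))
                                   : dup_range("/", 1);
    if (!out->protocol || !out->host || !out->port || !out->path) {
        url_parts_destroy(out);
        return -1;
    }
    return 0;
}
```

Whether the host should keep its brackets is an open design question: resolver calls such as getaddrinfo expect the bare address, while an HTTP Host header expects the bracketed form, so a real fix would need to pick one and adjust callers accordingly.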

To Reproduce

Run fluent-bit with:

  1. An AWS output
  2. A non-empty AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE in the env
  3. AWS_CONTAINER_CREDENTIALS_FULL_URI=http://[fd00:ec2::23]/v1/credentials

Here is my full pod spec, for reference:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: a5da2e8e7b5844c21d9b8d62ac0ca0d423a686f18b230132d312ccdd95e10d80
  creationTimestamp: "2025-08-06T20:15:43Z"
  generateName: fluent-bit-
  labels:
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/name: fluent-bit
    controller-revision-hash: 7cf5c795d6
    pod-template-generation: "3"
  name: fluent-bit-jbn5z
  namespace: fluent-bit
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: fluent-bit
    uid: 498432c5-69c3-4e33-b018-eedfabf35453
  resourceVersion: "979969"
  uid: ea5e6c10-7f09-468b-935e-d204d7754e60
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - i-065e4915f521a4316
  containers:
  - args:
    - --workdir=/fluent-bit/etc
    - --config=/fluent-bit/etc/conf/fluent-bit.conf
    command:
    - /fluent-bit/bin/fluent-bit
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
      value: http://[fd00:ec2::23]/v1/credentials
    - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
      value: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
    image: fluent/fluent-bit:4.0.6
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: fluent-bit
    ports:
    - containerPort: 2020
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v1/health
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /fluent-bit/etc/conf
      name: config
    - mountPath: /var/log
      name: varlog
    - mountPath: /var/lib/docker/containers
      name: varlibdockercontainers
      readOnly: true
    - mountPath: /etc/machine-id
      name: etcmachineid
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-62248
      readOnly: true
    - mountPath: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount
      name: eks-pod-identity-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: i-065e4915f521a4316
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: fluent-bit
  serviceAccountName: fluent-bit
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - name: eks-pod-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: pods.eks.amazonaws.com
          expirationSeconds: 83169
          path: eks-pod-identity-token
  - configMap:
      defaultMode: 420
      name: fluent-bit
    name: config
  - hostPath:
      path: /var/log
      type: ""
    name: varlog
  - hostPath:
      path: /var/lib/docker/containers
      type: ""
    name: varlibdockercontainers
  - hostPath:
      path: /etc/machine-id
      type: File
    name: etcmachineid
  - name: kube-api-access-62248
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

Expected behavior

I expect Fluent Bit to support IPv6 values for AWS_CONTAINER_CREDENTIALS_FULL_URI.

If for some reason it can't, I'd expect it to log an error when parsing the URI fails.
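
To make the second point concrete, here is a rough standalone sketch of the behavior I have in mind. Everything in it is hypothetical: `parse_full_uri_stub` merely stands in for the real URL splitter (it rejects bracketed hosts to mimic the current behavior), `check_full_uri` is not a Fluent Bit function, and the message wording is only illustrative. Inside Fluent Bit the message would presumably go through the normal logging macros rather than `fprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real URL splitter: it fails on a
 * bracketed host to mimic the parsing failure described in this issue.
 */
static int parse_full_uri_stub(const char *uri)
{
    return (strstr(uri, "://[") != NULL) ? -1 : 0;
}

/*
 * Hypothetical provider-creation check. Returns 0 if the URI parses,
 * or -1 after writing an error line to `log`, so the failure is no
 * longer silent.
 */
int check_full_uri(const char *full_uri, FILE *log)
{
    assert(full_uri != NULL && log != NULL);

    if (parse_full_uri_stub(full_uri) < 0) {
        /* Illustrative wording only; not an existing Fluent Bit message. */
        fprintf(log,
                "[error] could not parse "
                "AWS_CONTAINER_CREDENTIALS_FULL_URI '%s'; "
                "HTTP provider will not be added to the chain\n",
                full_uri);
        return -1;
    }
    return 0;
}
```

Calling `check_full_uri("http://[fd00:ec2::23]/v1/credentials", stderr)` would return -1 and leave one error line in the log, which is what would have pointed me at the URI right away.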

Your Environment

  • Version used: 4.0.6 (also tested 3.2.10)
  • Environment name and version: EKS, Auto Mode, Kubernetes version 1.32, ipv6 networking enabled
  • Configuration:
    [SERVICE]
        Daemon Off
        Flush 1
        Log_Level debug
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On
    
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        DB                /var/log/flb_kube.db
        Parser            cri
        Docker_Mode       Off
        multiline.parser  docker, cri
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        multiline.parser  docker, cri
        Path_Key          filename
        Read_from_Head    true
    [INPUT]
        Name systemd
        Tag  journald.*
        Path /var/log/journal
        DB   /var/log/flb_journald.db
    
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        Merge_Log_Key       data
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Buffer_Size         512k
    [FILTER]
        Name modify
        Match *
        Add hostname ${HOSTNAME}
        Add cluster  <redacted>
    
    [OUTPUT]
        Name kinesis_firehose
        Match *
        region us-east-1
        delivery_stream <redacted>
        endpoint <redacted>
    

Additional context
We are exploring a migration from IRSA to Pod Identity for our EKS workloads. This is blocking the migration.
