logs exporting timed out failed to make an HTTP request #1121

@bneijt


name: logs exporting fails frequently
about: Trying to get log records into CloudWatch that follow the OTel structured logging format, to allow for monitoring.
title: Logs exporting fails every now and then.
labels: bug

Describe the bug
I'm using the OTLP endpoints provided by an AWS account with Transaction Search enabled.

This requires a non-default configuration file with the following config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "localhost:4317"

exporters:
  otlphttp/traces:
    compression: gzip
    traces_endpoint: https://xray.eu-west-1.amazonaws.com/v1/traces
    auth:
      authenticator: sigv4auth/traces
  otlphttp/logs:
    compression: gzip
    logs_endpoint: https://logs.eu-west-1.amazonaws.com/v1/logs
    auth:
      authenticator: sigv4auth/logs
    headers:
      x-aws-log-group: ${env:OTEL_LOG_GROUP_NAME}
      x-aws-log-stream: ${env:OTEL_LOG_STREAM_NAME}

extensions:
  sigv4auth/logs:
    region: "eu-west-1"
    service: "logs"
  sigv4auth/traces:
    region: "eu-west-1"
    service: "xray"

service:
  extensions: [sigv4auth/logs, sigv4auth/traces]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/traces]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/logs]

(I have left out my awsemf metrics configuration for brevity)

I use the arn:aws:lambda:eu-west-1:901920570463:layer:aws-otel-collector-amd64-ver-0-117-0:1 collector layer.

Most logs end up in the CloudWatch log stream I created, and they are delivered without issue. This lets me run a CloudWatch metric filter on top of the stream that uses severityNumber to catch anything above warning level and trigger an alarm when there is an error.
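
For context, the metric filter is roughly the pattern below. The threshold is an assumption on my side: 13 is the start of the WARN range in the OTel severity number specification, and the pattern assumes the exported records expose severityNumber as a top-level JSON field.

{ $.severityNumber >= 13 }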

However, some logs are dropped:


2025-09-02T18:10:53.223Z
{
    "level": "error",
    "ts": 1756836653.2207713,
    "caller": "internal/base_exporter.go:128",
    "msg": "Exporting failed. Rejecting data. Try enabling sending_queue to survive temporary failures.",
    "kind": "exporter",
    "data_type": "logs",
    "name": "otlphttp/logs",
    "error": "request is cancelled or timed out failed to make an HTTP request: Post \"https://logs.eu-west-1.amazonaws.com/v1/logs\": EOF",
    "rejected_items": 4,
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*BaseExporter).Send\n\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/internal/base_exporter.go:128\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewLogsRequest.func1\n\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/logs.go:136\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/[email protected]/logs.go:26\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/[email protected]/logs.go:26\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/logs.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/logs/otlp.go:41\ngo.opentelemetry.io/collector/pdata/plog/plogotlp.rawLogsServer.Export\n\tgo.opentelemetry.io/collector/[email protected]/plog/plogotlp/grpc.go:88\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler.func1\n\tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/logs/v1/logs_service.pb.go:311\ngo.opentelemetry.io/collector/config/configgrpc.(*ServerConfig).getGrpcServerOptions.enhanceWithClientInformation.func9\n\tgo.opentelemetry.io/collector/config/[email protected]/configgrpc.go:517\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler\n\tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/logs/v1/logs_service.pb.go:313\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\tgoogle.golang.org/[email protected]/server.go:1405\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1815\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1035"
}

The error suggests enabling the sending_queue feature, but that is part of the exporterhelper and not included in the collector. Other solutions suggested online, like the decouple or batch processor, are also not part of the collector.
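
For reference, the sketch below shows roughly what enabling the queue and retries would look like on the logs exporter, assuming the otlphttp exporter in this layer still accepts the standard exporterhelper options (timeout, retry_on_failure, sending_queue). I have not verified that the layer honors them, and the values are placeholders:

exporters:
  otlphttp/logs:
    compression: gzip
    logs_endpoint: https://logs.eu-west-1.amazonaws.com/v1/logs
    auth:
      authenticator: sigv4auth/logs
    headers:
      x-aws-log-group: ${env:OTEL_LOG_GROUP_NAME}
      x-aws-log-stream: ${env:OTEL_LOG_STREAM_NAME}
    # per-request timeout; the exporterhelper default is 5s
    timeout: 10s
    # retry transient failures instead of rejecting the data outright
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 5s
      max_elapsed_time: 30s
    # buffer batches in memory instead of failing the incoming request
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 100

Even if these options are accepted, it is not obvious that a queue helps in Lambda, since the execution environment is frozen between invocations and a queued batch may never get a chance to drain.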

One mention online suggested the timeout might be caused by insufficient memory, but the Lambda currently has 1024 MB of memory and runs only very simple Python "fetch a record from DynamoDB and return it" code.

Steps to reproduce
1. Create a Lambda with the Python runtime.
2. Have it log a line at info level when it is invoked.
3. Call the Lambda multiple times.
4. Wait for the issue to arise.

The collector side will say

request is cancelled or timed out failed to make an HTTP request: Post \"https://logs.eu-west-1.amazonaws.com/v1/logs\": EOF

and the Python code will say

    "level": "ERROR",
    "message": "Failed to export logs to localhost:4317, error code: StatusCode.DEADLINE_EXCEEDED",
    "logger": "opentelemetry.exporter.otlp.proto.grpc.exporter",
    "requestId": "ad9c1922-719c-441e-b0e7-14315824d366",
    "otelSpanID": "0",
    "otelTraceID": "0",

and two log messages will be missing from CloudWatch but available in the stdout output from the Lambda.

What did you expect to see?
I would expect to be able to use logging in a way that gives me actual log records with severityNumber in CloudWatch, so I can effectively monitor for anything above warning level, or for records that do not have a severityNumber present in the log stream.

What did you see instead?
Intermittent failures when exporting to the OTel logs endpoint in AWS.

What version of collector/language SDK version did you use?
arn:aws:lambda:eu-west-1:901920570463:layer:aws-otel-collector-amd64-ver-0-117-0:1 collector layer
and the following Python libraries

opentelemetry-api==1.36.0 \
opentelemetry-distro==0.57b0 \
opentelemetry-exporter-otlp==1.36.0 \
opentelemetry-exporter-otlp-proto-common==1.36.0 \
opentelemetry-exporter-otlp-proto-grpc==1.36.0 \
opentelemetry-exporter-otlp-proto-http==1.36.0 \
opentelemetry-instrumentation==0.57b0 \
opentelemetry-instrumentation-asgi==0.57b0 \
opentelemetry-instrumentation-aws-lambda==0.57b0 \
opentelemetry-instrumentation-botocore==0.57b0 \
opentelemetry-instrumentation-fastapi==0.57b0 \
opentelemetry-instrumentation-logging==0.57b0 \
opentelemetry-propagator-aws-xray==1.0.2 \
opentelemetry-propagator-b3==1.36.0 \
opentelemetry-proto==1.36.0 \
opentelemetry-sdk==1.36.0 \
opentelemetry-sdk-extension-aws==2.1.0 \
opentelemetry-semantic-conventions==0.57b0 \
opentelemetry-util-http==0.57b0 \

What language layer did you use?
Python

Additional context
I'm currently bound to aws-otel-lambda because I can't find a way to get CloudWatch metrics working from OpenTelemetry without the awsemf exporter, and that exporter is not included in the community collector layer for Lambda.
