Skip to content

Unexpectedly high execution times with extension layer v91 #987

@jo-me

Description

@jo-me

Hy there,

we're seeing some strange effects with the latest extension versions that we can't really explain.

The function execution takes significantly longer than expected - apparently due to overhead related to tracing/metrics so that the total time of execution hits our 3s function timeout for our typical 0.5-1.5s workload.
Hitting the timeout results in lambda instance shutdown and subsequent cold starts.

There are a couple of ways to turn on/off this "behavior" with v90 and v91, but I've boiled down my function code to a minimal handler that sleeps for 1s before returning a positive result (to the AWS LoadBalancer that triggered it). It uses DD extension layer v91 .

Expected would be a cold start (e.g. 500ms) and then execution time of slightly above 1000ms, but xray shows times closer to 4000ms!

Image

I can "fix" this minimal function by:

  • increasing RAM (e.g. to 192 MB) or
  • increasing the timeout to 5s or
  • disabling DD_ENHANCED_METRICS and DD_TRACE_ENABLED (one OR the other is not sufficient)

All of which is not really desired and should not be necessary.

My lambda function is

  • written in Golang,
  • uses runtime provided.al2023
  • is happy with 128 MB ram (logs say Max Memory Used: 78 MB)
  • running in a VPC (with internet connectivity)
  • tiggered by ALB requests
  • in region eu-central-1

The env variables set are (among others)

  • DD_API_KEY_SECRET_ARN | arn:aws:secretsmanager:….
  • DD_ENHANCED_METRICS | true
  • DD_RUNTIME_METRICS_ENABLED | true
  • DD_SERVERLESS_LOGS_ENABLED | false
  • DD_SITE | datadoghq.com
  • DD_TRACE_ENABLED | true

The minimal handler function that I use to reproduce this is:


import (
	"context"
	"time"

	ddlambda "github.com/DataDog/dd-trace-go/contrib/aws/datadog-lambda-go/v2"
	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func main() {
	handler := func(ctx context.Context, req events.ALBTargetGroupRequest) (events.ALBTargetGroupResponse, error) {
		time.Sleep(1000 * time.Millisecond)

		return events.ALBTargetGroupResponse{
			StatusCode:        200,
			StatusDescription: "200 OK",
			Body:              "{\"message\": \"Hello world\"}",
			MultiValueHeaders: map[string][]string{
				"Content-Type": {"application/json"},
			},
			IsBase64Encoded: false,
		}, nil
	}

	lambda.Start(ddlambda.WrapFunction(handler, nil))
}

My actual function is still on extension v90 and I can see a similar pattern if the function code initializes very quickly (e.g. without any network I/O. Adding a 100ms sleep during initialization (not in the handler) "fixed" this there, but this trick does not work with v91 anymore.

Thanks for any hints!

Jochen

Jochen Mehlhorn [email protected], Mercedes-Benz Tech Innovation GmbH
 
Provider Information

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions