Skip to content

Conversation

@Kavindu-Dodan
Copy link
Contributor

@Kavindu-Dodan Kavindu-Dodan commented Nov 26, 2025

Description

This PR adds CloudWatch logs Lambda subscription handling support to awslambdareceiver.

image

With this change, the receiver can now be deployed as a Lambda and support handling either of the signals: S3 triggers or CloudWatch subscription logs.

Note - Receiver can only handle one signal type per deployment.

Link to tracking issue

Completes part of #43504

Testing

I have added unit tests to cover most of the new functionality. One of them (TestHandleCloudwatchLogEvent) provides end to end validation of a CloudWatch event to plogs conversion.

Documentation

Existing documentation already covered some aspects. However, I have attempted to uplift them along with code level comments.

@Kavindu-Dodan Kavindu-Dodan force-pushed the feat/awslambdareceiver-cloudwatch-signal branch from 2ae6e5d to e9aa82a Compare November 26, 2025 23:08
Copy link
Contributor

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine, but I'd like to revisit the config. Implicitly making it CloudWatch-based because the S3 encoding isn't configured is a bit unintuitive.

| `s3_encoding` | Name of the encoding extension to use for S3 objects | "awslogs_encoding" | Optional |
| Name | Description | Required |
|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| `s3_encoding` | Name of the encoding extension to use for S3 objects. For example, `awslogs_encoding` to use AWS logs encoding extensions. When unspecified, falls back to CloudWatch subscription filter handling | Optional |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing to me. CloudWatch subscription filter handling isn't relevant when decoding S3 objects, right? So maybe reword to:

Name of the encoding extension to use for S3 objects. This is only used for handling S3 events, and not for CloudWatch Log events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @axw yes this is something I thought about but I upstreamed the existing mechanism as is. However, we now have a chance to improve this usage.

As next step, I will come up with a proposal to improve this

Comment on lines -18 to +21
// If S3Encoding is unspecified, the receiver will return an error for any S3 event notifications.
// If S3 data is in multiple formats (ex:- VPC flow logs, CloudTrail logs), you should deploy
// separate Lambda functions with specific extension configurations.
//
// If you have objects with multiple different encodings to handle, you should deploy
// separate Lambda functions with different configurations.
// If unspecified, the receiver falls back to work with CloudWatch Log subscription encoding extension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we went back and forth on this internally, but I'd really like to keep these things independent. Can we maintain the existing doc comment please? It's confusing to talk about CloudWatch logs in the same breath as S3 encoding - they're totally separate events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am exploring the option described here - #44562 (comment) . Let me know your view on this.

Comment on lines +169 to +176
// If S3Encoding is not set, then we consider this to be CloudWatch subscription mode
if cfg.S3Encoding == "" {
return nil, errors.New("s3_encoding is required for logs stored in S3")
unmarshaler, err := loadSubFilterLogUnmarshaler(ctx, awslogsencodingextension.NewFactory())
if err != nil {
return nil, err
}

return newCWLogsSubscriptionHandler(set.Logger, unmarshaler.UnmarshalLogs, next.ConsumeLogs), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have two separate handlers in the receiver - one for CloudWatch, and an optional one for S3? If no s3_encoding is specified, then the latter handler will be nil. Then if an S3 event comes in, an error will be returned.

@Kavindu-Dodan
Copy link
Contributor Author

Kavindu-Dodan commented Nov 27, 2025

Code looks fine, but I'd like to revisit the config. Implicitly making it CloudWatch-based because the S3 encoding isn't configured is a bit unintuitive.

Thanks @axw for this remark. Personally, I also saw this to be misleading (detecting CW logs operation mode when the S3 encoder is missing felt wrong). I am exploring a new strategy for this but it will be a considerable refactoring. And I think this is the right time to fix this.

What I am considering is the following,

  • Instead of using s3_encoding, we will migrate to a generic configuration named encoding_extension
  • This allows the user to define any valid encoding extension to decode/process the receiving data
  • Along with this, we will construct the lambdaEventHandler dynamically based on the signal that arrives (initialized lazily and then kept for runtime)
    • This is essential as currently we use missing s3_encoding configuration to fallback to cwLogsSubscriptionHandler
    • Going forward, we will not build the lambdaEventHandler until we receive the first event
    • And based on the event, we will generate the event handler. For example, if it's s3, we construct and use s3Handler. And if the event is from CW logs, then we build the cwLogsSubscriptionHandler. Once initialized handlers will be reused
  • As a side effect of this improvement, we get the following,
    • Lambda receiver now can use any valid encoding extension
    • Lambda receiver can handle multiple sources (ex:- both S3 & CW logs) at once (note - this applied to logs. for metrics it's only S3 mode as CW logs backed metrics does not make sense)

I am actively working on this improvement and will update this as a single commit to have focus. But let me know if you think of a different approach.

@axw
Copy link
Contributor

axw commented Nov 28, 2025

Instead of using s3_encoding, we will migrate to a generic configuration named encoding_extension

How would that encoding extension be used for CloudWatch Logs? They have a specific structure -- they must be decoded in the CloudWatch log subscription filter format.

I think it would make sense to use it for decoding the message field, so e.g. you could use awslogs_encoding to decode CloudTrail logs sent to CloudWatch.

You could support both S3 and CloudWatch events in the one Lambda, but they would have to have the same encoding, which still adds a little bit of cognitive overhead.

Another option would be to have two separate attribute groups, along these lines:

receivers/
  awslambda:
    s3:
      encoding: ... # required for handling S3 events
    cloudwatch_logs:
      message_encoding: ... # optional; defaults to using the message field as the log record body

I personally think that's easier to reason about, since it keeps the two event types completely separate. Configuring the encoding for one event type has no impact on the other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants