[azints-2955][forwarder] dead letter queue #271
Conversation
logs := make([]datadogV2.HTTPLogItem, 0, len(c.logsBuffer))
for _, currLog := range c.logsBuffer {
	logs = append(logs, newHTTPLogItem(currLog))
_, _, err = c.logsSubmitter.SubmitLog(ctx, c.logsBuffer)
Is SubmitLog used elsewhere? Why are 2 of the return values being discarded?
Yes, this is the only place where it is called. The first object is the unmarshalled return object of the API and the second is the HTTP response. Neither ends up being needed by our code.
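For reference, a minimal sketch of the call shape, assuming the datadog-api-client-go v2 Logs API; the setup below is illustrative, not the forwarder's actual wiring:

```go
package main

import (
	"context"
	"log"

	"github.com/DataDog/datadog-api-client-go/v2/api/datadog"
	"github.com/DataDog/datadog-api-client-go/v2/api/datadogV2"
)

func main() {
	ctx := datadog.NewDefaultContext(context.Background())
	api := datadogV2.NewLogsApi(datadog.NewAPIClient(datadog.NewConfiguration()))

	items := []datadogV2.HTTPLogItem{*datadogV2.NewHTTPLogItem("hello world")}

	// SubmitLog returns (interface{}, *http.Response, error): the first value
	// is the decoded response body and the second is the raw HTTP response.
	// Only the error is needed here, so the other two are discarded.
	_, _, err := api.SubmitLog(ctx, items)
	if err != nil {
		log.Fatal(err)
	}
}
```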
Co-authored-by: Rebecca Parsons <[email protected]>
…om:DataDog/azure-log-forwarding-orchestration into matt.spurlin/AZINTS-2955/dead-letter-queue
Co-authored-by: Rebecca Parsons <[email protected]>
Coverage Report
Control Plane Coverage:
Forwarder Coverage:
}

// FromBytes creates a DeadLetterQueue object from the given bytes.
func FromBytes(logsClient *logs.Client, data []byte) (*DeadLetterQueue, error) {
If it wasn't for the logsClient argument, DeadLetterQueue is actually really close to implementing the JSON Marshaler and Unmarshaler interfaces.
And so is Cursor.
Probably not worth doing anything about, but just something I'm noticing
This is a good callout. Might be something worth following up on as a tech debt item
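To illustrate the shape being described (a hypothetical sketch: if the logs client were injected after decoding instead of passed to FromBytes, a type like this could satisfy json.Marshaler and json.Unmarshaler directly; field names are made up, not the PR's):

```go
package deadletterqueue

import (
	"encoding/json"

	"github.com/DataDog/datadog-api-client-go/v2/api/datadogV2"
)

// DeadLetterQueue is a stand-in with illustrative fields only.
type DeadLetterQueue struct {
	FailedLogs []datadogV2.HTTPLogItem `json:"failed_logs"`
}

// MarshalJSON satisfies json.Marshaler.
func (d DeadLetterQueue) MarshalJSON() ([]byte, error) {
	type alias DeadLetterQueue // alias has no methods, so this avoids recursion
	return json.Marshal(alias(d))
}

// UnmarshalJSON satisfies json.Unmarshaler; the logs client would be attached
// separately after decoding.
func (d *DeadLetterQueue) UnmarshalJSON(data []byte) error {
	type alias DeadLetterQueue
	var a alias
	if err := json.Unmarshal(data, &a); err != nil {
		return err
	}
	*d = DeadLetterQueue(a)
	return nil
}
```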
ava-silver left a comment:
a couple small comments but overall LGTM
}

// Content converts the log content to a string.
func (l *Log) Content() string {
Would this be better as String() so Log implements the Stringer interface? Or is that more for debugging purposes? (Still new to Go idioms.)
Unfortunately it doesn't offer much, if any, improvement here. Stringer isn't automatically applied in the context where this is used.
Stringer is most useful if you are passing something to anything in fmt
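A tiny illustration of that point (the Log type here is a stand-in, not the forwarder's): fmt calls String() automatically, but any other caller still has to invoke the method explicitly.

```go
package main

import "fmt"

// Log is a stand-in type for the example.
type Log struct {
	content []byte
}

// String makes *Log satisfy fmt.Stringer.
func (l *Log) String() string {
	return string(l.content)
}

func main() {
	l := &Log{content: []byte(`{"message":"hi"}`)}

	fmt.Println(l)     // fmt notices the Stringer and calls String() for us
	body := l.String() // outside fmt, the call is still explicit
	_ = body
}
```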
	return string(*l.content)
}
// ValidateDatadogLog checks if the log is valid to send to Datadog and returns the log size when it is.
func ValidateDatadogLog(log datadogV2.HTTPLogItem, logger *log.Entry) (int64, bool) {
Is it possible to consolidate the usage of this function and Log.Validate()? They seem to be doing similar things; I wonder if we could just have one path for log validation rather than two.
Somewhat. I have created a shared validation function, and these now take a Log or HTTPLogItem and pass the appropriate fields to that shared function.
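Roughly the shape that consolidation could take (a sketch under assumed names: the real helpers, limits, and field extraction may differ; *log.Entry here is logrus's, as in the signature above):

```go
package logs

import (
	log "github.com/sirupsen/logrus"
)

// maxLogBytes is an illustrative per-log size limit, not the real constant.
const maxLogBytes = 1024 * 1024

// validateLogSize is the single shared validation path: both the Log wrapper
// and the HTTPLogItem wrapper extract their serialized size and source, then
// delegate here.
func validateLogSize(sizeBytes int64, source string, logger *log.Entry) (int64, bool) {
	if sizeBytes <= 0 {
		logger.Warnf("dropping empty log from %s", source)
		return 0, false
	}
	if sizeBytes > maxLogBytes {
		logger.Warnf("dropping oversized log from %s (%d bytes)", source, sizeBytes)
		return 0, false
	}
	return sizeBytes, true
}
```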
// Load loads the DeadLetterQueue from the storage client.
func Load(ctx context.Context, storageClient *storage.Client, logsClient *logs.Client) (*DeadLetterQueue, error) {
	span, ctx := tracer.StartSpanFromContext(ctx, "deadletterqueue.Load")
	defer span.Finish()
This isn't picking up the error; it generally needs to be of the form:
defer func() { span.Finish(tracer.WithError(err)) }()
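Spelled out, the suggested pattern relies on a named error return so the deferred closure sees the final value when it runs (a sketch assuming dd-trace-go's v1 import path; the function body is elided):

```go
package deadletterqueue

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

// Load sketches the named-return pattern: err is evaluated when the deferred
// closure runs, so the span records whatever error Load ultimately returns.
func Load(ctx context.Context) (err error) {
	span, ctx := tracer.StartSpanFromContext(ctx, "deadletterqueue.Load")
	defer func() { span.Finish(tracer.WithError(err)) }()

	// ... the real loading/decoding work would use ctx here ...
	_ = ctx
	return nil
}
```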
	defer span.Finish()

	// prune invalid logs
	queue := make([]datadogV2.HTTPLogItem, 0)
Unless we expect the majority of logs to be invalid, we should preallocate this to len(d.client.FailedLogs).
Is there any guidance on when to prefer preallocating the excess vs erring on the side of a 0 allocation?
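As a small illustration of the trade-off: preallocating to the upper bound means append never reallocates, at the cost of holding the larger backing array even when most entries are filtered out, while a zero-capacity slice grows geometrically as it fills.

```go
package main

import "fmt"

func main() {
	failed := []string{"a", "", "b", "c", ""}

	// Preallocate to the upper bound: append never has to grow the slice.
	queue := make([]string, 0, len(failed))
	for _, l := range failed {
		if l != "" { // stand-in for the real validity check
			queue = append(queue, l)
		}
	}

	// len reflects the survivors, cap the preallocated upper bound.
	fmt.Println(len(queue), cap(queue)) // 3 5
}
```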
// ValidateDatadogLog checks if the log is valid to send to Datadog and returns the log size when it is.
func ValidateDatadogLog(log datadogV2.HTTPLogItem, logger *log.Entry) (int64, bool) {
	logBytes, err := log.MarshalJSON()
Taking a serialization pass just to throw the result away will be very inefficient. Could we either cache this result somewhere if we can use it later on, or deal with serialization errors later, when we try to serialize the logs to submit them to the API?
This isn't serializing to catch serialization errors. The goal is to get the final byte size of the log, since the Datadog API has per-log and per-payload limits on byte sizes. The length of the result does get used in multiple places.
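For context, the size check being discussed looks roughly like this; the limits below are placeholders, while HTTPLogItem.MarshalJSON is the same generated call the PR uses:

```go
package logs

import (
	"github.com/DataDog/datadog-api-client-go/v2/api/datadogV2"
)

// Illustrative limits; the logs intake enforces per-log and per-payload byte
// limits, but the exact values here are placeholders.
const (
	maxLogBytes     = 1024 * 1024
	maxPayloadBytes = 5 * 1024 * 1024
)

// serializedSize reports the uncompressed JSON size of a log item, which is
// what counts against the intake limits, or ok=false if it cannot be
// serialized or exceeds the per-log limit.
func serializedSize(item datadogV2.HTTPLogItem) (size int64, ok bool) {
	b, err := item.MarshalJSON()
	if err != nil || len(b) > maxLogBytes {
		return 0, false
	}
	return int64(len(b)), true
}
```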
It's still a ton of overhead. Assuming most of the log size comes from the Content, which is already a string and can be measured cheaply, I'd think you'd be able to bypass this by adding some buffer to the max batch size. You might also want to add error handling in the submit flow to split a batch up in case you're still over the limit.
Also, what's the story with compression here? (Edited: the batch size limit is before compression, not after as I originally understood.)
Not sure how the Go API works, but if you're able to send Protobufs to the API then you can also marshal them yourself and get the size that way without wasting effort. That's what we do from the resource crawlers: https://github.com/DataDog/dd-source/blob/main/domains/cloud_platform/libs/resource_sink/eventplatform.go#L246-L261
We do need to add gzip compression.
The individual log size limit still applies. The documentation specifically mentions that the limit applies to the uncompressed data: https://github.com/DataDog/datadog-api-client-go/blob/master/api/datadogV2/api_logs.go#L529
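For the gzip follow-up, one possible shape, assuming the generated client's ContentEncoding optional parameter (which appears in datadog-api-client-go's SubmitLog examples) takes care of compressing the request body; worth verifying against the client version in use:

```go
package main

import (
	"context"
	"log"

	"github.com/DataDog/datadog-api-client-go/v2/api/datadog"
	"github.com/DataDog/datadog-api-client-go/v2/api/datadogV2"
)

func main() {
	ctx := datadog.NewDefaultContext(context.Background())
	api := datadogV2.NewLogsApi(datadog.NewAPIClient(datadog.NewConfiguration()))

	body := []datadogV2.HTTPLogItem{*datadogV2.NewHTTPLogItem("hello world")}

	// Request gzip content encoding for the submission; the per-log limit
	// still applies to the uncompressed JSON, per the linked docs.
	_, _, err := api.SubmitLog(ctx, body,
		*datadogV2.NewSubmitLogOptionalParameters().WithContentEncoding(datadogV2.CONTENTENCODING_GZIP))
	if err != nil {
		log.Fatal(err)
	}
}
```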
The public logs submission API only has support for JSON, sadly: https://docs.datadoghq.com/api/latest/logs/#send-logs
I don't know enough about logs intake to know how track types map to the public API. Does this apply to anything sent to https://http-intake.logs.datadoghq.com/api/v2/[TRACK-NAME]?
We should ask; there are existing entries in there for GCP and AWS. I'm guessing they're doing something similar, but I'm not sure.
Relatedly, I think there must be some way to send them with zstd compression; that'd be way better than gzip. DataDog/datadog-lambda-extension#558
I have been able to confirm that zstd and protobuf are on the table here. Added tasks to our backlog to correct this in the future:
Add protobuf support: https://datadoghq.atlassian.net/browse/AZINTS-3210
Add zstd compression: https://datadoghq.atlassian.net/browse/AZINTS-3209
Description
Adds a dead letter queue. Also went through and did a few renames where the symbol had the package name in its name.
Testing
Unit tests were added. Manually tested with a faulty HTTP client to ensure logs were sent to a DLQ and the DLQ could serialize/deserialize from Azure.