Conversation

@astuyve astuyve commented Feb 21, 2025

This removes roughly 30ms of latency for customers with very busy functions by no longer blocking on the platform.runtimeDone event, which Lambda provides to us via the Telemetry API. That call has a minimum 25ms buffer time, which is why the delay is present.

  • Splits the loop into 3 distinct branches (see the sketch after this list):
    • Flushing at the end of an invocation, used for the first few invocations
    • Flushing at the beginning of an invocation, used on the flush cycle for busy functions that have switched to periodic flushing
    • Skipping the flush, used on non-flush cycles
  • We still flush with the race timeout in order to handle long-running functions where customers want telemetry data while the function is running. This fixes an issue where, after flushing once, we would not flush again until the end of a long-running invocation.
  • Includes a fix to read all of the logs flush response
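
A minimal sketch of the branching described above, using illustrative names rather than the actual identifiers in this PR:

```rust
// Hypothetical names; the real loop in this PR differs.
enum FlushDecision {
    /// Flush at the end of the invocation (first few invocations).
    End,
    /// Flush at the beginning of the invocation (flush cycle for busy
    /// functions that have switched to periodic flushing).
    Start,
    /// Skip flushing (non-flush cycle).
    Skip,
}

fn decide_flush(invocations_seen: u64, switch_threshold: u64, interval_elapsed: bool) -> FlushDecision {
    if invocations_seen < switch_threshold {
        // Early invocations: flush at the end of each one.
        FlushDecision::End
    } else if interval_elapsed {
        // Periodic strategy, and the flush interval has passed: flush up front.
        FlushDecision::Start
    } else {
        // Periodic strategy, but not time to flush yet: skip.
        FlushDecision::Skip
    }
}
```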

@astuyve astuyve requested a review from a team as a code owner February 21, 2025 15:12
Ok(resp) => {
    if resp.status() != 202 {
        let status = resp.status();
        _ = resp.text().await;

astuyve (Contributor, Author):

Fixes an issue where keepalive can't be used if we don't fully read/exhaust the response buffer
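
For context, a hedged sketch of the pattern, assuming a reqwest client (this is not the extension's actual code): the response body has to be drained even when only the status matters, otherwise the connection cannot go back to the keep-alive pool.

```rust
use reqwest::{Client, StatusCode};

// Sketch only: drain the body so the underlying connection can be reused.
async fn flush_logs(client: &Client, url: &str, payload: Vec<u8>) -> Result<(), reqwest::Error> {
    let resp = client.post(url).body(payload).send().await?;
    let status = resp.status();
    // Fully read (and discard) the body; skipping this keeps the connection
    // from being returned to the keep-alive pool.
    let _ = resp.text().await;
    if status != StatusCode::ACCEPTED {
        eprintln!("logs flush returned unexpected status: {status}");
    }
    Ok(())
}
```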

should_periodic_flush
);
return should_periodic_flush;
return false;

Contributor (reviewer):

Not sure I understand this


astuyve (Contributor, Author):

Previously this method controlled whether or not to switch over to flushing periodically, as well as determining whether we've already switched to periodic and it's time to flush. That's partially why the main loop was so confusing.

Instead, we've lifted that logic into the main loop, so this method can now be simplified. It only determines whether the user specified that they want to flush at the end, or, if we're on the default strategy, whether we've seen enough invocations to flip over to periodic flushing. If the answer is yes, then we should not flush at the end.
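
A rough sketch of that simplified decision; the strategy variants and parameter names are assumptions, not the real ones in the codebase:

```rust
// Illustrative only; variant and parameter names are assumptions.
enum FlushStrategy {
    /// User explicitly asked to flush at the end of every invocation.
    End,
    /// Default: flush at the end until enough invocations have been seen,
    /// then switch to periodic flushing.
    Default,
    /// User explicitly asked for periodic flushing.
    Periodically,
}

fn should_flush_end(strategy: &FlushStrategy, invocations_seen: u64, switch_threshold: u64) -> bool {
    match strategy {
        FlushStrategy::End => true,
        // On the default strategy, stop flushing at the end once we've seen
        // enough invocations to flip over to periodic flushing.
        FlushStrategy::Default => invocations_seen < switch_threshold,
        FlushStrategy::Periodically => false,
    }
}
```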

);

if let Some(metrics) = metrics {
    return Some(RuntimeDoneMeta {

Contributor (reviewer):

Can't we just send the event back?


astuyve (Contributor, Author):

I kinda like having the runtime done meta because right now that's the only thing we care about, and the data we need is buried in an Option. If we just returned the event, we'd push this matching/conditional handling into the main loop, in an already fairly complex case.

thoughts?


Contributor (reviewer):

I don't like it 100%, but we can change it in the future. My idea was to still send a clone of the event as an optional; that way we ensure we don't have to push the conditional back into the loop.

It just feels unnecessary when an object that has the same information already exists for this.


astuyve (Contributor, Author):

Yeah, it's a little weirder with the enum because we strip a bunch of the event info away and just return what we need. It's another alloc, but it should be stack-allocated and fast anyway. We can change it later.
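
For illustration, a sketch of the trade-off being discussed, with assumed types and fields (not the real telemetry types): the handler strips the platform.runtimeDone event down to the few fields the main loop needs, so the Option matching stays out of the loop.

```rust
// All names here are hypothetical stand-ins for the real telemetry types.
struct RuntimeDoneEvent {
    status: String,
    metrics: Option<RuntimeDoneMetrics>,
}

struct RuntimeDoneMetrics {
    duration_ms: f64,
}

/// The slimmed-down value handed back to the main loop.
struct RuntimeDoneMeta {
    status: String,
    duration_ms: f64,
}

fn extract_meta(event: &RuntimeDoneEvent) -> Option<RuntimeDoneMeta> {
    // Unwrap the nested Option here rather than in the already-busy main loop.
    event.metrics.as_ref().map(|m| RuntimeDoneMeta {
        status: event.status.clone(),
        duration_ms: m.duration_ms,
    })
}
```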

@astuyve astuyve merged commit 3260e2b into main Feb 24, 2025
33 checks passed
@astuyve astuyve deleted the aj/new-loop branch February 24, 2025 17:53