
Transactional cache API, missing some options#471

Merged
cceckman-at-fastly merged 26 commits into main from cceckman/cache-transact
May 6, 2025
Conversation

@cceckman-at-fastly
Contributor

Start implementing the transactional side of the core cache API.

Transactions begin with a transaction_lookup or transaction_lookup_async. When the CacheBusyHandle is awaited, it either returns a result immediately; returns an obligation indicating "you should fulfill this"; or blocks until the obligated party abandons or fulfills their obligation.

NB: the implementation of cancel here matches a recent change to Fastly's Compute Platform. Previously, `close(BusyHandle)` would block until the obligation was fulfilled or granted to the closer; now, `close` doesn't block. This shouldn't matter much in practice, beyond making e.g. timeouts to origin fail in better ways.

@cceckman-at-fastly force-pushed the cceckman/cache-transact branch 3 times, most recently from a0ce2c7 to a0186ee April 23, 2025 15:41
@cceckman-at-fastly changed the title from "Initial transactional cache API" to "Transactional cache API, missing some options" Apr 23, 2025
Base automatically changed from cceckman/cache-headers to main April 24, 2025 15:39
As I look into it more, I don't think generation IDs or the inner lock
are needed for the transactional API. The outer lock should suffice.
`watch::Sender` sets us up to have an asynchronous transactional API.
The non-transactional `lookup` and `insert` can use the `Sender` side,
acting just like a read/write mutex.

But now, `insert` will automatically
ping any waiters when the modification is done... and those waiters can
wait asynchronously.
The request headers are needed for insert...but for a transactional
insert, they carry along from the original request's headers.

Make `request_headers` an explicit additional argument for `insert`
rather than part of `WriteOptions`, so we can use `WriteOptions` for
other sorts of inserts.
An entry can be:
- Present but not obligated (e.g. fresh)
- Obligated but not present (request-collapsing lookup)
- Neither obligated nor present (request-collapsing lookup that failed
  to complete)
- Obligated and present (stale-while-revalidate, request-collapsing
  revalidation)

Make these two separate fields of CacheValue. Also, CacheValue doesn't
need to be Arc<>'d; remove a layer of indirection.
Note that this implements a recently-changed behavior in the compute
platform. Previously, a `PendingTransaction::drop` would *block* on whichever
request got the `GoGet` obligation; more recently (and in Viceroy),
the `PendingTransaction::drop` does not block on any other transaction.
Before returning from `transaction_lookup_async`, we attempt to retrieve
data + an obligation from the `CacheKeyObject`, but without blocking for
any outstanding obligations. Only if this fails do we generate a
background task to wait for other obligations to complete.

The net result is that `BusyHandle::pending` will in some cases
immediately be `true` upon return of the `BusyHandle`, e.g. when the
cache needs filling.
Contributor

@aturon left a comment


Here's another batch of comments / questions.


// Done dealing with other obligations; now we just deal with results that we have.
// Pick a fresh result if we now have it:
// fresh or stale.
Contributor


I think this comment line is a leftover?

Contributor Author


Not leftover so much as unclear, given that we don't actually deal with stale entries at this point.

This also made me note a weirdness in the logic: we short-circuit on obligations without returning data. In 2bf2139 I reordered these blocks to:

  1. Scan for fresh data
  2. Scan for stale data that can be revalidated (for now, "just" existing obligations)
  3. Generate a new obligation

which I think will lend itself better to SWR.

response_object.transactional = TransactionState::Present(object);
response_object.generation += 1;
if !cache_key_objects.vary_rules.contains(meta.vary_rule()) {
// Insert at the front, run through the rules in order, so we tend towards fresher
Contributor


Should we also move the rule to the front if it already exists?

Contributor Author

@cceckman-at-fastly May 5, 2025


Yes, definitely, oversight on my part. Done in fc09643.

I also realize this could be a `Vec` instead of a `VecDeque`, since we're only ever working at one end. But I find it easier to read as `push_front` and `iter` rather than `push` and `iter().rev()` - I'd definitely mess up "use the reverse iterator" at some point.

return false;
}

// Finally, generate an obligation based on the most recent vary rule
Contributor


Have we confirmed this is the behavior in production? I'm thinking of a scenario like:

  • Two requests come in for the same URL, but with different headers
  • Cache contains:
    • Vary rules: [["header1"], ["header2"]]
    • Stale responses keyed as: "header1: foo" and "header2: bar"
  • First request has header1: foo and header2: bar; we mark the header1: foo object as obligated
  • Second request has no header1, but has header2: bar
    • Thus, we generate a second obligation, even though the one for the second request may well apply.

We could reasonably expect these two requests to be collapsed, instead.

Contributor Author

@cceckman-at-fastly May 5, 2025


Good thought! I'll add a test for this case, double-check some of the cache code, and circle back.

Contributor Author


Phrasing this slightly differently as I work through it:

If our heuristic for "which variant will apply to the next request" is "what is the last Vary rule received", we would not expect these requests to be collapsed - because they have distinct header1 values. We'd be erring on the side of "less latency, more requests" by not collapsing them (i.e. not blocking the second request on the fulfillment of the first).

(Per your other comment, my implementation isn't actually using the "most recent received" heuristic, but that's somewhat separate.)

Contributor Author

@cceckman-at-fastly May 5, 2025


Wrote a test in 04d9b0e that matches your example (header1 and header2 are swapped, but I'm pretty sure it's behaviorally the same). I used the cceckman/outside-test branch to run it on the compute platform.

This implementation matches the compute platform behavior: the requests are not collapsed.

Presumably the cache code authors of ECP had the "most recent Vary is right" heuristic in mind as well? I can see it getting pretty bad otherwise, if an old `Vary: Accept` rule trumps a later `Vary: Authorization` and causes ~all requests to serialize.

Contributor


Thanks for checking this out -- that makes good sense to me! 🙏

@cceckman-at-fastly requested a review from aturon May 5, 2025 16:10
Contributor

@aturon left a comment


Fantastic work! Very happy to see this land ✨

@cceckman-at-fastly merged commit ae7eb16 into main May 6, 2025
15 checks passed
@cceckman-at-fastly deleted the cceckman/cache-transact branch May 6, 2025 12:13
