
Conversation

@LawnGnome (Contributor)
Note to reviewers: I'm going to rebase this down to a single commit before merging, but there's a specific change I want feedback on, so I've left my messy local commit history on the branch for now just in case it's helpful.

Also, sorry for the braindump below, but I've paged this in and out of my memory at least three times in the last two months, and I need a shortcut to page it back in that's not me muttering "what the fuck" under my breath over the course of an hour or two while I hammer rust-analyzer and my laptop fan runs at 100% each time I actually have a couple of spare hours to pick this back up.


This PR implements batching for Git index updates, in an attempt to prevent the causes of the incidents that led to #12281. This is implemented similarly to CloudFront CDN invalidations: namely, by using a new git_index_sync_queue table to gather the crates that require updates, and then by having the job action those changes in batches (currently of 100 crates), which means we only need to fetch and push the repo once per batch. The job itself no longer has any parameters, and is represented as a simple unit struct in Rust, since it simply grabs the next batch of crates from the new table when it runs.
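For reference, a minimal version of the new git_index_sync_queue table might look roughly like this — the actual schema isn't shown in this excerpt, so the column names and the de-duplication constraint are assumptions on my part:

```sql
CREATE TABLE git_index_sync_queue (
    id BIGSERIAL PRIMARY KEY,
    -- One row per crate awaiting a sync; a unique constraint (assumed here)
    -- would keep repeated updates to the same crate from bloating the queue
    -- between job runs.
    crate_name TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```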

So far, so good.

There's one aspect of this I don't like, which is the API around this. There are several places in the codebase where we need to trigger Git index updates, which now means that we need to insert a new record into git_index_sync_queue, then enqueue the SyncToGitIndex job. This feels potentially error-prone, since both of these things have to happen for the sync to work correctly.

Initially, I implemented a function that looked like this:

impl SyncToGitIndex {
    /// Enqueues a crate to be synced, and ensures that there is also an instance of the job
    /// enqueued in the worker.
    ///
    /// This is a convenience method that returns the usual [`EnqueueError`] returned by
    /// [`BackgroundJob::enqueue`].
    #[instrument(skip_all)]
    pub async fn enqueue_crate(
        conn: &mut AsyncPgConnection,
        crate_name: &str,
    ) -> Result<Option<i64>, EnqueueError> {
        GitIndexSyncQueueItem::queue(conn, crate_name)
            .await
            .map_err(EnqueueError::DatabaseError)?;

        Self.enqueue(conn).await
    }
}

I naïvely assumed that we could then use this as a drop-in replacement for the <SyncToGitIndex as BackgroundJob>::enqueue() calls that we have in various places within tokio::try_join!() macros to take advantage of AsyncPgConnection pipelining.

I was, obviously, wrong, and have come to the conclusion that I fundamentally don't understand something about Diesel and/or Tokio and/or mutable reference semantics in Rust.

Things I've tried:

  1. What I said above, which fails because conn can't be reused multiple times. (Which is fine, but why does it work with the code we have right now?)
  2. Creating a shim struct that impls BackgroundJob just so it can override enqueue and therefore look like a normal job. This fails because the expected BoxFuture is tied to the lifetime of the struct rather than the connection, and it's the connection lifetime we need in order to insert a record into git_index_sync_queue.
  3. Creating a function that abstracts both the sync-to-Git and sync-to-sparse job creation, which fails in the same way as option 1: the &mut AsyncPgConnection can't be used by multiple functions within the try_join!().

So this is what we get for now. Maybe it's fine, and my docblock note on SyncToGitIndex is good enough, and I should stop worrying about it. If so, we can just review and merge this, test it some more in staging, and deploy it as per normal.

But the whole thing feels fragile, and I'm missing something, and I don't like it.


Anyway, in theory, this fixes #12281.

I couldn't figure out how to get this to work with `diesel_async`
pipelining: no matter what combination of ways to capture the connection
I could cook up, something always broke.

So, instead, we'll get a shittier, harder-to-use API: you have to create
the `GitIndexSyncQueueItem` record explicitly, then call `enqueue`
normally on the `SyncToGitIndex` unit struct. Not ideal. The other
option here would be to add a new function or meta-job that does all the
plumbing for both indices.
@LawnGnome LawnGnome requested a review from a team December 24, 2025 23:40
@LawnGnome LawnGnome added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works A-backend ⚙️ rust labels Dec 24, 2025
@Turbo87 (Member) commented Dec 25, 2025

This part is indeed a bit tricky. From what I can tell, this is caused by us using the query pipelining mode of Diesel (aka try_join!() with multiple queries inside). The reason this works for the regular BackgroundJob::enqueue() is that we are explicitly not using an async fn and instead return a very limited boxed future. This was tricky to get right in the first place, so I'm not surprised it's causing issues like this. If you move the SyncToGitIndex::enqueue_crate() call outside of the try_join!() and await it, then it should work, but with the tradeoff of losing the pipelining behavior.

The alternative AFAICT would be to rewrite GitIndexSyncQueueItem::queue() similar to BackgroundJob::enqueue() so that it supports pipelining too. Afterwards it should hopefully be possible to use try_join!() inside SyncToGitIndex::enqueue_crate() to use pipelining there as well.

It's a bit unfortunate that Diesel is making this so hard due to the fact that it requires mutable connections, which fundamentally conflicts with sharing them between multiple queries, but fixing this is obviously not trivial 😅

(diff context: the tail of the batch-draining query)

LIMIT
$1
)
DELETE FROM git_index_sync_queue USING batch
@syphar (Member) commented Dec 26, 2025

Just a note:

using a Postgres table as a queue like this has a couple of problems:

if our handler code fails, the entries are still removed from the queue.
Also, multiple workers aren't supported.

If these things are an issue, you could do it the same way we do it in docs.rs:

  • start a transaction
  • SELECT [...] LIMIT 100 FOR UPDATE SKIP LOCKED
  • do your work
  • DELETE the entries
  • commit the transaction

Through this,

  • other workers just skip over the locked records, and
  • in case of an error, your transaction is rolled back, and the records are unlocked again.
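The flow above might look roughly like this in SQL — a sketch, not the docs.rs code; the column names are assumptions based on the table name in the diff:

```sql
BEGIN;

-- Claim a batch. FOR UPDATE SKIP LOCKED makes concurrent workers skip rows
-- that are already locked by another in-flight transaction.
SELECT crate_name
FROM git_index_sync_queue
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- ... perform the Git index sync for the selected crates ...

-- Remove only the rows this worker claimed above.
DELETE FROM git_index_sync_queue
WHERE crate_name = ANY (/* the names selected above */);

COMMIT;  -- on failure, ROLLBACK instead: the locks are released and the rows survive
```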

@Turbo87 Turbo87 removed the rust label Dec 29, 2025


Successfully merging this pull request may close these issues.

Git index syncs can back up and cause incidents
