
Update collection stats (e.g. date ranges, counts, tags) in background job #3241

Open
tw4l wants to merge 14 commits into main from issue-3218-update-collections-async



tw4l commented Mar 30, 2026

Fixes #3218

This PR moves updating of collections after changes (e.g. items being added or removed) to a background job, to ensure that collection API requests remain quick.

Changes

  • New background job added to recalculate collection stats
  • The method that recalculates collection stats has been tweaked slightly to make it more efficient and to fix the calculation of the most common tag counts
  • All places where collection stats were recalculated as part of an API request now kick off a background job instead of awaiting the recalculation (in a few long-running processes, such as org import, we still await instead)
  • Backend and nightly tests updated to account for the changes
  • The frontend collection detail page now polls every 10 seconds to pick up stats updates after changes that trigger a recalculation (e.g. adding or removing items). This is a fairly naive, simple implementation, but it seems to work well in testing
  • A few places in the backend modules touched by these changes that used asyncio.create_task have been updated so that those tasks are not garbage collected before they complete (see #3240, "[Task]: Ensure asyncio tasks aren't garbage collected before they complete", for more context and for tracking this work across the rest of the backend)

Testing

  • Spin up a Browsertrix instance
  • Create a collection with some items
  • Add and remove items, then verify that the collection stats update shortly afterward
  • Verify via the API that background jobs have been created and marked as successful in the database
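Because stats now update asynchronously, a test can no longer check them immediately after adding or removing items. A minimal polling helper along these lines retries a check until it passes or a timeout expires. This is a sketch, not the helper the PR's tests actually use. The name `wait_until` and its defaults are hypothetical; in a real test, the predicate would fetch the collection via the API and compare its counts with the expected values.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    calls = {"n": 0}

    def stats_updated() -> bool:
        # Stand-in for fetching the collection and checking its item count;
        # here it simply succeeds on the third poll.
        calls["n"] += 1
        return calls["n"] >= 3

    print(wait_until(stats_updated, timeout=5, interval=0.01))
```

Using `time.monotonic()` keeps the deadline correct even if the system clock changes during the test run.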

Nightly test run: https://github.com/webrecorder/browsertrix/actions/runs/23769472591

tw4l added 12 commits March 19, 2026 14:00
Wherever updating collection counts, tags, and dates would block API
responses, this commit moves those operations to an asyncio
task instead. It also ensures those tasks aren't garbage collected before they
complete.

In addition, this moves updating counts, tags, and dates into a single
function update_collection_stats that is used uniformly, as it was
previously inconsistent whether the collection's date range was always
updated.
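The commit message describes a single `update_collection_stats` function that recomputes counts, tags, and the date range together. The in-memory sketch below illustrates that idea under stated assumptions: the `Item` fields, the output keys, and counting each tag at most once per item are all hypothetical, and the PR's actual function likely works quite differently (for example, by querying the database).

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Item:
    # Hypothetical archived-item shape; the real schema has more fields.
    size: int
    page_count: int
    tags: List[str] = field(default_factory=list)
    started: Optional[datetime] = None
    finished: Optional[datetime] = None


def update_collection_stats(items: List[Item], top_n: int = 10) -> Dict[str, object]:
    """Recompute counts, most common tags, and date range in one pass."""
    tag_counts: Counter = Counter()
    starts: List[datetime] = []
    finishes: List[datetime] = []
    total_size = 0
    page_count = 0

    for item in items:
        total_size += item.size
        page_count += item.page_count
        # Deduplicate tags within an item (order-preserving) so a repeated
        # tag on one item counts once.
        tag_counts.update(list(dict.fromkeys(item.tags)))
        if item.started:
            starts.append(item.started)
        if item.finished:
            finishes.append(item.finished)

    return {
        "item_count": len(items),
        "total_size": total_size,
        "page_count": page_count,
        # most_common() returns (tag, count) pairs, highest count first.
        "top_tags": tag_counts.most_common(top_n),
        # Date range: earliest start and latest finish across all items.
        "date_earliest": min(starts) if starts else None,
        "date_latest": max(finishes) if finishes else None,
    }
```

Computing every stat in one function means each trigger (adding items, removing items, and so on) updates the date range too, which is the inconsistency the commit message describes fixing.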
tw4l requested review from emma-sg and ikreymer March 30, 2026 22:00


Development

Successfully merging this pull request may close these issues.

[Task]: Update collection counts, tags, and dates in background job

1 participant