Add worker that deletes expired entries from the documents log (#23527)
Added a `delete_documents` worker that retrieves entries that fall before the valid retention window and deletes them. The logic is similar to the index deletion worker. I added rate limiting to avoid overloading the database. I estimate that
This is off by default. I can test this out by turning it on for a single instance and looking at the metrics. For any instance with > 10000 deleted entries, I calculated that it would make ~300 calls to the database:
- delete batch size is 10000
- each call to the db is made in chunks of ~100 (128) operations
- 100 queries to get `prev_revs`
- 100 queries to get documents to delete
- 100 calls to delete documents
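
The ~300 figure above can be reproduced with a small calculation. This is only a sketch of the arithmetic, assuming each of the three steps (fetching `prev_revs`, fetching documents, deleting) is issued once per chunk; `estimated_db_calls` is a hypothetical helper, not a function in this change.

```rust
/// Hypothetical helper: estimates database calls for one delete batch.
/// Assumption: three steps (get `prev_revs`, get documents, delete),
/// each issued once per chunk of `chunk_size` operations.
fn estimated_db_calls(batch_size: usize, chunk_size: usize) -> usize {
    const STEPS: usize = 3;
    // Ceiling division: a partially filled last chunk still costs one call.
    let chunks = (batch_size + chunk_size - 1) / chunk_size;
    STEPS * chunks
}
```

With the rounded chunk size of 100 this gives 300 calls for a 10000-entry batch; with the actual chunk size of 128 it gives 237, so ~300 is a conservative upper estimate.
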
I'm not quite sure of the limitations of the database, so please let me know what's reasonable. I initially set the rate limit to 1 delete batch per minute per instance.
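
To illustrate the "1 batch per minute per instance" limit, here is a minimal sketch of an interval-based limiter. `BatchRateLimiter` and `try_acquire` are hypothetical names for illustration; the actual worker may use a different rate-limiting mechanism.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limiter: allows at most one batch per `interval`.
/// One limiter would be kept per instance.
struct BatchRateLimiter {
    interval: Duration,
    last_run: Option<Instant>,
}

impl BatchRateLimiter {
    fn new(interval: Duration) -> Self {
        BatchRateLimiter { interval, last_run: None }
    }

    /// Returns true and records `now` if a batch may run at `now`;
    /// returns false if the previous batch ran less than `interval` ago.
    fn try_acquire(&mut self, now: Instant) -> bool {
        match self.last_run {
            Some(last) if now.saturating_duration_since(last) < self.interval => false,
            _ => {
                self.last_run = Some(now);
                true
            }
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the limiter, keeps the sketch easy to test without sleeping.
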
### Safeguards
There are 2 knobs associated with this change:
- `RETENTION_DOCUMENT_DELETES_ENABLED` allows the deletion logic to run
- `DOCUMENT_RETENTION_DRY_RUN` controls whether or not the deletion queries are made to the db
*this could be a bit confusing, so let me know*
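
Since the interaction between the two knobs may be confusing, here is one way to read it, sketched as a small truth table. This assumes `RETENTION_DOCUMENT_DELETES_ENABLED` gates the whole worker and `DOCUMENT_RETENTION_DRY_RUN` only matters once it is enabled; `RetentionMode` and `retention_mode` are hypothetical names.

```rust
/// Hypothetical summary of what the worker does for a knob combination.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum RetentionMode {
    /// Deletion logic does not run at all.
    Disabled,
    /// Deletion logic runs, but no delete queries reach the db.
    DryRun,
    /// Deletion logic runs and delete queries are issued.
    Delete,
}

/// Assumption: `deletes_enabled` mirrors RETENTION_DOCUMENT_DELETES_ENABLED
/// and `dry_run` mirrors DOCUMENT_RETENTION_DRY_RUN.
fn retention_mode(deletes_enabled: bool, dry_run: bool) -> RetentionMode {
    match (deletes_enabled, dry_run) {
        (false, _) => RetentionMode::Disabled,
        (true, true) => RetentionMode::DryRun,
        (true, false) => RetentionMode::Delete,
    }
}
```

Under this reading, documents are only deleted when the first knob is on and the second is off.
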
There is an `anyhow::ensure!` check that all documents we delete are outside the retention window.
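
The shape of that check is roughly as follows. This is a std-only stand-in rather than the actual code: it returns `Result<(), String>` instead of using `anyhow`, represents timestamps as plain `u64` values, and assumes a document is expired when its timestamp is strictly before the retention cutoff. `ensure_all_expired` is a hypothetical name.

```rust
/// Hypothetical guard: fails if any document slated for deletion is
/// still inside the retention window.
/// Assumption: a document is expired when `ts < retention_cutoff`.
fn ensure_all_expired(doc_timestamps: &[u64], retention_cutoff: u64) -> Result<(), String> {
    for &ts in doc_timestamps {
        if ts >= retention_cutoff {
            return Err(format!(
                "document with timestamp {} is inside the retention window (cutoff {})",
                ts, retention_cutoff
            ));
        }
    }
    Ok(())
}
```

Running the guard over the whole batch before issuing any delete means a single in-window document aborts the batch instead of being silently removed.
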
GitOrigin-RevId: 562b152ff7ed3ae9a68c28c140e7c214754bd6ec