Move event trimming from EventPersister to docket-based db_vacuum service #20811
Open
tom21100227 wants to merge 19 commits into PrefectHQ:main
Verifies that a failure in one cleanup step (e.g., orphaned logs) does not prevent subsequent steps (e.g., old flow run deletion) from running. With the help of Claude.
The CI settings test validates that all setting env vars are registered in the SUPPORTED_SETTINGS dict. Our new db_vacuum settings were missing. With the help of Claude.
SecondsTimeDelta converts the raw int (172800) to timedelta(days=2), so the test needs an explicit expected_value. With the help of Claude.
New settings don't need validation_alias since there are no legacy env var names to support. The build_settings_config() prefix handles env var resolution automatically. With the help of Claude.
Replace monolithic vacuum_old_resources() with a finder perpetual service (schedule_vacuum_tasks) that enqueues 4 independent docket tasks: vacuum_orphaned_logs, vacuum_orphaned_artifacts, vacuum_stale_artifact_collections, and vacuum_old_flow_runs. This follows the established pattern in cancellation_cleanup.py, giving per-task error isolation and independent retries via docket. With the help of Claude.
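The error isolation this commit describes can be illustrated without docket. The sketch below is a plain `asyncio` stand-in, not the docket API: the two task bodies are hypothetical placeholders that only borrow names from the commit, and `run_independently` is an illustrative helper. It shows only that a failure in one independently run task does not stop the others.

```python
import asyncio


async def vacuum_orphaned_logs() -> str:
    # Hypothetical stand-in body: simulate a cleanup step that fails.
    raise RuntimeError("simulated failure")


async def vacuum_old_flow_runs() -> str:
    # Hypothetical stand-in body: simulate a cleanup step that succeeds.
    return "flow runs vacuumed"


async def run_independently(tasks):
    # return_exceptions=True captures each failure as a result instead of
    # cancelling or skipping the remaining tasks.
    results = await asyncio.gather(*(t() for t in tasks), return_exceptions=True)
    return {t.__name__: r for t, r in zip(tasks, results)}


outcome = asyncio.run(run_independently([vacuum_orphaned_logs, vacuum_old_flow_runs]))
```

In the monolithic design, the same failure would have to be caught explicitly around every step to get this behavior.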
Call docket task functions directly with db=provide_database_interface(), matching the test_cancellation_cleanup.py pattern. Remove the error isolation test class, since isolation is now inherent via independent docket tasks. With the help of Claude.
Prevents duplicate task accumulation when cleanup overlaps with the next scheduling cycle. Also corrects misleading docstring about execution order. With the help of Claude.
Move event retention cleanup from EventPersister.trim() into the db_vacuum service as two new docket tasks:
- vacuum_heartbeat_events: aggressively prunes prefect.flow-run.heartbeat events with a configurable retention period (default 1 day), using min(heartbeat_retention, events_retention) to respect existing PREFECT_EVENTS_RETENTION_PERIOD settings.
- vacuum_old_events: deletes all events and event resources past the general events retention period, replacing EventPersister.trim().
Also registers both tasks in background_workers.task_functions and adds the PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD setting.
Closes PrefectHQ#20728
With the help of Claude.
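The `min(heartbeat_retention, events_retention)` rule can be sketched as below. This is a minimal illustration, not the code in `db_vacuum.py`: the helper names `effective_heartbeat_retention` and `heartbeat_cutoff` are hypothetical, and the dates are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def effective_heartbeat_retention(
    heartbeat_retention: timedelta, events_retention: timedelta
) -> timedelta:
    # Heartbeats are never kept longer than general events.
    return min(heartbeat_retention, events_retention)


def heartbeat_cutoff(
    now: datetime, heartbeat_retention: timedelta, events_retention: timedelta
) -> datetime:
    # Heartbeat events that occurred before this instant are eligible for deletion.
    return now - effective_heartbeat_retention(heartbeat_retention, events_retention)


now = datetime(2025, 1, 8, tzinfo=timezone.utc)
cutoff = heartbeat_cutoff(now, timedelta(days=1), timedelta(days=7))
```

With a 1-day heartbeat retention and a 7-day general retention, the cutoff falls one day before `now`. If the heartbeat value were larger than the general one, the general retention would win.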
Split the vacuum scheduler into two perpetual services:
- schedule_vacuum_tasks: flow runs & orphaned resources (disabled by default via PREFECT_SERVER_SERVICES_DB_VACUUM_ENABLED, unchanged)
- schedule_event_vacuum_tasks: events & heartbeat events (enabled by default via PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED)
This ensures event retention cleanup continues to work in default deployments after removing EventPersister.trim(), while keeping the destructive flow-run vacuum opt-in.
With the help of Claude.
# Conflicts:
#	src/prefect/server/api/background_workers.py
#	src/prefect/server/services/db_vacuum.py
#	src/prefect/settings/models/server/services.py
#	tests/server/services/test_db_vacuum.py
#	tests/test_settings.py
- Add separate events_loop_seconds and events_batch_size settings so event cleanup tuning is independent from flow-run vacuum tuning
- Extract HEARTBEAT_EVENT constant for the hardcoded event name
- Update module docstring to reflect the two perpetual services
- Add perpetual service registration tests for schedule_event_vacuum_tasks
- Update heartbeat retention description to note PREFECT_EVENTS_RETENTION_PERIOD cap
With the help of Claude.
Operators who disabled the event persister (PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false) should not see unexpected event trimming on upgrade. The event vacuum enabled_getter now requires both events_enabled AND event_persister.enabled to be true. With the help of Claude.
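The gating rule in this commit reduces to a logical AND of two flags. The sketch below is illustrative only: `VacuumGateSettings` and `event_vacuum_enabled` are hypothetical names, not the actual settings model or `enabled_getter` in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VacuumGateSettings:
    # Hypothetical container; both flags default to enabled, as described in the PR.
    events_enabled: bool = True
    event_persister_enabled: bool = True


def event_vacuum_enabled(settings: VacuumGateSettings) -> bool:
    # Event vacuum runs only when both flags are true.
    return settings.events_enabled and settings.event_persister_enabled


default_on = event_vacuum_enabled(VacuumGateSettings())
persister_off = event_vacuum_enabled(VacuumGateSettings(event_persister_enabled=False))
```

An operator who had already turned the event persister off therefore gets no event trimming after upgrading, even though the new vacuum flag defaults to true.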
- Mark schedule_event_vacuum_tasks with run_in_ephemeral=True so ephemeral servers retain event cleanup (EventPersister.trim() ran in ephemeral mode via RunInEphemeralServers).
- Remove dead batch_delete() function and batch_size_delete setting from event_persister (no longer used after trim removal).
- Update module docstring to document event_persister.enabled gating.
With the help of Claude.
Map the legacy PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE env var to db_vacuum.events_batch_size via validation_alias so existing operator configs continue to work after the trim migration. With the help of Claude.
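The fallback order can be sketched with the standard library alone. This is not the `validation_alias` mechanism the PR uses. It is a hypothetical `resolve_events_batch_size` helper that mimics the precedence the PR description states: the new variable wins when both are set, the legacy variable is used otherwise, and the default is 10,000.

```python
from typing import Mapping

NEW_VAR = "PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE"
LEGACY_VAR = "PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE"
DEFAULT_BATCH_SIZE = 10_000


def resolve_events_batch_size(env: Mapping[str, str]) -> int:
    # Check the preferred name first, then the legacy name, then fall back
    # to the default.
    for name in (NEW_VAR, LEGACY_VAR):
        raw = env.get(name)
        if raw is not None:
            return int(raw)
    return DEFAULT_BATCH_SIZE


legacy_only = resolve_events_batch_size({LEGACY_VAR: "500"})
both_set = resolve_events_batch_size({NEW_VAR: "2000", LEGACY_VAR: "500"})
```

Passing `os.environ` as `env` would apply the same lookup to the real process environment.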
273400c to 416ae06
sorry guys I kept triggering more github actions :/ I thought marking it draft stops those actions from running
- events_loop_seconds: 3600 -> 900 (matches old trim_every=15m)
- heartbeat_events_retention_period: 1 day -> 7 days (matches PREFECT_EVENTS_RETENTION_PERIOD default so no events are pruned more aggressively than before out of the box)
With the help of Claude.
closes #20728
This PR moves event retention cleanup from `EventPersister.trim()` into the docket-based `db_vacuum` service, and adds heartbeat-specific pruning to address unbounded DB growth from high-volume heartbeat events.

## Summary

- Removed `trim()`, `trim_periodically()`, and related code from `EventPersister`. Event cleanup is now handled by two new docket tasks (`vacuum_old_events`, `vacuum_heartbeat_events`) scheduled by a new `schedule_event_vacuum_tasks` perpetual service.
- Added a `vacuum_heartbeat_events` task with a configurable retention period (default 7 days, same as general events). Operators experiencing high-volume heartbeat DB bloat (e.g. 80+ workers) can lower this to prune heartbeats more aggressively.
- Event vacuum is gated on `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED` (default true) AND `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED` (default true), so operators who disabled event processing won't see unexpected trimming on upgrade.

## Migrating from `EventPersister.trim()` to docket

Unlike #20728, which adds a new optional feature, this PR replaces an existing feature that is on by default, so I am taking extra care that the migration does not break intended behavior. With some code review from Codex, I identified a few settings and behaviors that need attention:

- `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED` (legacy alias: `PREFECT_API_SERVICES_EVENT_PERSISTER_ENABLED`)
- `PREFECT_SERVER_EVENTS_RETENTION_PERIOD`
- `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE`

| Old setting | New setting | Notes |
| --- | --- | --- |
| `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED` | `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED` (enabled by default) | `EventPersister` still exists, so it still controls that, but `DB_VACUUM_EVENTS` respects this setting: if the event persister is off, `DB_VACUUM_EVENTS` is not triggered. Both settings must be true for `DB_VACUUM_EVENTS` to trigger. |
| `PREFECT_SERVER_EVENTS_RETENTION_PERIOD` | | |
| `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE` | `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE` | `DB_VACUUM_EVENTS_BATCH_SIZE` is preferred when both are set; otherwise the legacy `EVENT_PERSISTER_BATCH_SIZE_DELETE` is used as a fallback via `validation_alias`. |
| (hardcoded `trim_every=15m`) | `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS` | Defaults to `900` (15 minutes) to match the previous hardcoded cadence. |
| | `PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD` | Defaults to 7 days (matches `PREFECT_EVENTS_RETENTION_PERIOD`) so no events are pruned more aggressively than before out of the box. Capped by `PREFECT_EVENTS_RETENTION_PERIOD` if that is shorter. Operators can lower this to address heartbeat bloat. |

## Backward compatibility preserved

- `event_persister.enabled` still gates event vacuum: if the event persister is off, event vacuum is off.
- `PREFECT_EVENTS_RETENTION_PERIOD` remains the primary retention knob. Heartbeat retention defaults to the same 7 days and uses `min(heartbeat_retention, events_retention)` so heartbeats never outlive general events.
- The 15-minute cleanup cadence matches the old `trim_every` default. It is now configurable via `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS`.
- `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE` is now a fallback alias for `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE`. Existing configs continue to work.

## Intentional behavior changes

- New `vacuum_heartbeat_events` task with a separate configurable retention (default 7 days, same as general events). Operators can lower this to prune high-volume heartbeats more aggressively.

## Detailed migration notes

### What changed

- `PREFECT_EVENTS_RETENTION_PERIOD`
- `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE`: now `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE` (default 10,000); legacy env var accepted as fallback
- `run_in_ephemeral=True`

### Legacy settings

- `PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE`: now a fallback alias for `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE`. The `batch_delete()` function in `event_persister.py` was removed (replaced by `_batch_delete()` in `db_vacuum.py`).

### New settings

| Setting | Default | Notes |
| --- | --- | --- |
| `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED` | `true` | |
| `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS` | `900` (15 min) | |
| `PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE` | `10000` | |
| `PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD` | `604800` (7 days) | Capped by `PREFECT_EVENTS_RETENTION_PERIOD` |

## Tests

Removed one test related to `EventPersister.trim()`: `test_event_persister.py::test_trims_messages_periodically`

Added 8 new tests for event vacuum in `test_db_vacuum.py`, 4 new tests in `test_perpetual_services.py` (registration, ephemeral mode, enable/disable gating), and 2 legacy alias tests in `test_settings.py`.

- `uv run pytest tests/server/services/test_db_vacuum.py -v`: 31 tests covering all vacuum tasks
- `uv run pytest tests/server/services/test_perpetual_services.py -v`: 19 tests covering registration, ephemeral mode, and enable/disable gating
- `uv run pytest tests/events/server/storage/test_event_persister.py -v`: event persister tests still pass with trim removed
- `uv run pytest tests/test_settings.py -k SUPPORTED_SETTINGS -v`: new settings validated
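For readers unfamiliar with why a batch size setting exists, batched deletion can be sketched with `sqlite3` from the standard library. This is an assumption-laden illustration, not the `_batch_delete()` in `db_vacuum.py`: the `events` table, its `occurred` column, and the `batch_delete_older_than` helper are hypothetical.

```python
import sqlite3


def batch_delete_older_than(conn: sqlite3.Connection, cutoff: int, batch_size: int) -> int:
    """Delete rows older than `cutoff` in batches and return how many were deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE occurred < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Hypothetical table: 30 events with integer timestamps 0..29.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, occurred INTEGER)")
conn.executemany("INSERT INTO events (occurred) VALUES (?)", [(t,) for t in range(30)])

deleted = batch_delete_older_than(conn, cutoff=25, batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Smaller batches hold locks for less time per transaction at the cost of more round trips, which is the trade-off a batch size setting lets operators tune.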