Move event trimming from EventPersister to docket-based db_vacuum service #20811

Open
tom21100227 wants to merge 19 commits into PrefectHQ:main from tom21100227:db-vacuum-service-v2

@tom21100227 (Contributor) commented Feb 23, 2026

closes #20728

This PR moves event retention cleanup from EventPersister.trim() into the docket-based db_vacuum service, and adds heartbeat-specific pruning to address unbounded DB growth from high-volume heartbeat events.

Summary

  • Migrate event trimming to docket: Remove trim(), trim_periodically(), and related code from EventPersister. Event cleanup is now handled by two new docket tasks (vacuum_old_events, vacuum_heartbeat_events) scheduled by a new schedule_event_vacuum_tasks perpetual service.
  • Add heartbeat-specific pruning: New vacuum_heartbeat_events task with a configurable retention period (default 7 days, same as general events). Operators experiencing high-volume heartbeat DB bloat (e.g. 80+ workers) can lower this to prune heartbeats more aggressively.
  • Safe migration: Event vacuum is gated on both PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED (default true) AND PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED (default true), so operators who disabled event processing won't see unexpected trimming on upgrade.

Migrating from EventPersister.trim() to docket

Unlike #20728, which adds a new optional feature, this PR replaces an existing feature that is enabled by default, so I am taking extra care to ensure the migration does not break intended behavior. With some code review from Codex, I identified a few settings and behaviors that need to be handled:

  1. PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED (legacy alias: PREFECT_API_SERVICES_EVENT_PERSISTER_ENABLED)
  2. PREFECT_SERVER_EVENTS_RETENTION_PERIOD
  3. PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE

| Old setting | New setting | How migration consistency is ensured |
|---|---|---|
| PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED | PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED (added alongside the old setting; enabled by default) | EventPersister still exists and is still controlled by the old setting. DB_VACUUM_EVENTS also respects it: if the event persister is off, DB_VACUUM_EVENTS is not triggered. Both settings must be true for DB_VACUUM_EVENTS to run. |
| PREFECT_SERVER_EVENTS_RETENTION_PERIOD | N/A | Global retention remains authoritative; the new docket service still respects it. |
| PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE | PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE | Both default to 10,000. DB_VACUUM_EVENTS_BATCH_SIZE takes precedence when both are set; otherwise the legacy EVENT_PERSISTER_BATCH_SIZE_DELETE is used as a fallback via validation_alias. |
| N/A (hardcoded trim_every=15m) | PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS | Now configurable. Defaults to 900 (15 minutes) to match the previous hardcoded cadence. |
| N/A (no heartbeat-specific pruning) | PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD | Defaults to 7 days (matching PREFECT_EVENTS_RETENTION_PERIOD), so out of the box no events are pruned more aggressively than before. Capped by PREFECT_EVENTS_RETENTION_PERIOD if that is shorter. Operators can lower it to address heartbeat bloat. |
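
The batch-size precedence described in the table (new name first, legacy name as fallback, then the shared 10,000 default) can be sketched in plain Python. This is a hypothetical stand-in for the pydantic validation_alias mapping the PR uses, not the actual settings code; `resolve_events_batch_size` is an illustrative name.

```python
import os
from typing import Mapping, Optional

NEW_VAR = "PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE"
LEGACY_VAR = "PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE"
DEFAULT_BATCH_SIZE = 10_000


def resolve_events_batch_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical sketch of the batch-size lookup order (not Prefect's settings code).

    The new variable wins when both are set; the legacy variable is only a
    fallback; if neither is set, the shared 10,000 default applies.
    """
    if env is None:
        env = os.environ
    for name in (NEW_VAR, LEGACY_VAR):  # tuple order encodes precedence
        raw = env.get(name)
        if raw is not None:
            return int(raw)
    return DEFAULT_BATCH_SIZE


# An existing config that only sets the legacy variable keeps working.
print(resolve_events_batch_size({LEGACY_VAR: "5000"}))  # 5000
```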

Backward compatibility preserved

  • Operator toggles: event_persister.enabled still gates event vacuum — if event persister is off, event vacuum is off.
  • Retention: PREFECT_EVENTS_RETENTION_PERIOD remains the primary retention knob. Heartbeat retention defaults to the same 7 days and uses min(heartbeat_retention, events_retention) so heartbeats never outlive general events.
  • Trim cadence: Defaults to 15 minutes, matching the previous trim_every default. Now configurable via PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS.
  • Legacy batch size env var: PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE is now a fallback alias for PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE. Existing configs continue to work.
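
A minimal sketch of how the two deletion cutoffs relate under the retention rule above. Only the `min(heartbeat_retention, events_retention)` cap comes from this PR; `retention_cutoffs` and its signature are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def retention_cutoffs(
    now: datetime,
    events_retention: timedelta,
    heartbeat_retention: timedelta,
) -> Tuple[datetime, datetime]:
    """Return (events_cutoff, heartbeat_cutoff); events older than a cutoff are eligible for deletion."""
    events_cutoff = now - events_retention
    # Capping by events_retention means heartbeats are never kept longer than other events.
    heartbeat_cutoff = now - min(heartbeat_retention, events_retention)
    return events_cutoff, heartbeat_cutoff


now = datetime(2026, 2, 23, tzinfo=timezone.utc)

# Defaults: both retentions are 7 days, so the cutoffs coincide.
default_events, default_hb = retention_cutoffs(now, timedelta(days=7), timedelta(days=7))

# Operator lowers heartbeat retention to 1 day: heartbeats are pruned after 1 day.
_, aggressive_hb = retention_cutoffs(now, timedelta(days=7), timedelta(days=1))

# Heartbeat retention set above a 3-day global retention is capped at 3 days.
capped_events, capped_hb = retention_cutoffs(now, timedelta(days=3), timedelta(days=7))
```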

Intentional behavior changes

  • Heartbeat-specific retention: New vacuum_heartbeat_events task with a separate configurable retention (default 7 days, same as general events). Operators can lower this to prune high-volume heartbeats more aggressively.
  • Execution model: Moved from in-process asyncio task to docket-scheduled tasks with per-batch transactions and docket retry semantics.

Detailed migration notes

What changed

| Aspect | Before (EventPersister.trim) | After (db_vacuum) |
|---|---|---|
| Scheduling | In-process asyncio task, every 15 min | Docket perpetual service, every 15 min (configurable) |
| Retention setting | PREFECT_EVENTS_RETENTION_PERIOD | Same setting, still respected |
| Batch size | PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE | PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE (default 10,000); legacy env var accepted as fallback |
| Heartbeat cleanup | Same retention as all events | Separate configurable retention (default 7 days, same as general events) |
| Ephemeral mode | Ran via RunInEphemeralServers | Runs via run_in_ephemeral=True |
| Error handling | try/except in asyncio loop | Docket retry semantics + per-batch transactions |
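
The per-batch deletion pattern can be illustrated with SQLite from the standard library. This is a hedged sketch, not the PR's `_batch_delete()` (which runs against Prefect's database): the `events(id, event, occurred)` schema and `batch_delete_events` are hypothetical, and the real tasks also clean up event resources, which is omitted here. Because each batch commits in its own transaction, a failure rolls back only the batch in progress while earlier batches stay deleted. The heartbeat event name is the one given in the commit messages below.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional

HEARTBEAT_EVENT = "prefect.flow-run.heartbeat"


def batch_delete_events(
    conn: sqlite3.Connection,
    cutoff: datetime,
    batch_size: int,
    event_name: Optional[str] = None,
) -> int:
    """Delete events older than `cutoff` in batches; return the number of rows removed.

    If `event_name` is given, only events with that name are deleted (e.g. heartbeats).
    """
    # Only fixed SQL fragments are interpolated below; values stay parameterized.
    where = "occurred < ?"
    params: List[object] = [cutoff.isoformat()]
    if event_name is not None:
        where += " AND event = ?"
        params.append(event_name)

    total = 0
    while True:
        with conn:  # commits this batch on success, rolls it back on error
            cur = conn.execute(
                f"DELETE FROM events WHERE id IN "
                f"(SELECT id FROM events WHERE {where} LIMIT ?)",
                (*params, batch_size),
            )
        total += cur.rowcount
        if cur.rowcount < batch_size:  # a short batch means nothing is left to delete
            return total


# Demo: ten events, one per day; even days are heartbeats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event TEXT, occurred TEXT)")
now = datetime(2026, 2, 23, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO events (event, occurred) VALUES (?, ?)",
    [
        (
            HEARTBEAT_EVENT if day % 2 == 0 else "prefect.flow-run.Completed",
            (now - timedelta(days=day)).isoformat(),
        )
        for day in range(10)
    ],
)
conn.commit()

hb_deleted = batch_delete_events(conn, now - timedelta(days=1), 3, HEARTBEAT_EVENT)
old_deleted = batch_delete_events(conn, now - timedelta(days=7), 3)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(hb_deleted, old_deleted, remaining)  # 4 1 5
```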

Legacy settings

  • PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE — now a fallback alias for PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE. The batch_delete() function in event_persister.py was removed (replaced by _batch_delete() in db_vacuum.py).

New settings

| Setting | Default | Description |
|---|---|---|
| PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED | true | Master on/off for event vacuum |
| PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_LOOP_SECONDS | 900 (15 min) | Schedule interval, matching previous trim cadence |
| PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_BATCH_SIZE | 10000 | Records per transaction |
| PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD | 604800 (7 days) | Heartbeat retention, capped by PREFECT_EVENTS_RETENTION_PERIOD |

Tests

Removed one test related to EventPersister.trim():

  • test_event_persister.py::test_trims_messages_periodically

Added 8 new tests for event vacuum in test_db_vacuum.py, 4 new tests in test_perpetual_services.py (registration, ephemeral mode, enable/disable gating), and 2 legacy alias tests in test_settings.py.

  • uv run pytest tests/server/services/test_db_vacuum.py -v — 31 tests covering all vacuum tasks
  • uv run pytest tests/server/services/test_perpetual_services.py -v — 19 tests covering registration, ephemeral mode, and enable/disable gating
  • uv run pytest tests/events/server/storage/test_event_persister.py -v — event persister tests still pass with trim removed
  • uv run pytest tests/test_settings.py -k SUPPORTED_SETTINGS -v — new settings validated

Verifies that a failure in one cleanup step (e.g., orphaned logs) does
not prevent subsequent steps (e.g., old flow run deletion) from running.

With the help of Claude.
The CI settings test validates that all setting env vars are registered
in the SUPPORTED_SETTINGS dict. Our new db_vacuum settings were missing.

With the help of Claude.
SecondsTimeDelta converts the raw int (172800) to timedelta(days=2),
so the test needs an explicit expected_value.

With the help of Claude.
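
The test adjustment follows from the int-to-timedelta coercion. As a standard-library illustration (not Prefect's SecondsTimeDelta type itself), a seconds-based value becomes a timedelta that no longer compares equal to the raw integer:

```python
from datetime import timedelta

# A seconds-based timedelta setting stores the parsed value, not the raw int,
# so a test must compare against the timedelta rather than the original number.
raw = 172800
parsed = timedelta(seconds=raw)

print(parsed)                       # 2 days, 0:00:00
print(parsed == raw)                # False: a timedelta never equals an int
print(parsed == timedelta(days=2))  # True

# The heartbeat retention default from this PR converts the same way.
print(timedelta(seconds=604800) == timedelta(days=7))  # True
```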
New settings don't need validation_alias since there are no legacy
env var names to support. The build_settings_config() prefix handles
env var resolution automatically.

With the help of Claude.
Replace monolithic vacuum_old_resources() with a finder perpetual
service (schedule_vacuum_tasks) that enqueues 4 independent docket
tasks: vacuum_orphaned_logs, vacuum_orphaned_artifacts,
vacuum_stale_artifact_collections, and vacuum_old_flow_runs.

This follows the established pattern in cancellation_cleanup.py,
giving per-task error isolation and independent retries via docket.

With the help of Claude.
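
A toy model of the per-task error isolation described in the commit above. Here `asyncio.gather(..., return_exceptions=True)` stands in for docket running the four tasks independently; it does not use docket's API, and the task bodies are placeholders.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


# Placeholder bodies for the four tasks named in the commit message above.
async def vacuum_orphaned_logs() -> str:
    raise RuntimeError("simulated database error")


async def vacuum_orphaned_artifacts() -> str:
    return "ok"


async def vacuum_stale_artifact_collections() -> str:
    return "ok"


async def vacuum_old_flow_runs() -> str:
    return "ok"


TASKS: List[Callable[[], Awaitable[str]]] = [
    vacuum_orphaned_logs,
    vacuum_orphaned_artifacts,
    vacuum_stale_artifact_collections,
    vacuum_old_flow_runs,
]


async def run_independently(tasks: List[Callable[[], Awaitable[str]]]) -> Dict[str, object]:
    """Run every task concurrently; a failure becomes that task's result instead of aborting the others."""
    results = await asyncio.gather(*(task() for task in tasks), return_exceptions=True)
    return {task.__name__: result for task, result in zip(tasks, results)}


results = asyncio.run(run_independently(TASKS))
for name, result in results.items():
    print(name, "->", repr(result))
```

In the monolithic version, the first exception would have skipped every later step; here the failing task is isolated and the other three still complete.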
Call docket task functions directly with db=provide_database_interface()
matching the test_cancellation_cleanup.py pattern. Remove error
isolation test class since isolation is now inherent via independent
docket tasks.

With the help of Claude.
Prevents duplicate task accumulation when cleanup overlaps with the next
scheduling cycle. Also corrects misleading docstring about execution order.

With the help of Claude.
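
One way to get the property described above is to give each scheduled task a stable key and skip enqueueing while a task with the same key is still pending. The in-memory queue below is a hypothetical model of that idea; it is not docket's API and may not match exactly how the commit implements it.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class KeyedQueue:
    """Toy queue that holds at most one pending task per key."""

    def __init__(self) -> None:
        self._pending: OrderedDict = OrderedDict()

    def add(self, key: str, fn: Callable[[], None]) -> bool:
        """Enqueue fn under key; return False if that key is already pending."""
        if key in self._pending:
            return False
        self._pending[key] = fn
        return True

    def run_all(self) -> int:
        """Run pending tasks in FIFO order; return how many ran."""
        count = 0
        while self._pending:
            _, fn = self._pending.popitem(last=False)
            fn()
            count += 1
        return count


def schedule_cycle(queue: KeyedQueue, task_names: Iterable[str]) -> int:
    """One scheduler tick: enqueue each task under a stable key; return how many were newly added."""
    return sum(queue.add(name, lambda: None) for name in task_names)


queue = KeyedQueue()
names = ["vacuum_orphaned_logs", "vacuum_old_flow_runs"]
first = schedule_cycle(queue, names)   # both enqueued
second = schedule_cycle(queue, names)  # previous cycle still pending: nothing new
ran = queue.run_all()                  # each task runs once, not twice
```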
Move event retention cleanup from EventPersister.trim() into the
db_vacuum service as two new docket tasks:

- vacuum_heartbeat_events: aggressively prunes prefect.flow-run.heartbeat
  events with a configurable retention period (default 1 day), using
  min(heartbeat_retention, events_retention) to respect existing
  PREFECT_EVENTS_RETENTION_PERIOD settings.
- vacuum_old_events: deletes all events and event resources past the
  general events retention period, replacing EventPersister.trim().

Also registers both tasks in background_workers.task_functions and adds
the PREFECT_SERVER_SERVICES_DB_VACUUM_HEARTBEAT_EVENTS_RETENTION_PERIOD
setting.

Closes PrefectHQ#20728

With the help of Claude.
Split the vacuum scheduler into two perpetual services:

- schedule_vacuum_tasks: flow runs & orphaned resources (disabled by
  default via PREFECT_SERVER_SERVICES_DB_VACUUM_ENABLED, unchanged)
- schedule_event_vacuum_tasks: events & heartbeat events (enabled by
  default via PREFECT_SERVER_SERVICES_DB_VACUUM_EVENTS_ENABLED)

This ensures event retention cleanup continues to work in default
deployments after removing EventPersister.trim(), while keeping the
destructive flow-run vacuum opt-in.

With the help of Claude.
# Conflicts:
#	src/prefect/server/api/background_workers.py
#	src/prefect/server/services/db_vacuum.py
#	src/prefect/settings/models/server/services.py
#	tests/server/services/test_db_vacuum.py
#	tests/test_settings.py
- Add separate events_loop_seconds and events_batch_size settings so
  event cleanup tuning is independent from flow-run vacuum tuning
- Extract HEARTBEAT_EVENT constant for the hardcoded event name
- Update module docstring to reflect the two perpetual services
- Add perpetual service registration tests for schedule_event_vacuum_tasks
- Update heartbeat retention description to note PREFECT_EVENTS_RETENTION_PERIOD cap

With the help of Claude.
Operators who disabled the event persister
(PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false) should not see
unexpected event trimming on upgrade. The event vacuum enabled_getter
now requires both events_enabled AND event_persister.enabled to be true.

With the help of Claude.
- Mark schedule_event_vacuum_tasks with run_in_ephemeral=True so
  ephemeral servers retain event cleanup (EventPersister.trim() ran in
  ephemeral mode via RunInEphemeralServers).
- Remove dead batch_delete() function and batch_size_delete setting
  from event_persister (no longer used after trim removal).
- Update module docstring to document event_persister.enabled gating.

With the help of Claude.
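
The resulting startup rules for the two schedulers (a per-service enable check plus ephemeral-mode eligibility) can be modeled as a small registry. This is a hypothetical sketch: `ServiceSpec`, `services_to_start`, and the dotted settings keys are illustrative names, and `run_in_ephemeral=False` for `schedule_vacuum_tasks` is an assumption, since the PR only states that the event scheduler runs in ephemeral mode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical registry entry for a perpetual service."""

    name: str
    enabled_getter: Callable[[Dict[str, bool]], bool]
    run_in_ephemeral: bool


SERVICES = [
    ServiceSpec(
        name="schedule_vacuum_tasks",
        enabled_getter=lambda s: s["db_vacuum.enabled"],
        run_in_ephemeral=False,  # assumption for illustration only
    ),
    ServiceSpec(
        name="schedule_event_vacuum_tasks",
        # Event vacuum requires both toggles, so disabling the event persister disables trimming.
        enabled_getter=lambda s: s["db_vacuum.events_enabled"] and s["event_persister.enabled"],
        run_in_ephemeral=True,
    ),
]


def services_to_start(settings: Dict[str, bool], ephemeral: bool) -> List[str]:
    """Names of services that are enabled and allowed to run in the current server mode."""
    return [
        spec.name
        for spec in SERVICES
        if spec.enabled_getter(settings) and (spec.run_in_ephemeral or not ephemeral)
    ]


DEFAULTS = {
    "db_vacuum.enabled": False,  # flow-run vacuum stays opt-in
    "db_vacuum.events_enabled": True,
    "event_persister.enabled": True,
}
print(services_to_start(DEFAULTS, ephemeral=True))  # ['schedule_event_vacuum_tasks']
```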
github-actions bot added the enhancement label Feb 23, 2026
codspeed-hq bot commented Feb 23, 2026

Merging this PR will not alter performance

✅ 2 untouched benchmarks


Comparing tom21100227:db-vacuum-service-v2 (4de499f) with main (33c6c4c)


Map the legacy PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE_DELETE
env var to db_vacuum.events_batch_size via validation_alias so existing
operator configs continue to work after the trim migration.

With the help of Claude.
@tom21100227 (Contributor, Author) commented:

Sorry guys, I kept triggering more GitHub Actions runs :/ I thought marking the PR as a draft would stop those actions from running.

- events_loop_seconds: 3600 -> 900 (matches old trim_every=15m)
- heartbeat_events_retention_period: 1 day -> 7 days (matches
  PREFECT_EVENTS_RETENTION_PERIOD default so no events are pruned
  more aggressively than before out of the box)

With the help of Claude.
@tom21100227 tom21100227 marked this pull request as ready for review February 24, 2026 16:53
@tom21100227 tom21100227 changed the title Move event trimming from EventPersister to docket-based db_vacuum service Move event trimming from EventPersister to docket-based db_vacuum service Feb 24, 2026

Labels

enhancement An improvement of an existing feature

Development

Successfully merging this pull request may close these issues.

auto clean-up for flow-run heartbeat in prefect internal DB
