Add pg_vaccumen — proactive vacuum maintenance tool#15
Open
xtpclark wants to merge 24 commits into omniti-labs:master from
Skip tables larger than N GB with --max-size to focus on smaller tables when a few monsters dominate the queue. Automatically detect and skip tables where autovacuum is already running (opt-out with --no-skip-autovacuum). Dry-run output now shows table sizes and [autovacuum running] annotations.
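The skip rules described above could be sketched as a pure filter over the candidate list. This is an illustration only: the `Candidate` type, the `select_tables` name, and the reading of "GB" as GiB are assumptions, not taken from pg_vaccumen.py.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # assumption: --max-size is interpreted in GiB


@dataclass
class Candidate:
    # Hypothetical record for one table in the vacuum queue.
    name: str
    size_bytes: int
    autovacuum_running: bool = False


def select_tables(candidates, max_size_gb=None, skip_autovacuum=True):
    """Split candidates into (selected, skipped); skipped holds (candidate, reason)."""
    selected, skipped = [], []
    for c in candidates:
        if max_size_gb is not None and c.size_bytes > max_size_gb * GIB:
            skipped.append((c, f"larger than {max_size_gb} GB"))
        elif skip_autovacuum and c.autovacuum_running:
            skipped.append((c, "autovacuum running"))
        else:
            selected.append(c)
    return selected, skipped
```

Keeping the skip reason next to each table makes it straightforward for a dry run to print annotations such as `[autovacuum running]`.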
PostgreSQL may return timestamps with 1-5 fractional digits (e.g. .44841), but datetime.fromisoformat in Python < 3.11 accepts only 0, 3, or 6 fractional digits. Pad the fraction to 6 digits before parsing.
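A minimal sketch of the padding step, assuming the timestamp arrives as text. The helper name `parse_pg_timestamp` is hypothetical, and abbreviated timezone offsets such as `+00` are not handled here.

```python
import re
from datetime import datetime

# First run of 1-6 fractional digits after the seconds field.
_FRACTION = re.compile(r"\.(\d{1,6})(?!\d)")


def parse_pg_timestamp(text):
    """Right-pad fractional seconds to 6 digits, then parse with fromisoformat."""
    padded = _FRACTION.sub(lambda m: "." + m.group(1).ljust(6, "0"), text, count=1)
    return datetime.fromisoformat(padded)
```

Right-padding with zeros preserves the value: `.44841` becomes `.448410`, which is 448410 microseconds.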
Anonymized walkthrough: triage a 245-table backlog with monster tables, lower threshold to find hidden work, vacuum small tables first with --max-size, then increase size limit in phases.
VACUUM on large tables can run for hours. If the connection role has a statement_timeout configured, it would cancel the vacuum mid-operation. Always set statement_timeout explicitly at connection time (default 0 = no timeout).
The rollback + autocommit toggle resets session state. Set statement_timeout directly in autocommit mode right before VACUUM to guarantee it takes effect.
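The ordering described above might look like the following sketch. It assumes a psycopg 3 style connection (`rollback()`, `execute()`, and a settable `autocommit` attribute); `vacuum_table` and `quote_ident` are hypothetical helpers, not the script's actual functions.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_table(conn, schema, table, statement_timeout_ms=0):
    """Run VACUUM with statement_timeout set in the same autocommit session."""
    conn.rollback()          # VACUUM cannot run inside a transaction block
    conn.autocommit = True
    try:
        # Issued after the rollback, so the setting cannot be undone by it.
        conn.execute(f"SET statement_timeout = {int(statement_timeout_ms)}")
        conn.execute(f"VACUUM {quote_ident(schema)}.{quote_ident(table)}")
    finally:
        conn.autocommit = False
```

The reason the order matters is that PostgreSQL discards the effect of a `SET` issued inside a transaction that is later rolled back, so a timeout set before the rollback may silently revert to the role default.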
All Python CLI options now have corresponding Jenkins parameters.
Reorder sections so new users see installation, sample output, and the real-world scenario before hitting the locking behavior and threshold theory deep-dives. Update sample output to show new size column and autovacuum annotations. Add all Jenkins parameters to pipeline table.
- --check-bloat / --bloat-pct: dead tuple analysis via pg_stat_user_tables, reports tables with high dead/live ratio for pg_repack candidates
- --workers N: parallel vacuum via ThreadPoolExecutor with Queue-based connection pool, hard cap at 8, auto-reduced by validate_workers() checking maintenance_work_mem and max_connections headroom
- get_vacuum_activity(): detects both autovacuum workers AND manual VACUUM from other sessions, annotates dry-run output accordingly
- README: parallel workers warnings, killing a running vacuum guide, bloat analysis docs, tuning guidance additions
- Jenkinsfile: CHECK_BLOAT, BLOAT_PCT, WORKERS parameters
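As an illustration of the dead/live ratio check, here is a sketch built on the `n_live_tup` and `n_dead_tup` columns of `pg_stat_user_tables`. The query text, the function names, and the choice to treat a table with dead tuples but no live ones as infinitely bloated are assumptions, not the script's actual logic.

```python
# Rows are expected as (schemaname, relname, n_live_tup, n_dead_tup).
BLOAT_QUERY = """
SELECT schemaname, relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
"""


def dead_to_live_pct(n_live, n_dead):
    """Dead tuples as a percentage of live tuples."""
    if n_live == 0:
        return float("inf") if n_dead > 0 else 0.0
    return 100.0 * n_dead / n_live


def bloat_candidates(rows, bloat_pct):
    """Return (qualified_name, pct) pairs at or above bloat_pct, worst first."""
    hits = [
        (f"{schema}.{rel}", dead_to_live_pct(live, dead))
        for schema, rel, live, dead in rows
    ]
    return sorted((h for h in hits if h[1] >= bloat_pct), key=lambda h: -h[1])
```

These counters are statistics estimates, so the result is a shortlist for closer inspection rather than a measurement of on-disk bloat.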
The previous Ctrl+C fix only covered the parallel (--workers) path. Single-worker vacuum now catches KeyboardInterrupt and prints a clean summary instead of a traceback.
Instead of relying on a stale snapshot taken before the vacuum loop, query pg_stat_activity per-table just before issuing VACUUM. This detects vacuums started by other pg_vaccumen instances or autovacuum workers that began after our initial check. Applies to both sequential and parallel (--workers) execution paths.
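One way to sketch the per-table check is to fetch the active sessions immediately before each VACUUM and match their query text against the target table. The query, the regular expression, and the function name below are assumptions; the text match is a heuristic that only recognizes schema-qualified names and ignores identifier case.

```python
import re

# Fetch rows as (pid, query) right before issuing VACUUM on a table.
ACTIVITY_QUERY = """
SELECT pid, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND state = 'active'
"""

# Matches manual "VACUUM ..." and autovacuum's "autovacuum: VACUUM ..." text.
_VACUUM = re.compile(r"^\s*(?:autovacuum:\s+)?VACUUM\b(.*)$", re.IGNORECASE | re.DOTALL)


def pids_vacuuming(rows, schema, table):
    """Return pids whose current query looks like a VACUUM of schema.table."""
    wanted = f"{schema}.{table}".lower()
    pids = []
    for pid, query in rows:
        m = _VACUUM.match(query or "")
        if not m:
            continue
        # Drop identifier quotes, then collect bare or dotted names.
        names = re.findall(r"[\w$]+(?:\.[\w$]+)?", m.group(1).replace('"', "").lower())
        if wanted in names:
            pids.append(pid)
    return pids
```

A non-empty result would mean the table is skipped for this run. Checking per table narrows the race window compared with a single snapshot, though a vacuum can still start between the check and the VACUUM statement.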
Every VACUUM operation (sequential or parallel) must acquire a numbered advisory lock slot before executing. All pg_vaccumen instances on the same database share the lock namespace, so --workers N becomes a global concurrency limit rather than per-instance.
- VACUUM_LOCK_NAMESPACE (0x70675661) for advisory lock keys
- acquire_vacuum_slot(): blocks with polling until a slot is free
- release_vacuum_slot(): frees the slot after VACUUM completes
- get_vacuum_slots_in_use(): reports slot status before execution
- Locks auto-release on disconnect (crash-safe, no stale state)
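The slot scheme could be sketched with PostgreSQL's two-key session-level advisory lock functions, `pg_try_advisory_lock(int, int)` and `pg_advisory_unlock(int, int)`. The function names and the namespace constant come from the commit message; the bodies, the parameters, and the psycopg 3 style `conn.execute(...).fetchone()` calls are assumptions.

```python
import time

VACUUM_LOCK_NAMESPACE = 0x70675661  # from the commit message; fits in a signed int4


def acquire_vacuum_slot(conn, max_slots, poll_seconds=5.0, sleep=time.sleep):
    """Poll slots 1..max_slots until one advisory lock is obtained; return its number."""
    while True:
        for slot in range(1, max_slots + 1):
            row = conn.execute(
                "SELECT pg_try_advisory_lock(%s::int, %s::int)",
                (VACUUM_LOCK_NAMESPACE, slot),
            ).fetchone()
            if row[0]:
                return slot
        sleep(poll_seconds)  # every slot is busy, wait and retry


def release_vacuum_slot(conn, slot):
    """Release a slot previously acquired on this same session."""
    conn.execute(
        "SELECT pg_advisory_unlock(%s::int, %s::int)",
        (VACUUM_LOCK_NAMESPACE, slot),
    )
```

Because these locks are session-level, the server drops them when the session ends, which is what makes the scheme crash-safe. The limit only holds globally if every instance uses the same slot count.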
Update README to reflect that --workers is a global concurrency limit enforced via PostgreSQL advisory locks across all pg_vaccumen instances on the same database.
The ThreadPoolExecutor context manager's __exit__ calls shutdown(wait=True) before the KeyboardInterrupt handler runs, blocking on the first Ctrl+C. Fix by managing the executor directly with shutdown(wait=False). Also suppress spurious "FAILED: connection socket closed" messages from workers whose connections are closed during shutdown, using a threading.Event flag.
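A sketch of the shutdown pattern described above, managing the executor by hand instead of through `with`. The names `run_jobs` and `_guarded` are hypothetical, and closing the worker connections is left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def _guarded(job, stopping):
    """Run one job; stay quiet about failures caused by the shutdown itself."""
    try:
        return job()
    except Exception as exc:
        if not stopping.is_set():
            print(f"FAILED: {exc}")
        return None


def run_jobs(jobs, workers):
    """Run callables in a thread pool; return (results, interrupted)."""
    stopping = threading.Event()
    executor = ThreadPoolExecutor(max_workers=workers)
    futures, results, interrupted = [], [], False
    try:
        for job in jobs:
            futures.append(executor.submit(_guarded, job, stopping))
        for future in as_completed(futures):
            results.append(future.result())
    except KeyboardInterrupt:
        interrupted = True
        stopping.set()                 # workers check this before reporting errors
        for future in futures:
            future.cancel()            # drop work that has not started yet
        executor.shutdown(wait=False)  # return immediately instead of blocking
    else:
        executor.shutdown(wait=True)
    return results, interrupted
```

With `wait=False` the call returns on the first Ctrl+C so a summary can be printed. Threads that are already running a job still finish or fail on their own, which is why the event flag is needed to silence the errors they report afterwards.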
Rewrite the backlog catch-up walkthrough based on actual production experience: save baseline first, use parallel workers, and document the vacuum_freeze_min_age floor that causes infinite re-vacuum at 25% threshold.
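The floor can be illustrated with a little arithmetic. The sketch assumes that a table's relfrozenxid age right after VACUUM can remain as high as vacuum_freeze_min_age, since tuples younger than that are left unfrozen, and that the 25% figure corresponds to the PostgreSQL defaults of 50 million and 200 million. Both are my reading of the commit message rather than something it states, and the function names are hypothetical.

```python
def min_useful_threshold_pct(freeze_min_age, freeze_max_age):
    """Percentage of autovacuum_freeze_max_age that a freshly vacuumed table may still show."""
    return 100.0 * freeze_min_age / freeze_max_age


def threshold_is_reachable(threshold_pct, freeze_min_age, freeze_max_age):
    """True if a table can drop below the threshold after VACUUM."""
    return threshold_pct > min_useful_threshold_pct(freeze_min_age, freeze_max_age)


# PostgreSQL defaults: vacuum_freeze_min_age = 50M, autovacuum_freeze_max_age = 200M.
floor = min_useful_threshold_pct(50_000_000, 200_000_000)  # 25.0
```

Under those assumptions, a threshold at or below the floor keeps selecting the same tables on every run, so either the threshold has to sit above the floor or vacuum_freeze_min_age has to be lowered.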
Document the minimum PostgreSQL grants (pg_monitor, pg_maintain, schema CREATE/USAGE) needed for a dedicated pg_vaccumen service account, with fallback notes for PostgreSQL < 16.
Summary
- tools/manual_vacuum.sh for proactive vacuum maintenance
- autovacuum_freeze_max_age, spreading work over nightly runs to avoid emergency anti-wraparound vacuum spikes
Features
- autovacuum_freeze_max_age at runtime, uses percentages so it adapts when the setting changes
- --workers N with global concurrency control via advisory locks
- --max-size <GB> to skip oversized tables, prioritize smaller ones
- --check-bloat for dead tuple ratio reporting via pg_stat_user_tables
- pg_monitor + pg_maintain, no superuser needed
- --host (Aurora AWS integrations optional via --cluster)
Files
- tools/pg_vaccumen/pg_vaccumen.py: Main script
- tools/pg_vaccumen/Jenkinsfile: Jenkins pipeline
- tools/pg_vaccumen/requirements.txt: Dependencies (boto3, psycopg)
- tools/pg_vaccumen/README.md: Full documentation
Production Tested
Running nightly against Aurora PostgreSQL 17 clusters with 750M autovacuum_freeze_max_age, 112M+ transactions/day, and 16TB tables. Successfully brought a cluster from 86.5% of freeze max down to 11% through proactive maintenance.
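The percentage-based approach mentioned under Features can be sketched as follows. The query and function names are assumptions, and the example ages are derived from the figures above (86.5% and 11% of 750M) purely for illustration.

```python
# Reads each table's XID age and the current setting in one pass, so the
# percentage adapts automatically if autovacuum_freeze_max_age is changed.
AGE_QUERY = """
SELECT c.oid::regclass::text AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
"""


def pct_of_freeze_max(xid_age, freeze_max_age):
    """XID age as a percentage of autovacuum_freeze_max_age."""
    return 100.0 * xid_age / freeze_max_age


def needs_vacuum(xid_age, freeze_max_age, threshold_pct):
    """True once a table has used at least threshold_pct of the freeze budget."""
    return pct_of_freeze_max(xid_age, freeze_max_age) >= threshold_pct


before = pct_of_freeze_max(648_750_000, 750_000_000)  # 86.5
after = pct_of_freeze_max(82_500_000, 750_000_000)    # 11.0
```

Expressing the threshold as a percentage rather than an absolute XID age means the same job configuration can run against clusters with different autovacuum_freeze_max_age values.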