BBR backup of Concourse fails because the code does not include lock scripts for the PostgresDB #6035
Unanswered
Hermen-Nicolau asked this question in Help & Support
Replies: 0 comments
When using BOSH Backup and Restore (BBR) to back up a Concourse deployment, the backup fails with the following error message:
Currently, the Concourse release does not lock the Postgres DB during a backup, so the backup runs concurrently with other database operations. Because tables are created and destroyed whenever pipelines are set or deleted, the backup and the DB can get out of sync, and the backup then fails.
The request is to enhance the product by adding a pre-backup-lock script that locks the system before the backup starts, and a post-backup-unlock script that unlocks it once the backup is done.
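To make the requested ordering concrete, here is a minimal sketch in Python. It is not BBR or Concourse code: `ToyConcourse`, `run_backup`, and the `accepting_changes` flag are hypothetical names made up for illustration, and running the unlock step in a `finally` block is a design choice of this sketch so the system is never left locked after a failed backup.

```python
from typing import Callable, List


def run_backup(pre_backup_lock: Callable[[], None],
               backup: Callable[[], None],
               post_backup_unlock: Callable[[], None]) -> None:
    """Run the three steps in order: lock, back up, unlock."""
    pre_backup_lock()
    try:
        backup()
    finally:
        # Always unlock, even if the backup step raised.
        post_backup_unlock()


class ToyConcourse:
    """Hypothetical stand-in for a system that can pause schema changes."""

    def __init__(self) -> None:
        self.accepting_changes = True
        self.events: List[str] = []

    def lock(self) -> None:
        self.accepting_changes = False
        self.events.append("lock")

    def backup(self) -> None:
        # In this toy model a backup is only safe while changes are paused.
        if self.accepting_changes:
            raise RuntimeError("backup started while changes were still allowed")
        self.events.append("backup")

    def unlock(self) -> None:
        self.accepting_changes = True
        self.events.append("unlock")


if __name__ == "__main__":
    system = ToyConcourse()
    run_backup(system.lock, system.backup, system.unlock)
    print(system.events)
```

What "locking" would actually mean for Concourse (for example pausing pipeline changes versus stopping the ATC) is left open here; the sketch only shows the sequencing the request asks for.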
This is something that can only be changed at the Concourse Release level and not the BBR level. There are detailed docs on the contract here: https://docs.cloudfoundry.org/bbr/index.html
A few notes on this behavior from BBR R&D:
Concourse uses the pg_dump utility to back up its PostgreSQL database, which by design produces consistent backups even while the database is being used concurrently. However, Concourse doesn't specify which tables to back up (the evidence is that https://github.com/concourse/concourse-bosh-release/blob/master/jobs/bbr-atcdb/templates/config.json.erb doesn't configure the tables property).
This may be occurring as a result of how Concourse uses the DB. This comment in pg_dump (https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L14-L23) leads me to believe that Concourse performs frequent DDL-like actions, as evidenced by the name of the table in question (public.build_event_id_seq_4327391). Because of this, Concourse ought to implement locking or change how it uses an RDBMS.
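The failure mode suspected above can be illustrated with a toy model in Python. This is not how pg_dump is implemented; it only mimics the general shape of the race: the dump first takes a list of objects, a concurrent DDL action then drops one of them, and a later lookup against the live state fails. `ToyDatabase`, `dump`, `ddl_locked`, and the short sequence names are all hypothetical.

```python
class CatalogLookupError(Exception):
    """Raised when a listed object no longer exists in the live catalog."""


class ToyDatabase:
    def __init__(self, names):
        self.live = set(names)    # objects that currently exist
        self.ddl_locked = False   # when True, drops are refused

    def snapshot(self):
        """Return the object names visible at the start of the dump."""
        return sorted(self.live)

    def drop(self, name):
        """Simulate a DDL action; returns False if it was blocked by the lock."""
        if self.ddl_locked:
            return False
        self.live.discard(name)
        return True

    def describe(self, name):
        """Look the object up in the live state, not in the snapshot."""
        if name not in self.live:
            raise CatalogLookupError(f"lookup failed for {name}")
        return f"CREATE SEQUENCE {name};"


def dump(db, concurrent_ddl=None):
    names = db.snapshot()
    if concurrent_ddl is not None:
        concurrent_ddl()          # DDL that lands in the middle of the dump
    return [db.describe(n) for n in names]


if __name__ == "__main__":
    db = ToyDatabase(["build_event_id_seq_1", "build_event_id_seq_2"])
    try:
        dump(db, lambda: db.drop("build_event_id_seq_2"))
    except CatalogLookupError as err:
        print("unlocked dump failed:", err)

    locked = ToyDatabase(["build_event_id_seq_1", "build_event_id_seq_2"])
    locked.ddl_locked = True      # what a pre-backup lock would achieve
    print(dump(locked, lambda: locked.drop("build_event_id_seq_2")))
```

With the lock flag set, the drop is refused for the duration of the dump and every listed object can still be described, which is the effect the requested lock scripts are meant to have.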