-
Couldn't load subscription status.
- Fork 796
Move DB initialization scripts for postgres and redis into service files. #967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
518415f to
136aefe
Compare
templates/redis.template.yml
Outdated
| exec 2>&1 | ||
| if [ ! -d /shared/redis_data ]; then | ||
| install -d -m 0755 -o redis -g redis /shared/redis_data | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test here would skip past if the directory exists but the ownership or permission was changed externally. Is that what we want?
Running install will still return success if the directory is already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was attempting to save work at boot, but it does seem safe enough to run install regardless, will remove the if
templates/postgres.template.yml
Outdated
| rm /root/install_postgres | ||
| fi | ||
| if [ "$CREATE_DB_ON_BOOT" = "1" ]; then | ||
| /usr/local/bin/create_db& |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there &? I don't think we want that to happen in the background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the way create_db is called such that it follows how docker postgres resolves a very similar problem with ensuring a postgres service is started before running scripts against it.
templates/postgres.template.yml
Outdated
| chmod: +x | ||
| contents: | | ||
| #!/bin/bash | ||
| # wait for postgres to start up... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncomfortable putting this check inside the create_db script. Whatever is calling it -- I think /etc/service/postgres/run -- should ensure the database is available first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought to move the check+sleep into the file itself since the sleep/check/wait is needed to be done both on launcher bootstrap (if doing a full, environment-bound build using launcher), and when booting running directly from launcher build, with CREATE_DB_ON_BOOT=1.
But I can move the check/sleep back out to both places if that makes things more readable/explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the sleep on the -exec path, and in the service file, started the DB with -w start flag, which will not background the postgres service until it is up, which is exactly what we want. This also removes the check from the create_db script as well.
templates/postgres.template.yml
Outdated
| # wait for postgres to start up... | ||
| for i in {1..5}; do | ||
| su postgres -c 'pg_isready -q' && break | ||
| sleep 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth using the iterator for increased backoff --sleep $i. Do we know how long the database normally takes to start? How long are we willing to wait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From testing 1 second was plenty and allowed the process to feel a bit snappier by shaving off a few seconds, but I didn't want to assume all systems could boot within 1 second.
Previously wait was hardcoded to 5 seconds, so I used that as an upper bound to not leave older hardware out if that length mattered for other systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the need for the check on start in the service file
templates/postgres.template.yml
Outdated
| contents: | | ||
| #!/bin/bash | ||
| # wait for postgres to start up... | ||
| for i in {1..5}; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we try five times and the database is still not responding? This will just continue ahead. We should issue an explicit error and abort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous script crashed against the database not being up and errored there, but it does make much more sense to explicitly throw a "cannot connect" if the timeout is reached. Will adjust 👍
…les. This resolves a race condition with unconfigured images attempting to bring up DBs for the first time. This does not affect fully bootstrapped images. Currently, all jobs start at boot - this includes postgres. Issue with the current is - postgres starts and adds the corresponding .s/.pid files to /var/run/postgres. Simultaneously, the unicorn job gets started, checks to see if postgres is running (it is already at this point from boot), and runs install_postgres. Inside the install_postgres script, we mount the shared postgres folder and remove .s/.pid files -- after postgres has already been started. In this case, we remove the (in-use) .s and .pid files. Subsequent unicorn tasks fail, erroring out the service and forcing it into a restart loop. Since postgres never restarts, it never regenerates the .s/.pid files, and unicorn can never run successfully. This proposal moves install_postgres into the postgres job file, eliminating the race condition. Since they are part of the same service, install_postgres will always run before starting postgres - it will no longer be able to remove valid .s and .pid files. Redis has a similar race condition with the creation of its data folder. This isn't as disastrous as the redis service restarts until the folder exists from unicorn run, but it provides better reasoning about the running services. Add early exit from unicorn boot scripts to properly retry migrate as well. Use pg_isready to check if pg is ready directly in create_db. Merge the ready check into create_db. Run create_db in a subshell on postgres job start, rather than in unicorn script. remove postgres-config call
Update DB password on boot in 2 container setup from env if exists
Fixes permissions on the directory if mis-set.
Use pg_ctl with -w start to ensure postgres is started. Allows create_db to run in the foreground on start for CREATE_DB_ON_BOOT. Take inspiration to how docker-library postgres does similar: https://github.com/docker-library/postgres/blob/889f9447cd2dfe21cccfbe9bb7945e3b037e02d8/15/bullseye/docker-entrypoint.sh#L294-L316
d1f6d4b to
d46d5b3
Compare
This resolves a race condition with unconfigured images attempting to bring up DBs for the first time. This does not affect fully bootstrapped images.
Currently, all jobs start at boot - this includes postgres.
Issue with the current is - postgres starts and adds the corresponding .s/.pid files to /var/run/postgres.
Simultaneously, the unicorn job gets started, checks to see if postgres is running (it is already at this point from boot), and runs install_postgres.
Inside the install_postgres script, we mount the shared postgres folder and remove .s/.pid files -- after postgres has already been started. In this case, we remove the (in-use) .s and .pid files.
Subsequent unicorn tasks fail, erroring out the service and forcing it into a restart loop. Since postgres never restarts, it never regenerates the .s/.pid files, and unicorn can never run successfully.
This proposal moves install_postgres into the postgres job file, eliminating the race condition. Since they are part of the same service, install_postgres will always run before starting postgres - it will no longer be able to remove valid .s and .pid files.
Redis has a similar race condition with the creation of its data folder. This isn't as disastrous as the redis service restarts until the folder exists from unicorn run, but it provides better reasoning about the running services.
Run create_db script if configured to directly from the postgres service.
Wait more smartly via polling pg_isready rather than a single long sleep.
Add early exit from unicorn boot scripts for more dependable service restarts - exiting early ensures all will restart.