-
Notifications
You must be signed in to change notification settings - Fork 2
fix: Wait for MinIO Server before creating bucket and catch errors #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Wait for the MinIO Server to be healthy before attempting to create a bucket as part of the 'model-bucket-init' service. This is necessary because the MinIO Server may not be ready to accept requests when the service starts, resulting in errors like the following: ``` Unable to initialize new alias from the provided credentials. Server not initialized, please try again. ``` Signed-off-by: Phoevos Kalemkeris <[email protected]>
baixiac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have minio liveness as a dependency. As discussed, please add the fix to the init command which currently swallows the error code and suppresses the container restarting. Even if the added liveness check passes, the command may still fail silently to intermittent issues such as network instability or misconfiguration on minio. Exposing the error code ensures the retries on bucket creation until it succeeds.
Chain the commands involved in creating a MinIO bucket run as part of the 'model-bucket-init' service in the 'docker-compose-mlflow.yml' file to correctly catch errors and trigger service restarts. Previously, if the first command (i.e. adding a MinIO host) failed, the second one would still run, overwriting the error code and preventing the service from restarting. Using the '&&' operator to chain the commands ensures that the second command will only run if the first one is successful and that the error code is correctly propagated, allowing Docker to retry in the event of a failure. Note that the MinIO client commands can fail due to a variety of reasons, such as network issues (some of which might be transient), the MinIO Server not being ready, or incorrect credentials. Signed-off-by: Phoevos Kalemkeris <[email protected]>
|
@baixiac, I've added a commit to chain the commands and properly propagate errors. Note that the 2 fixes address different kinds of issues. The first one, waiting until the MinIO service is ready, is something that should be there when services depend on each other's functionality, regardless of the specific logic running in the dependent. The second one is specific to the bash script currently running there and even though it happens to fix the issue that triggered this discussion in the first place, it alters the service's behaviour, especially when it comes to non-transient failures. For instance, if incorrect credentials are provided, we wouldn't really want to keep retrying but rather fail gracefully. This is the reason why I suggested that it should be part of a different discussion and PR, rather than squashed together with a shallow fix to a broader issue. That said, I'm eager to see this merged, only adding the context for future reference. Perhaps we could add a simple change like the following to arbitrarily limit the number of restarts or a restart policy if we also want to add a delay between attempts. restart: on-failure:10 |
baixiac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, if incorrect credentials are provided, we wouldn't really want to keep retrying but rather fail gracefully.
That means more logic needs to be added to handle different failing cases. Besides, the user may or may not notice if the container/job ran once and exited before carrying on inspecting its logs.
The fix looks good to me, and you have carte blanche to add retry with max attempts and backoff jitter.
Wait for the MinIO Server to be healthy before attempting to create a bucket as part of the
model-bucket-initservice. This is necessary because the MinIO Server may not be ready to accept requests when the service starts (which is the only thing the defaultdepends_oncondition guarantees), resulting in errors like the following:The above can be seen in failed integration tests on the Gateway side, while it looks like the change introduced here resolves the issue (subsequent CI runs using this commit to set up CMS succeed).
On top of that, chain the commands involved in creating a MinIO bucket to correctly catch errors and trigger service restarts. Previously, if the first command (i.e. adding a MinIO host) failed, the second one would still run, overwriting the error code and preventing the service from restarting. Using the
&&operator to chain the commands ensuresthat the second command will only run if the first one is successful and that the error code is correctly propagated, allowing Docker to retry in the event of a failure. Note that the MinIO client commands can fail due to a variety of reasons, such as network issues (some of which might be transient), the MinIO Server not being ready, or incorrect credentials.