-
Notifications
You must be signed in to change notification settings - Fork 2.1k
QEMU: Correct graceful_shutdown logic #27316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
db2daa4 to
ecea7e9
Compare
|
Hi @olljanat! Thanks for this contribution. I'll take a swing at reproducing this today. |
mismithhisler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great contribution! I left a couple comments, let me know what you think.
ecea7e9 to
860331c
Compare
|
Last thing, do you mind adding a changelog via |
e9eea0e to
7b668ae
Compare
|
Sure, changelog is included now |
drivers/qemu/driver.go
Outdated
| d.logger.Error("graceful shutdown", "pid", handle.pid, "timeout after", timeout) | ||
| break out | ||
| case <-ticker.C: | ||
| handle, _ = d.tasks.Get(taskID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this line because we already have the task handle from line 718?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes if that is actual handle where status updates without need to fetch it again. Updated (but not tested).
7b668ae to
844cded
Compare
mismithhisler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for this contribution!
Description
There seems to be several bugs in QEMU driver's graceful_shutdown feature which most likely has been there already as part of #4800 (at least description says that it was not tested). Those bugs are:
-monitorinstead of-qmpwhich is read-only.Testing & Reproduction steps
If you check job error log, it contains message like
qemu-system-x86_64: terminating on signal 2 from pid 63315 (/usr/bin/nomad)after you stop job, even when graceful_shutdown is configured.That why you need use
kill_signal = "SIGUSR1"to really see that guest shutdown does not work and VM process gets killed by Nomad.You can also try Python script like this to see that socket created by
-monitorflag is read only.and that it only works after you enable another with parameters like:
Here is also working Golang client:
but unlike with Python, it reading QEMU responses seems to be needed, other why commands are not effective.
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad product documentation, which is stored in the
web-unified-docsrepo. Refer to theweb-unified-docscontributor guide for docs guidelines.Please also consider whether the change requires notes within the upgrade
guide. If you would like help with the docs, tag the
nomad-docsteam in this PR.Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.