Local storage [4/4]: Use raw zvols#9593
Conversation
Request the new *raw* zvol type when creating disks backed by local storage. Using raw zvols requires waiting for them to be initialized, which the instance start saga is now responsible for polling. In the instance start saga, ensure each raw zvol in parallel because this can take some time. This PR also explicitly removes the inherited encryption for these zvols. This is accomplished by changing `EncryptionDetails` to be an enum where the call site now has to explicitly choose between inheriting the parent dataset's encryption, using `aes-256-gcm`, and explicitly setting encryption off. Also chopped out is setting `volblocksize`: this was a mistake to set, and it should instead be left to the default value. Still to do is to revisit the overhead reserved for zvols: the previous emperical value was not for the new raw zvol type. This is tracked by oxidecomputer#9591. Fixes oxidecomputer#9519.
illumos-utils/src/zfs.rs
Outdated
| match execute_async(cmd).await { | ||
| Ok(_) => Ok(()), | ||
| Ok(_) => { | ||
| // no-op |
There was a problem hiding this comment.
Does this just mean "Command succeeded, nothing else to do"?
There was a problem hiding this comment.
I was a little confused that "no-op" meant "yeah I did it and it worked"
illumos-utils/src/zfs.rs
Outdated
| // Open the raw zvol and read from it. If EINPROGRESS is seen from | ||
| // the read, then it's not done initializing yet. We can't delete it | ||
| // until it's fully done, so return the `NotReady` variant to signal | ||
| // upstack that it should poll. |
There was a problem hiding this comment.
that's a bummer; @grwilson is this something we could change in the future?
There was a problem hiding this comment.
There will be an ioctl interface that will allow for the appstack to stop the initialization so that the process can be canceled or aborted.
There was a problem hiding this comment.
8c4376b changes to use DKIOCRAWVOLSTOP to stop the init
illumos-utils/src/zfs.rs
Outdated
| // Open the raw zvol and read from it. If EINPROGRESS is seen from | ||
| // the read, then it's not done initializing yet. We can't delete it | ||
| // until it's fully done, so return the `NotReady` variant to signal | ||
| // upstack that it should poll. |
There was a problem hiding this comment.
would it be worth creating a fn that delete and create both call rather than having the duplicated code?
There was a problem hiding this comment.
I changed to use DKIOCRAWVOLSTOP so this code looks less duplicated now
illumos-utils/src/zfs.rs
Outdated
| // The numeric type of these constants is not specified due to being | ||
| // different for illumos and linux. They are also `let dkioc` and | ||
| // not `const DKIOC` because `const` requires a type. | ||
| // | ||
| // XXX is there a way to pull this constant in from | ||
| // /usr/include/sys/dkio.h | ||
| let dkioc = 0x04 << 8; | ||
| let dkiocrawvolstop = dkioc | 31; |
There was a problem hiding this comment.
| // The numeric type of these constants is not specified due to being | |
| // different for illumos and linux. They are also `let dkioc` and | |
| // not `const DKIOC` because `const` requires a type. | |
| // | |
| // XXX is there a way to pull this constant in from | |
| // /usr/include/sys/dkio.h | |
| let dkioc = 0x04 << 8; | |
| let dkiocrawvolstop = dkioc | 31; | |
| #[cfg(target_os = "illumos")] | |
| const DKIOCRAWVOLSTOP: libc::c_int = (0x04 << 8) | 0x1f; | |
| #[cfg(not(target_os = "illumos"))] | |
| const DKIOCRAWVOLSTOP: libc::c_ulong = (0x04 << 8) | 0x1f; |
eh?
You could use bindgen--I don't think it's worth it.
There was a problem hiding this comment.
Ah, this works. I was thinking to do the cfg specialization like this but forgot about not and was thinking that I'd have to add a target for each OS. Done in ac3d056
|
Do not merge this yet, it depends on a fix for #9607 |
|
Note to self: must change this PR to set volblocksize = 128k again. |
|
|
||
| pub async fn delete_dataset_volume( | ||
| params: DatasetVolumeDeleteArgs<'_>, | ||
| ) -> Result<(), DeleteDatasetVolumeError> { |
There was a problem hiding this comment.
Random comment here to remind us to figure out why the delete failed on berlin over the weekend.
Sorry for this message not actually having any useful information regarding that failure.
There was a problem hiding this comment.
94345a3 fixes one of the bugs you saw, but doesn't address why the parent dataset could not be deleted originally
There was a problem hiding this comment.
I'll try again deleting and see what shakes out.
There was a problem hiding this comment.
I've done a bunch of creating and deleting on london with the latest, and things look to be working as expected. Before it would fail right away and that is no longer the case.
If the /dev/zvol/rdsk/... path is gone, then skip trying to delete what is probably already deleted. Check the DKIOCRAWVOLSTATUS output to see if the init thread even needs stopping, and return early accordingly. In the face of a transient EINTR stopping the raw zvol init thread, retry the ioctl once. In the face of a transient "dataset is busy" when deleting the parent dataset, retry once. Other parts of sled-agent (like inventory collection) maybe caused this, we never nailed it down. Refactor waiting logic into separate `wait_for_raw_volume_ok_to_delete` function to ensure that the fd is dropped and this part of sled agent isn't holding an open file descriptor to the raw zvol while trying to delete it.
|
I have no idea how that got closed...? |
| } | ||
|
|
||
| _ => { | ||
| // The zero init thread encountered an error, we can proceed |
There was a problem hiding this comment.
Is this where we might change the current code to handle an init that runs forever retrying the init? Or, in this case are we okay and the init will have failed with an error and we will just go ahead and delete it?
There was a problem hiding this comment.
If there's an error running the zero init thread (which I'm assuming also does the allocation), I'm assuming we fail and are not retrying, yeah. We'll have to delete the raw vol and try again.
grwilson
left a comment
There was a problem hiding this comment.
The raw volume status check logic looks good to me (including the EFRAGS). I also verified that the current code will report back EFRAGS if we encounter gang blocks with a blocksize of 128k.
leftwo
left a comment
There was a problem hiding this comment.
I've been running this over the weekend and I've not seen any problems.
For the review, I think all I wanted was just comments updated. Great to see
this work going back in!
|
|
||
| pub async fn delete_dataset_volume( | ||
| params: DatasetVolumeDeleteArgs<'_>, | ||
| ) -> Result<(), DeleteDatasetVolumeError> { |
There was a problem hiding this comment.
I've done a bunch of creating and deleting on london with the latest, and things look to be working as expected. Before it would fail right away and that is no longer the case.

Request the new raw zvol type when creating disks backed by local storage. Using raw zvols requires waiting for them to be initialized, which the instance start saga is now responsible for polling.
In the instance start saga, ensure each raw zvol in parallel because this can take some time.
This PR also explicitly removes the inherited encryption for these zvols. This is accomplished by changing
EncryptionDetailsto be an enum where the call site now has to explicitly choose between inheriting the parent dataset's encryption, usingaes-256-gcm, and explicitly setting encryption off.Also chopped out is setting
volblocksize: this was a mistake to set, and it should instead be left to the default value.Still to do is to revisit the overhead reserved for zvols: the previous emperical value was not for the new raw zvol type. This is tracked by #9591.
Fixes #9519.