4 changes: 2 additions & 2 deletions man/man8/zpool-remove.8
@@ -58,8 +58,8 @@ This command supports removing hot spare, cache, log, and both mirrored and
non-redundant primary top-level vdevs, including dedup and special vdevs.
.Pp
Top-level vdevs can only be removed if the primary pool storage does not contain
a top-level raidz vdev, all top-level vdevs have the same sector size, and the
keys for all encrypted datasets are loaded.
a top-level raidz or draid vdev, all top-level vdevs have the same ashift size,
and the keys for all encrypted datasets are loaded.
.Pp
Removing a top-level vdev reduces the total amount of space in the storage pool.
The specified device will be evacuated by copying all allocated space from it to
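
The wording change from "sector size" to "ashift" refers to the same quantity in two forms: ashift is the base-2 logarithm of the sector size a vdev allocates in, so an ashift of 12 means 4096-byte sectors. The sketch below is illustrative only and is not taken from the patch; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sector size in bytes implied by an ashift value. */
static uint64_t
sector_size_from_ashift(unsigned ashift)
{
	assert(ashift < 64);	/* keep the shift within 64 bits */
	return (UINT64_C(1) << ashift);
}

/* Hypothetical helper: returns 1 if every entry has the same ashift. */
static int
all_same_ashift(const unsigned *ashifts, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (ashifts[i] != ashifts[0])
			return (0);
	}
	return (1);
}
```

Comparing ashift values directly, rather than sector sizes, matches how the revised man page text states the requirement.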
80 changes: 58 additions & 22 deletions module/zfs/vdev_removal.c
@@ -51,34 +51,70 @@
#include <sys/trace_zfs.h>

/*
* This file contains the necessary logic to remove vdevs from a
* storage pool. Currently, the only devices that can be removed
* are log, cache, and spare devices; and top level vdevs from a pool
* w/o raidz or mirrors. (Note that members of a mirror can be removed
* by the detach operation.)
* This file contains the necessary logic to remove vdevs from a storage
* pool. Note that members of a mirror can be removed by the detach
* operation. Currently, the only devices that can be removed are:
*
* Log vdevs are removed by evacuating them and then turning the vdev
* into a hole vdev while holding spa config locks.
* 1) Traditional hot spare and cache vdevs. Note that draid distributed
* spares are fixed at creation time and cannot be removed.
*
* Top level vdevs are removed and converted into an indirect vdev via
* a multi-step process:
* 2) Log vdevs are removed by evacuating them and then turning the vdev
* into a hole vdev while holding spa config locks.
*
* - Disable allocations from this device (spa_vdev_remove_top).
* 3) Top-level singleton and mirror vdevs, including dedup and special
* vdevs, are removed and converted into an indirect vdev via a
* multi-step process:
*
* - From a new thread (spa_vdev_remove_thread), copy data from
* the removing vdev to a different vdev. The copy happens in open
* context (spa_vdev_copy_impl) and issues a sync task
* (vdev_mapping_sync) so the sync thread can update the partial
* indirect mappings in core and on disk.
* - Disable allocations from this device (spa_vdev_remove_top).
*
* - If a free happens during a removal, it is freed from the
* removing vdev, and if it has already been copied, from the new
* location as well (free_from_removing_vdev).
* - From a new thread (spa_vdev_remove_thread), copy data from the
* removing vdev to a different vdev. The copy happens in open context
* (spa_vdev_copy_impl) and issues a sync task (vdev_mapping_sync) so
* the sync thread can update the partial indirect mappings in core
* and on disk.
*
* - After the removal is completed, the copy thread converts the vdev
* into an indirect vdev (vdev_remove_complete) before instructing
* the sync thread to destroy the space maps and finish the removal
* (spa_finish_removal).
* - If a free happens during a removal, it is freed from the removing
* vdev, and if it has already been copied, from the new location as
* well (free_from_removing_vdev).
*
* - After the removal is completed, the copy thread converts the vdev
* into an indirect vdev (vdev_remove_complete) before instructing
* the sync thread to destroy the space maps and finish the removal
* (spa_finish_removal).
*
* The following constraints currently apply to primary device removal:
*
* - All vdevs must be online, healthy, and not be missing any data
* according to the DTLs.
*
* - When removing a singleton or mirror vdev, regardless of whether it's
* a special, dedup, or primary device, it must have the same ashift
* as the devices in the normal allocation class. Furthermore, all
* vdevs in the normal allocation class must have the same ashift to
* ensure that new allocations never include additional padding.
*
* - The normal allocation class cannot contain any raidz or draid
* top-level vdevs since segments are copied without regard for block
* boundaries. This makes it impossible to calculate the required
* parity columns when using these vdev types as the destination.
*
* - The encryption keys must be loaded so the ZIL logs can be reset
* in order to prevent writing to the device being removed.
*
* N.B. ashift and raidz/draid constraints for primary top-level device
* removal could be slightly relaxed if it were possible to request that
* DVAs from a mirror or singleton in the specified allocation class be
* used (metaslab_alloc_dva).
*
* This flexibility would be particularly useful for raidz/draid pools which
* often include a mirrored special device. If a top-level singleton were
* mistakenly added, it could then still be removed at the cost of some
* special device capacity. This may be a worthwhile tradeoff depending on
* the pool capacity and expense (cost, complexity, time) of creating a new
* pool and copying all of the data to correct the configuration.
*
* Furthermore, while not currently supported it should be possible to allow
* vdevs of any type to be removed as long as they've never been written to.
*/

typedef struct vdev_copy_arg {
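
The constraints listed in the new comment can be read as a single eligibility check. The following is a minimal sketch of that reading, assuming a simplified model of a pool: every type and function name here (`toy_vdev_t`, `toy_check_remove`, and so on) is hypothetical and does not appear in `vdev_removal.c`, and the order in which failures are reported is an arbitrary choice, not the order used by the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of a top-level vdev; not the real vdev_t. */
typedef enum { TOY_DISK, TOY_MIRROR, TOY_RAIDZ, TOY_DRAID } toy_vdev_type_t;
typedef enum { TOY_CLASS_NORMAL, TOY_CLASS_SPECIAL, TOY_CLASS_DEDUP } toy_class_t;

typedef struct toy_vdev {
	toy_vdev_type_t type;
	toy_class_t alloc_class;
	unsigned ashift;
	bool healthy;	/* online, healthy, nothing missing per the DTLs */
} toy_vdev_t;

typedef enum {
	TOY_REMOVE_OK,
	TOY_REMOVE_UNHEALTHY,
	TOY_REMOVE_BAD_TYPE,
	TOY_REMOVE_RAIDZ_IN_POOL,
	TOY_REMOVE_ASHIFT_MISMATCH,
	TOY_REMOVE_KEYS_UNLOADED
} toy_remove_status_t;

/* Check whether top-level vdev tops[target] may be removed. */
static toy_remove_status_t
toy_check_remove(const toy_vdev_t *tops, size_t n, size_t target,
    bool keys_loaded)
{
	assert(target < n);
	const toy_vdev_t *rm = &tops[target];

	/* Only singleton and mirror top-level vdevs can be removed. */
	if (rm->type != TOY_DISK && rm->type != TOY_MIRROR)
		return (TOY_REMOVE_BAD_TYPE);

	for (size_t i = 0; i < n; i++) {
		const toy_vdev_t *vd = &tops[i];

		/* All vdevs must be healthy, whatever their class. */
		if (!vd->healthy)
			return (TOY_REMOVE_UNHEALTHY);
		/* The remaining checks apply to the normal class only. */
		if (vd->alloc_class != TOY_CLASS_NORMAL)
			continue;
		if (vd->type == TOY_RAIDZ || vd->type == TOY_DRAID)
			return (TOY_REMOVE_RAIDZ_IN_POOL);
		/*
		 * Requiring every normal-class vdev to match the removed
		 * vdev's ashift also forces them to match one another.
		 */
		if (vd->ashift != rm->ashift)
			return (TOY_REMOVE_ASHIFT_MISMATCH);
	}

	/* Keys must be loaded so the ZIL logs can be reset. */
	if (!keys_loaded)
		return (TOY_REMOVE_KEYS_UNLOADED);
	return (TOY_REMOVE_OK);
}
```

In this model, a mirrored special vdev in a pool whose normal class contains a raidz vdev is rejected with `TOY_REMOVE_RAIDZ_IN_POOL`, which is the case the N.B. paragraph suggests could be relaxed.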