Commit f2a2de3

Merge pull request kubernetes#2039 from gnufied/update-allow-volume-expansion
Bring volume expansion enhancement as a KEP
2 parents cff1119 + c1aca92 commit f2a2de3

2 files changed

+375
-0
lines changed

Lines changed: 346 additions & 0 deletions
@@ -0,0 +1,346 @@
# Growing Persistent Volume size

## Table of Contents

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Goals](#goals)
- [Non Goals](#non-goals)
- [Use Cases](#use-cases)
- [Volume Plugin Matrix](#volume-plugin-matrix)
- [Implementation Design](#implementation-design)
  - [Prerequisite](#prerequisite)
  - [Admission Control and Validations](#admission-control-and-validations)
  - [Controller Manager resize](#controller-manager-resize)
  - [File system resize on kubelet](#file-system-resize-on-kubelet)
    - [Prerequisite of File system resize](#prerequisite-of-file-system-resize)
    - [Steps for resizing file system available on Volume](#steps-for-resizing-file-system-available-on-volume)
    - [Reduce coupling between resize operation and file system type](#reduce-coupling-between-resize-operation-and-file-system-type)
- [API and UI Design](#api-and-ui-design)
- [API Changes](#api-changes)
  - [PVC API Change](#pvc-api-change)
  - [StorageClass API change](#storageclass-api-change)
  - [Other API changes](#other-api-changes)
<!-- /toc -->

## Release Signoff Checklist

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [x] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [x] "Implementation History" section is up-to-date for milestone
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

## Goals

Enable users to increase the size of PVs that their pods are using. The user will update the PVC to request a new size. Underneath, we expect that a controller will apply the change to the PV which is bound to the PVC.

## Non Goals

* Reducing size of Persistent Volumes: We realize that reducing the size of a PV is much riskier than increasing it. Reducing the size of a PV could be a destructive operation, and it requires support from the underlying file system and volume type. In most cases it also requires that the file system being resized is unmounted.

* Rebinding PV and PVC: Kubernetes will only attempt to resize the currently bound PV and PVC and will not attempt to relocate data from a PV to a new PV and rebind the PVC to the newly created PV.

## Use Cases

* As a user I am running MySQL on a 100GB volume, but I am running out of space. I should be able to increase the size of the volume MySQL is using without losing all my data. (*online and with data*)
* As a user I created a PVC requesting 2GB of space. I have yet to start a pod with this PVC, but I realize that I probably need more space. Without having to create a new PVC, I should be able to request a larger size with the same PVC. (*offline and no data on disk*)
* As a user I was running a Rails application with a 5GB assets PVC. I have taken my application offline for maintenance, but I would like to grow the assets PVC to 10GB. (*offline but with data*)
* As a user I am running an application on GlusterFS. I should be able to resize the Gluster volume without losing data or the mount point. (*online and with data and without taking pod offline*)
* In the logging project we run on dedicated clusters, we start out with 187Gi PVs for each of the Elasticsearch pods. However, the amount of logs being produced can vary greatly from one cluster to another, and it's not uncommon that these volumes fill up and we need to grow them.

## Volume Plugin Matrix

| Volume Plugin   | Supports Resize | Requires File system Resize | Supported in 1.8 Release |
| --------------- | :-------------: | :-------------------------: | :----------------------: |
| EBS             | Yes             | Yes                         | Yes                      |
| GCE PD          | Yes             | Yes                         | Yes                      |
| GlusterFS       | Yes             | No                          | Yes                      |
| Cinder          | Yes             | Yes                         | Yes                      |
| Vsphere         | Yes             | Yes                         | No                       |
| Ceph RBD        | Yes             | Yes                         | No                       |
| Host Path       | No              | No                          | No                       |
| Azure Disk      | Yes             | Yes                         | No                       |
| Azure File      | No              | No                          | No                       |
| Cephfs          | No              | No                          | No                       |
| NFS             | No              | No                          | No                       |
| Flex            | Yes             | Maybe                       | No                       |
| LocalStorage    | Yes             | Yes                         | No                       |

## Implementation Design

For volume types that require both file system expansion and a volume plugin based modification, growing persistent volumes will be a two-step process.

For volume types that only require a volume plugin based API call, this will be a one-step process.

### Prerequisite

* The `pvc.spec.resources.requests.storage` field of the PVC object will become mutable after this change.
* #sig-api-machinery has agreed to allow PVC status updates from kubelet as long as the PVC and node relationship can be validated by the node authorizer.
* This feature will be protected by an alpha feature gate, as will the API changes needed for it.

### Admission Control and Validations

* Resource quota code has to be updated to take into account the PVC expand feature.
* If the volume plugin doesn't support the resize feature, the resize API request will be rejected and the PVC object will not be saved. This check will be performed via an admission controller plugin.
* If the requested size is smaller than the current size of the PVC, a validation will reject the API request. (This could be moved to the admission controller plugin too.)
* Not all PVCs will be resizable even if the underlying volume plugin allows it. Only dynamically provisioned volumes which are explicitly enabled by an admin will be allowed to be resized. An admission controller plugin will forbid size updates for PVCs for which resizing is not enabled by the admin.
* The design proposal for raw block devices should make sure that users aren't able to resize raw block devices.
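
The checks listed above could be combined roughly as in the following sketch. It is only an illustration under stated assumptions: `resizeRequest`, `storageClass`, `validateResize` and the error values are hypothetical stand-ins, not the real admission plugin or API types, and sizes are simplified to byte counts.

```go
package main

import (
	"errors"
	"fmt"
)

// storageClass is a hypothetical stand-in for the StorageClass API object;
// only the AllowVolumeExpand field proposed in this document is modelled.
type storageClass struct {
	AllowVolumeExpand bool
}

// resizeRequest is a hypothetical, simplified view of a PVC update.
type resizeRequest struct {
	CurrentBytes    int64
	RequestedBytes  int64
	PluginResizable bool          // does the volume plugin support resize?
	Class           *storageClass // nil for statically provisioned volumes
}

var (
	errShrink            = errors.New("requested size is smaller than current size")
	errPluginUnsupported = errors.New("volume plugin does not support resize")
	errNotEnabled        = errors.New("resize is not enabled for this claim")
)

// validateResize applies the checks in the order they are listed above.
func validateResize(r resizeRequest) error {
	switch {
	case r.RequestedBytes < r.CurrentBytes:
		return errShrink
	case r.RequestedBytes == r.CurrentBytes:
		return nil // size unchanged, nothing to validate
	case !r.PluginResizable:
		return errPluginUnsupported
	case r.Class == nil || !r.Class.AllowVolumeExpand:
		return errNotEnabled
	}
	return nil
}

func main() {
	gib := int64(1) << 30
	req := resizeRequest{
		CurrentBytes:    gib,
		RequestedBytes:  10 * gib,
		PluginResizable: true,
		Class:           &storageClass{AllowVolumeExpand: true},
	}
	fmt.Println("allowed:", validateResize(req) == nil)
}
```

In the real system the shrink check would live in API validation and the other two in an admission plugin; they are merged here only to keep the sketch short.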

### Controller Manager resize

A new controller called `volume_expand_controller` will listen for PVC size expansion requests and take action as needed. The steps performed in this new controller will be:

* Watch for PVC update requests and add the PVC to the controller's work queue if an increase in volume size was requested. Once the PVC is added to the controller's work queue, `pvc.Status.Conditions` will be updated with `ResizeStarted: True`.
* For unbound or pending PVCs, resize will trigger no action in `volume_expand_controller`.
* If `pv.Spec.Capacity` is already greater than or equal to the requested size, similarly no action will be performed by the controller.
* A separate goroutine will read the work queue and perform the corresponding volume resize operation. If there is a resize operation in progress for the same volume, then the resize request will be pending and retried once the previous resize request has completed.
* Controller resize will in effect be level based rather than edge based. If there is more than one pending resize request for the same PVC, then new resize requests for that PVC will replace the older pending request.
* Resize will be performed via the volume plugin interface, executed inside a goroutine spawned by `operation_executor`.
* A new plugin interface called `volume.Expander` will be added to the volume plugin interface. The `Expander` interface will also define whether the volume requires a file system resize:
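
```go
type Expander interface {
    // ExpandVolumeDevice expands the volume described by spec from oldSize to newSize.
    ExpandVolumeDevice(spec *Spec, newSize resource.Quantity, oldSize resource.Quantity) error
    // RequiresFSResize reports whether the file system on the volume must
    // also be resized after the volume itself has been expanded.
    RequiresFSResize() bool
}
```
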
* The controller call to expand the PVC will look like:

```go
func (og *operationGenerator) GenerateExpandVolumeFunc(
    pvcWithResizeRequest *expandcache.PvcWithResizeRequest,
    resizeMap expandcache.VolumeResizeMap) (func() error, error) {

    volumePlugin, err := og.volumePluginMgr.FindExpandablePluginBySpec(pvcWithResizeRequest.VolumeSpec)
    if err != nil {
        return nil, err
    }

    expanderPlugin, err := volumePlugin.NewExpander(pvcWithResizeRequest.VolumeSpec)
    if err != nil {
        return nil, err
    }

    expandFunc := func() error {
        // Arguments follow the Expander interface: spec, new size, old size.
        expandErr := expanderPlugin.ExpandVolumeDevice(
            pvcWithResizeRequest.VolumeSpec,
            pvcWithResizeRequest.ExpectedSize,
            pvcWithResizeRequest.CurrentSize)

        if expandErr != nil {
            og.recorder.Event(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, expandErr.Error())
            resizeMap.MarkResizeFailed(pvcWithResizeRequest, expandErr.Error())
            return expandErr
        }

        // CloudProvider resize succeeded - let's mark API objects as resized
        if expanderPlugin.RequiresFSResize() {
            err := resizeMap.MarkForFileSystemResize(pvcWithResizeRequest)
            if err != nil {
                og.recorder.Event(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, err.Error())
                return err
            }
        } else {
            err := resizeMap.MarkAsResized(pvcWithResizeRequest)
            if err != nil {
                og.recorder.Event(pvcWithResizeRequest.PVC, v1.EventTypeWarning, kevents.VolumeResizeFailed, err.Error())
                return err
            }
        }
        return nil
    }
    return expandFunc, nil
}
```

* Once volume expansion is successful, the volume will be marked as expanded and the new size will be updated in `pv.spec.capacity`. Any errors will be reported as *events* on the PVC object.
* If resize failed in the above step, in addition to events, `pvc.Status.Conditions` will be updated with `ResizeFailed: True`. The corresponding error will be added to the condition field as well.
* Depending on volume type, the next steps would be:

    * If the volume is of a type that does not require file system resize, then `pvc.status.capacity` will be immediately updated to reflect the new size. This would conclude the volume expand operation. Also, `pvc.Status.Conditions` will be updated with `Ready: True`.
    * If the volume is of a type that requires file system resize, then a file system resize will be performed on kubelet. Read below for the steps that will be performed for file system resize.

* If the volume plugin is of a type that cannot resize attached volumes (such as `Cinder`), then `ExpandVolumeDevice` can return an error by checking the volume status with its own API (such as by making an OpenStack Cinder API call in this case). The controller will keep trying to resize the volume until it is successful.

* To cover cases of missed PVC update events, an additional loop will reconcile bound PVCs with PVs. This loop will iterate through all PVCs, compare `pvc.spec.resources.requests` with `pv.spec.capacity`, and add the PVC to `volume_expand_controller`'s work queue if `pv.spec.capacity` is less than `pvc.spec.resources.requests`.

* There will be additional checks in the controller that grows PV size to ensure that we do not make volume plugin API calls that can reduce the size of a PV.
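
The level-based queue and the reconcile loop described in this section might be sketched as follows. This is a simplified illustration, not the controller's actual code: `boundClaim`, `resizeQueue` and `reconcile` are hypothetical names, sizes are plain byte counts, and locking is omitted.

```go
package main

import "fmt"

// boundClaim is a hypothetical, flattened view of a bound PVC and its PV.
type boundClaim struct {
	Name           string
	RequestedBytes int64 // stands in for pvc.spec.resources.requests
	PVBytes        int64 // stands in for pv.spec.capacity
}

// resizeQueue keeps at most one pending request per claim, so a newer
// request replaces an older pending one (level based, not edge based).
type resizeQueue struct {
	pending map[string]int64
}

func newResizeQueue() *resizeQueue {
	return &resizeQueue{pending: make(map[string]int64)}
}

// add records the desired size for a claim, overwriting any older request.
func (q *resizeQueue) add(name string, size int64) {
	q.pending[name] = size
}

// reconcile catches missed update events: any claim whose PV is smaller
// than the requested size is (re)added to the queue.
func reconcile(claims []boundClaim, q *resizeQueue) {
	for _, c := range claims {
		if c.PVBytes < c.RequestedBytes {
			q.add(c.Name, c.RequestedBytes)
		}
	}
}

func main() {
	q := newResizeQueue()
	q.add("data", 5)
	q.add("data", 10) // replaces the pending request for 5

	reconcile([]boundClaim{
		{Name: "logs", RequestedBytes: 20, PVBytes: 10},
		{Name: "cache", RequestedBytes: 5, PVBytes: 5},
	}, q)

	fmt.Println(len(q.pending), q.pending["data"], q.pending["logs"])
}
```

Keying the pending set by claim name is what makes the behaviour level based: the worker only ever sees the latest desired size, regardless of how many update events arrived.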

### File system resize on kubelet

A file system resize will be pending on the PVC until a new pod that uses this volume is scheduled somewhere. While theoretically we *can* perform an online file system resize if the volume type and file system support it, we are leaving it for the next iteration of this feature.

#### Prerequisite of File system resize

* `pv.spec.capacity` must be greater than `pvc.status.capacity`.
* A fix in pv_controller has to be made to set `claim.Status.Capacity` only during binding. See the comment by Jan here: https://github.com/kubernetes/community/pull/657#discussion_r128008128
* A fix in the attach_detach controller has to be made to prevent force detaching of volumes that are undergoing resize. This can be done by checking `pvc.Status.Conditions` during force detach. The `AttachedVolume` struct doesn't hold a reference to the PVC, so PVC info can either be directly cached in `AttachedVolume` along with the PV spec or be fetched from the PersistentVolume's ClaimRef binding info.

#### Steps for resizing file system available on Volume

* When calling `MountDevice` or `Setup` of the volume plugin, the volume manager will additionally compare `pv.spec.capacity` and `pvc.status.capacity`, and if `pv.spec.capacity` is greater than `pvc.status.capacity`, then the volume manager will also resize the file system of the volume.
* The call to resize the file system will be performed inside `operation_generator.GenerateMountVolumeFunc`. The `VolumeToMount` struct will be enhanced to store the PVC as well.
* The flow of file system resize will be as follows:
    * Perform a resize based on the file system used inside the block device.
    * If resize succeeds, proceed with mounting the device as usual.
    * If resize failed with an error that shows no file system exists on the device, then log a warning and proceed with format and mount.
    * If resize failed with any other error, then fail the mount operation.
* Any errors during file system resize will be added as *events* to the Pod object and the mount operation will be failed.
* If there are any errors during file system resize, `pvc.Status.Conditions` will be updated with `ResizeFailed: True`. Any errors will be added to the `Conditions` field.
* File system resize will not be performed on kubelet where the volume being attached is ReadOnly. This is similar to the pattern used for formatting.
* After file system resize is successful, `pvc.status.capacity` will be updated to match `pv.spec.capacity` and the volume expand operation will be considered complete. Also, `pvc.Status.Conditions` will be updated with `Ready: True`.
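
The decision flow above can be sketched as a small function. This is a hedged illustration only: `mountRequest`, `prepareMount`, the action constants and the sentinel errors are hypothetical, and the actual resize is injected as a callback rather than calling any real tool.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errNoFileSystem = errors.New("no file system on device")
	errDeviceBusy   = errors.New("device busy")
)

// mountRequest is a hypothetical, flattened view of what the volume
// manager knows at mount time.
type mountRequest struct {
	PVBytes        int64 // stands in for pv.spec.capacity
	PVCStatusBytes int64 // stands in for pvc.status.capacity
	ReadOnly       bool
}

type mountAction string

const (
	actionMount          mountAction = "mount"
	actionFormatAndMount mountAction = "format-and-mount"
)

// prepareMount decides what should happen before the volume is mounted.
// The resize callback is only invoked when a resize is actually needed.
func prepareMount(req mountRequest, resize func() error) (mountAction, error) {
	if req.ReadOnly || req.PVBytes <= req.PVCStatusBytes {
		return actionMount, nil // nothing to resize
	}
	err := resize()
	switch {
	case err == nil:
		return actionMount, nil
	case errors.Is(err, errNoFileSystem):
		// No file system yet: warn, then format and mount as usual.
		return actionFormatAndMount, nil
	default:
		return "", fmt.Errorf("file system resize failed: %w", err)
	}
}

func main() {
	req := mountRequest{PVBytes: 10, PVCStatusBytes: 5}
	action, err := prepareMount(req, func() error { return errNoFileSystem })
	fmt.Println(action, err)

	_, err = prepareMount(req, func() error { return errDeviceBusy })
	fmt.Println(err)
}
```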

#### Reduce coupling between resize operation and file system type

A file system resize in general requires the presence of tools such as `resize2fs` or `xfs_growfs` on the host where kubelet is running. There is a concern that open coding calls to different resize tools directly in Kubernetes will result in coupling between file system and resize operation. To solve this problem we have considered the following options:

1. Write a library that abstracts away various file system operations, such as resizing, formatting etc.

    Pros:
    * Relatively well known pattern

    Cons:
    * Depending on the version with which Kubernetes is compiled, we are still tied to which file systems are supported in which version of Kubernetes.
2. Ship a wrapper shell script that encapsulates various file system operations; as long as the shell script supports a particular file system, the resize operation is supported.

    Pros:
    * A Kubernetes admin can easily replace the default shell script with her own version, thereby adding support for more file system types.

    Cons:
    * I don't know if there is a pattern that exists in kube today for shipping shell scripts that are called out from code in Kubernetes. Flex is different because none of the flex scripts are shipped with Kubernetes.
3. Ship resizing tools in a container.

Of all options, #3 is our best bet, but we are not quite there yet. Hence, I would like to propose that we ship with support for the most common file systems in the current release, and that we revisit this coupling and solve it in the next release.
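
As a rough illustration of option 1, a small library could map a file system type to the tool that grows it. The sketch below is an assumption-laden example, not a proposed API: `resizeCommand` and `resizeCommandFor` are hypothetical names, and the function only builds the command without executing it.

```go
package main

import "fmt"

// resizeCommand describes an external tool invocation; it is a
// hypothetical type used only for this illustration.
type resizeCommand struct {
	Name string
	Args []string
}

// resizeCommandFor picks the resize tool for a file system type.
// resize2fs is given the block device, while xfs_growfs is given the
// mount point of an already mounted file system.
func resizeCommandFor(fsType, devicePath, mountPath string) (resizeCommand, error) {
	switch fsType {
	case "ext3", "ext4":
		return resizeCommand{Name: "resize2fs", Args: []string{devicePath}}, nil
	case "xfs":
		return resizeCommand{Name: "xfs_growfs", Args: []string{mountPath}}, nil
	}
	return resizeCommand{}, fmt.Errorf("unsupported file system %q", fsType)
}

func main() {
	cmd, err := resizeCommandFor("ext4", "/dev/xvdf", "/mnt/data")
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd.Name, cmd.Args)
}
```

The sketch also shows the coupling concern directly: supporting another file system means adding a `case` and shipping a new build, which is the drawback listed for option 1.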

## API and UI Design

Given a PVC definition:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: volume-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "generalssd"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Users can request new size of underlying PV by simply editing the PVC and requesting new size:

```
~> kubectl edit pvc volume-claim
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: volume-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "generalssd"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

## API Changes

### PVC API Change

The `pvc.spec.resources.requests.storage` field of the PVC object will become mutable after this change.

In addition, the PVC's status will have a `Conditions []PVCCondition` field, which will be used to communicate the status of the PVC to the user.

The API change will be protected by an alpha feature gate, and the api-server will not allow PVCs with the `Status.Conditions` field if the feature is not enabled. `omitempty` in the serialization format will prevent presence of the field if not set.

So the `PersistentVolumeClaimStatus` will become:

```go
type PersistentVolumeClaimStatus struct {
    Phase       PersistentVolumeClaimPhase
    AccessModes []PersistentVolumeAccessMode
    Capacity    ResourceList
    // New Field added as part of this Change
    Conditions []PVCCondition
}

// new API type added
type PVCCondition struct {
    Type               PVCConditionType
    Status             ConditionStatus
    LastProbeTime      metav1.Time
    LastTransitionTime metav1.Time
    Reason             string
    Message            string
}

// new API type
type PVCConditionType string

// new Constants
const (
    PVCReady         PVCConditionType = "Ready"
    PVCResizeStarted PVCConditionType = "ResizeStarted"
    PVCResizeFailed  PVCConditionType = "ResizeFailed"
)
```
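
To illustrate how a `Conditions` list of this shape is typically maintained, here is a minimal sketch of an "upsert" helper. It uses simplified local types rather than the real API types, and `setCondition` is a hypothetical helper that is not part of this proposal.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified local stand-ins for the proposed API types.
type conditionType string
type conditionStatus string

const (
	resizeStarted conditionType = "ResizeStarted"
	resizeFailed  conditionType = "ResizeFailed"
)

type condition struct {
	Type               conditionType
	Status             conditionStatus
	LastTransitionTime time.Time
	Message            string
}

// setCondition replaces the condition with the same Type, or appends it.
// LastTransitionTime is preserved when the Status did not change.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			if conds[i].Status == c.Status {
				c.LastTransitionTime = conds[i].LastTransitionTime
			}
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []condition
	conds = setCondition(conds, condition{Type: resizeStarted, Status: "True", LastTransitionTime: time.Now()})
	conds = setCondition(conds, condition{Type: resizeFailed, Status: "True", LastTransitionTime: time.Now(), Message: "quota exceeded"})
	for _, c := range conds {
		fmt.Println(c.Type, c.Status, c.Message)
	}
}
```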
325+
326+
### StorageClass API change
327+
328+
A new field called `AllowVolumeExpand` will be added to StorageClass. The default of this value
329+
will be `false` and only if it is true - PVC expansion will be allowed.
330+
331+
```go
332+
type StorageClass struct {
333+
metav1.TypeMeta
334+
metav1.ObjectMeta
335+
Provisioner string
336+
Parameters map[string]string
337+
// New Field added
338+
// +optional
339+
AllowVolumeExpand bool
340+
}
341+
```
342+
343+
### Other API changes
344+
345+
This proposal relies on ability to update PVC status from kubelet. While updating PVC's status
346+
a PATCH request must be made from kubelet to update the status.
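
As an illustration of what such a PATCH body could contain, the sketch below builds a JSON merge patch that sets `status.capacity.storage`. The helper name `capacityPatch` is hypothetical, and the choice of a JSON merge patch (content type `application/merge-patch+json`) is an assumption; the proposal does not specify the patch format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capacityPatch builds a JSON merge patch body that updates only the
// storage capacity in the PVC status. Hypothetical helper for illustration.
func capacityPatch(storage string) ([]byte, error) {
	patch := map[string]interface{}{
		"status": map[string]interface{}{
			"capacity": map[string]string{"storage": storage},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := capacityPatch("10Gi")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Sending only the changed field keeps the request from overwriting other parts of the status that a controller may have updated in the meantime.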
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
title: Growing Persistent Volume size
authors:
  - "@gnufied"
owning-sig: sig-storage
participating-sigs:
  - sig-auth
reviewers:
  - "@msau42"
  - "@liggitt"
  - "@thockin"
approvers:
  - "@saad-ali"
editor: TBD
creation-date: 2017-08-29
last-updated: 2020-09-30
status: implementable
see-also:
replaces:
superseded-by:

latest-milestone: "v1.19"
milestone:
  beta: "v1.11"
feature-gates:
  - name: ExpandPersistentVolumes
    components:
      - kube-apiserver
      - kubelet
      - kube-controller-manager
