---
layout: blog
title: "Kubernetes 1.22: A New Design for Volume Populators"
date: 2021-08-30
slug: volume-populators-redesigned
---

**Authors:** Ben Swartzlander (NetApp)

Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume
populators. Originally implemented in v1.18, the API suffered from backwards compatibility
issues. Kubernetes v1.22 includes a new API field called `dataSourceRef` that fixes these problems.

## Data sources

Earlier Kubernetes releases already added a `dataSource` field into the
[PersistentVolumeClaim](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) API,
used for cloning volumes and creating volumes from snapshots. You could use the `dataSource` field when
creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace.
That also modified the normal provisioning process so that instead of yielding an empty volume, the
new PVC contained the same data as either the cloned PVC or the cloned VolumeSnapshot.
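
As a sketch of that pre-existing behavior, a PVC restored from a snapshot might look like
this (the claim and snapshot names here are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # dataSource may reference an existing PVC (for cloning) or a
  # VolumeSnapshot in the same namespace; "my-snapshot" is a
  # hypothetical snapshot name.
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-snapshot
```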

Volume populators embrace the same design idea, but extend it to any type of object, as long
as there exists a [custom resource](/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
to define the data source, and a populator controller to implement the logic. Initially,
the `dataSource` field was directly extended to allow arbitrary objects, if the `AnyVolumeDataSource`
feature gate was enabled on a cluster. That change unfortunately caused backwards compatibility
problems, and so the new `dataSourceRef` field was born.

In v1.22, if the `AnyVolumeDataSource` feature gate is enabled, the `dataSourceRef` field is
added, which behaves similarly to the `dataSource` field except that it allows arbitrary
objects to be specified. The API server ensures that the two fields always have the same
contents, and neither of them are mutable. The difference is that at creation time
`dataSource` allows only PVCs or VolumeSnapshots, and ignores all other values, while
`dataSourceRef` allows most types of objects, and in the few cases it doesn't allow an
object (core objects other than PVCs) a validation error occurs.
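
To illustrate the mirroring behavior (a sketch, not output from a real cluster): if you
create a PVC whose `dataSourceRef` names another PVC, the API server fills in a matching
`dataSource`. Here `source-pvc` is a hypothetical claim name:

```yaml
# Excerpt of a PVC spec as submitted by the user:
dataSourceRef:
  kind: PersistentVolumeClaim
  name: source-pvc
# After creation, the API server mirrors the same reference, so a
# read-back of the object also shows:
#   dataSource:
#     kind: PersistentVolumeClaim
#     name: source-pvc
```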

When this API change graduates to stable, we intend to deprecate the `dataSource` field and
recommend using the `dataSourceRef` field for all use cases.
In the v1.22 release, `dataSourceRef` is available (as an alpha feature) specifically for cases
where you want to use custom volume populators.

## Using populators

Every volume populator must have one or more CRDs that it supports. Administrators may
install the CRD and the populator controller; any PVC with a `dataSourceRef` that specifies
a CR of a type that the populator supports will then be handled by the populator controller
instead of the CSI driver directly.

Underneath the covers, the CSI driver is still invoked to create an empty volume, which
the populator controller fills with the appropriate data. The PVC doesn't bind to the PV
until it's fully populated, so it's safe to define a whole application manifest including
pod and PVC specs and the pods won't begin running until everything is ready, just as if
the PVC was a clone of another PVC or VolumeSnapshot.

## How it works

PVCs with data sources are still noticed by the external-provisioner sidecar for the
related storage class (assuming a CSI provisioner is used), but because the sidecar
doesn't understand the data source kind, it doesn't do anything. The populator controller
is also watching for PVCs with data sources of a kind that it understands and when it
sees one, it creates a temporary PVC of the same size, volume mode, storage class,
and even on the same topology (if topology is used) as the original PVC. The populator
controller creates a worker pod that attaches to the volume and writes the necessary
data to it, then detaches from the volume and the populator controller rebinds the PV
from the temporary PVC to the original PVC.

## Trying it out

The following things are required to use volume populators:
* Enable the `AnyVolumeDataSource` feature gate
* Install a CRD for the specific data source / populator
* Install the populator controller itself
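
If you are experimenting on a local test cluster, one convenient way to enable the
feature gate is through your cluster tool's configuration. For example, a sketch of a
[kind](https://kind.sigs.k8s.io/) cluster config (assuming you use kind) might look like:

```yaml
# kind cluster configuration that enables the AnyVolumeDataSource
# feature gate on all control plane components
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  AnyVolumeDataSource: true
```

You would then create the cluster with `kubectl` pointed at it via
`kind create cluster --config <file>`.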

Populator controllers may use the [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator)
library to do most of the Kubernetes API level work. Individual populators only need to
provide logic for actually writing data into the volume based on a particular CR
instance. This library provides a sample populator implementation.

These optional components improve user experience:
* Install the VolumePopulator CRD
* Create a VolumePopulator custom resource for each specific data source
* Install the [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator)
  controller (alpha)

The purpose of these components is to generate warning events on PVCs with data sources
for which there is no populator.
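
As a sketch (the exact API group and version come from the validator project, so check
its deployment manifests), a VolumePopulator object registering the hello populator's
data source kind might look like:

```yaml
apiVersion: populator.storage.k8s.io/v1beta1
kind: VolumePopulator
metadata:
  name: hello-populator
# sourceKind names the API group and kind this populator handles;
# PVCs with a dataSourceRef of an unregistered kind get a warning event.
sourceKind:
  group: hello.k8s.io
  kind: Hello
```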

## Putting it all together

To see how this works, you can install the sample "hello" populator and try it
out.

First install the volume-data-source-validator controller.

```terminal
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/rbac-data-source-validator.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/setup-data-source-validator.yaml
```

Next install the example populator.

```terminal
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/crd.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/deploy.yaml
```

Create an instance of the `Hello` CR, with some text.

```yaml
apiVersion: hello.k8s.io/v1alpha1
kind: Hello
metadata:
  name: example-hello
spec:
  fileName: example.txt
  fileContents: Hello, world!
```

Create a PVC that refers to that CR as its data source.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSourceRef:
    apiGroup: hello.k8s.io
    kind: Hello
    name: example-hello
  volumeMode: Filesystem
```

Next, run a job that reads the file in the PVC.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
        - name: example-container
          image: busybox:latest
          command:
            - cat
            - /mnt/example.txt
          volumeMounts:
            - name: vol
              mountPath: /mnt
      restartPolicy: Never
      volumes:
        - name: vol
          persistentVolumeClaim:
            claimName: example-pvc
```

Wait for the job to complete (including all of its dependencies).

```terminal
kubectl wait --for=condition=Complete job/example-job
```

Finally, examine the log from the job.

```terminal
kubectl logs job/example-job
Hello, world!
```

Note that the volume already contained a text file with the string contents from
the CR. This is only the simplest example. Actual populators can set up the volume
to contain arbitrary contents.

## How to write your own volume populator

Developers interested in writing new populators are encouraged to use the
[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
and to only supply a small controller wrapper around the library, and a pod image
capable of attaching to volumes and writing the appropriate data to the volume.

Individual populators can be extremely generic such that they work with every type
of PVC, or they can do vendor specific things to rapidly fill a volume with data
if the volume was provisioned by a specific CSI driver from the same vendor, for
example, by communicating directly with the storage for that volume.

## The future

As this feature is still in alpha, we expect to update the out of tree controllers
with more tests and documentation. The community plans to eventually re-implement
the populator library as a sidecar, for ease of operations.

We hope to see some official community-supported populators for some widely-shared
use cases. Also, we expect that volume populators will be used by backup vendors
as a way to "restore" backups to volumes, and possibly a standardized API to do
this will evolve.

## How can I learn more?

The enhancement proposal,
[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators),
includes lots of detail about the history and technical implementation of this feature.

[Volume populators and data sources](/docs/concepts/storage/persistent-volumes/#volume-populators-and-data-sources),
within the documentation topic about persistent volumes, explains how to use this
feature in your cluster.

Please get involved by joining the Kubernetes storage SIG to help us enhance this
feature. There are a lot of good ideas already and we'd be thrilled to have more!