| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Kubernetes 1.22: A New Design for Volume Populators" |
| 4 | +date: 2021-08-30 |
| 5 | +slug: volume-populators-redesigned |
| 6 | +--- |
| 7 | + |
| 8 | +**Authors:** |
| 9 | +Ben Swartzlander (NetApp) |
| 10 | + |
| 11 | +Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume |
| 12 | +populators. Originally implemented |
| 13 | +in v1.18, the API suffered from backwards compatibility issues. Kubernetes v1.22 includes a new API |
| 14 | +field called `dataSourceRef` that fixes these problems. |
| 15 | + |
| 16 | +## Data sources |
| 17 | + |
| 18 | +Earlier Kubernetes releases already added a `dataSource` field into the |
| 19 | +[PersistentVolumeClaim](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) API, |
| 20 | +used for cloning volumes and creating volumes from snapshots. You could use the `dataSource` field when |
| 21 | +creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace. |
| 22 | +Doing so modified the normal provisioning process so that, instead of yielding an empty volume, the |
| 23 | +new PVC contained the same data as the source PVC or VolumeSnapshot. |
| 24 | + |
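For comparison with what follows, here is a minimal sketch of the existing `dataSource` behavior: a PVC that asks to be provisioned from a VolumeSnapshot. The names `restored-pvc` and `example-snapshot` are hypothetical, and the manifest assumes the default storage class is backed by a CSI driver that supports snapshots.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc            # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: example-snapshot      # hypothetical snapshot in the same namespace
```

To clone a volume instead, the `dataSource` would name an existing PVC (`kind: PersistentVolumeClaim`, with no `apiGroup`, since PVCs belong to the core API group).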
| 25 | +Volume populators embrace the same design idea, but extend it to any type of object, as long |
| 26 | +as there exists a [custom resource](/docs/concepts/extend-kubernetes/api-extension/custom-resources/) |
| 27 | +to define the data source, and a populator controller to implement the logic. Initially, |
| 28 | +the `dataSource` field was directly extended to allow arbitrary objects, if the `AnyVolumeDataSource` |
| 29 | +feature gate was enabled on a cluster. That change unfortunately caused backwards compatibility |
| 30 | +problems, and so the new `dataSourceRef` field was born. |
| 31 | + |
| 32 | +In v1.22, if the `AnyVolumeDataSource` feature gate is enabled, the `dataSourceRef` field is |
| 33 | +added, which behaves similarly to the `dataSource` field except that it allows arbitrary |
| 34 | +objects to be specified. The API server ensures that the two fields always have the same |
| 35 | +contents, and neither of them is mutable. The difference is that at creation time |
| 36 | +`dataSource` allows only PVCs or VolumeSnapshots, and ignores all other values, while |
| 37 | +`dataSourceRef` allows most types of objects, and in the few cases where it doesn't allow an |
| 38 | +object (core objects other than PVCs) a validation error occurs. |
| 39 | + |
| 40 | +When this API change graduates to stable, we would deprecate using `dataSource` and recommend |
| 41 | +using the `dataSourceRef` field for all use cases. |
| 42 | +In the v1.22 release, `dataSourceRef` is available (as an alpha feature) specifically for cases |
| 43 | +where you want to use custom volume populators. |
| 44 | + |
| 45 | +## Using populators |
| 46 | + |
| 47 | +Every volume populator must have one or more CRDs that it supports. Administrators may |
| 48 | +install the CRD and the populator controller. After that, any PVC with a `dataSourceRef` that |
| 49 | +specifies a CR of a type that the populator supports will be handled by the populator controller |
| 50 | +instead of the CSI driver directly. |
| 51 | + |
| 52 | +Under the covers, the CSI driver is still invoked to create an empty volume, which |
| 53 | +the populator controller fills with the appropriate data. The PVC doesn't bind to the PV |
| 54 | +until it's fully populated, so it's safe to define a whole application manifest including |
| 55 | +pod and PVC specs. The pods won't begin running until everything is ready, just as if |
| 56 | +the PVC were a clone of another PVC or VolumeSnapshot. |
| 57 | + |
| 58 | +## How it works |
| 59 | + |
| 60 | +PVCs with data sources are still noticed by the external-provisioner sidecar for the |
| 61 | +related storage class (assuming a CSI provisioner is used), but because the sidecar |
| 62 | +doesn't understand the data source kind, it doesn't do anything. The populator controller |
| 63 | +is also watching for PVCs with data sources of a kind that it understands, and when it |
| 64 | +sees one, it creates a temporary PVC of the same size, volume mode, storage class, |
| 65 | +and even on the same topology (if topology is used) as the original PVC. The populator |
| 66 | +controller creates a worker pod that attaches to the volume and writes the necessary |
| 67 | +data to it. The pod then detaches from the volume, and the populator controller rebinds |
| 68 | +the PV from the temporary PVC to the original PVC. |
| 69 | + |
| 70 | +## Trying it out |
| 71 | + |
| 72 | +The following things are required to use volume populators: |
| 73 | +* Enable the `AnyVolumeDataSource` feature gate |
| 74 | +* Install a CRD for the specific data source / populator |
| 75 | +* Install the populator controller itself |
| 76 | + |
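How you enable the feature gate depends on how your cluster is deployed. As one illustration, assuming a local test cluster created with [kind](https://kind.sigs.k8s.io/) (a tool this article does not otherwise rely on), a cluster configuration along these lines turns the gate on for the control plane components:

```yaml
# kind cluster configuration (assumes kind is used for local testing)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  AnyVolumeDataSource: true
```

You would pass this file to `kind create cluster --config <file>`. On other distributions, the equivalent is typically the `--feature-gates=AnyVolumeDataSource=true` flag on the relevant control plane components.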
| 77 | +Populator controllers may use the [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) |
| 78 | +library to do most of the Kubernetes API level work. Individual populators only need to |
| 79 | +provide logic for actually writing data into the volume based on a particular CR |
| 80 | +instance. This library provides a sample populator implementation. |
| 81 | + |
| 82 | +These optional components improve user experience: |
| 83 | +* Install the VolumePopulator CRD |
| 84 | +* Create a VolumePopulator custom resource for each specific data source |
| 85 | +* Install the [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator) |
| 86 | + controller (alpha) |
| 87 | + |
| 88 | +The purpose of these components is to generate warning events on PVCs with data sources |
| 89 | +for which there is no populator. |
| 90 | + |
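As a rough sketch of what such a registration might look like for the `Hello` data source used later in this article: the `apiVersion` below is an assumption and depends on which release of the validator's CRD you installed (alpha releases served `v1alpha1`), so check the installed CRD before applying anything like this.

```yaml
# Registers the Hello kind as a data source that has a populator.
# apiVersion is an assumption; match it to the VolumePopulator CRD in your cluster.
kind: VolumePopulator
apiVersion: populator.storage.k8s.io/v1beta1
metadata:
  name: hello-populator         # hypothetical name
sourceKind:
  group: hello.k8s.io
  kind: Hello
```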
| 91 | +## Putting it all together |
| 92 | + |
| 93 | +To see how this works, you can install the sample "hello" populator and try it |
| 94 | +out. |
| 95 | + |
| 96 | +First install the volume-data-source-validator controller. |
| 97 | + |
| 98 | +```terminal |
| 99 | +kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/rbac-data-source-validator.yaml |
| 100 | +kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/setup-data-source-validator.yaml |
| 101 | +``` |
| 102 | + |
| 103 | +Next install the example populator. |
| 104 | + |
| 105 | +```terminal |
| 106 | +kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/crd.yaml |
| 107 | +kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/deploy.yaml |
| 108 | +``` |
| 109 | + |
| 110 | +Create an instance of the `Hello` CR, with some text. |
| 111 | + |
| 112 | +```yaml |
| 113 | +apiVersion: hello.k8s.io/v1alpha1 |
| 114 | +kind: Hello |
| 115 | +metadata: |
| 116 | +  name: example-hello |
| 117 | +spec: |
| 118 | +  fileName: example.txt |
| 119 | +  fileContents: Hello, world! |
| 120 | +``` |
| 121 | + |
| 122 | +Create a PVC that refers to that CR as its data source. |
| 123 | + |
| 124 | +```yaml |
| 125 | +apiVersion: v1 |
| 126 | +kind: PersistentVolumeClaim |
| 127 | +metadata: |
| 128 | +  name: example-pvc |
| 129 | +spec: |
| 130 | +  accessModes: |
| 131 | +  - ReadWriteOnce |
| 132 | +  resources: |
| 133 | +    requests: |
| 134 | +      storage: 10Mi |
| 135 | +  dataSourceRef: |
| 136 | +    apiGroup: hello.k8s.io |
| 137 | +    kind: Hello |
| 138 | +    name: example-hello |
| 139 | +  volumeMode: Filesystem |
| 140 | +``` |
| 141 | + |
| 142 | +Next, run a job that reads the file in the PVC. |
| 143 | + |
| 144 | +```yaml |
| 145 | +apiVersion: batch/v1 |
| 146 | +kind: Job |
| 147 | +metadata: |
| 148 | +  name: example-job |
| 149 | +spec: |
| 150 | +  template: |
| 151 | +    spec: |
| 152 | +      containers: |
| 153 | +        - name: example-container |
| 154 | +          image: busybox:latest |
| 155 | +          command: |
| 156 | +            - cat |
| 157 | +            - /mnt/example.txt |
| 158 | +          volumeMounts: |
| 159 | +            - name: vol |
| 160 | +              mountPath: /mnt |
| 161 | +      restartPolicy: Never |
| 162 | +      volumes: |
| 163 | +        - name: vol |
| 164 | +          persistentVolumeClaim: |
| 165 | +            claimName: example-pvc |
| 166 | +``` |
| 167 | + |
| 168 | +Wait for the job to complete (including all of its dependencies). |
| 169 | + |
| 170 | +```terminal |
| 171 | +kubectl wait --for=condition=Complete job/example-job |
| 172 | +``` |
| 173 | + |
| 174 | +Finally, examine the log from the job. |
| 175 | + |
| 176 | +```terminal |
| 177 | +kubectl logs job/example-job |
| 178 | +Hello, world! |
| 179 | +``` |
| 180 | + |
| 181 | +Note that the volume already contained a text file with the string contents from |
| 182 | +the CR. This is only the simplest example. Actual populators can set up the volume |
| 183 | +to contain arbitrary contents. |
| 184 | + |
| 185 | +## How to write your own volume populator |
| 186 | + |
| 187 | +Developers interested in writing new populators are encouraged to use the |
| 188 | +[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library |
| 189 | +and to only supply a small controller wrapper around the library, and a pod image |
| 190 | +capable of attaching to volumes and writing the appropriate data to the volume. |
| 191 | + |
| 192 | +Individual populators can be extremely generic such that they work with every type |
| 193 | +of PVC, or they can do vendor-specific things to rapidly fill a volume with data |
| 194 | +if the volume was provisioned by a specific CSI driver from the same vendor, for |
| 195 | +example, by communicating directly with the storage for that volume. |
| 196 | + |
| 197 | +## The future |
| 198 | + |
| 199 | +As this feature is still in alpha, we expect to update the out of tree controllers |
| 200 | +with more tests and documentation. The community plans to eventually re-implement |
| 201 | +the populator library as a sidecar, for ease of operations. |
| 202 | + |
| 203 | +We hope to see some official community-supported populators for some widely-shared |
| 204 | +use cases. Also, we expect that volume populators will be used by backup vendors |
| 205 | +as a way to "restore" backups to volumes, and possibly a standardized API to do |
| 206 | +this will evolve. |
| 207 | + |
| 208 | +## How can I learn more? |
| 209 | + |
| 210 | +The enhancement proposal, |
| 211 | +[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators), includes lots of detail about the history and technical implementation |
| 212 | +of this feature. |
| 213 | + |
| 214 | +[Volume populators and data sources](/docs/concepts/storage/persistent-volumes/#volume-populators-and-data-sources), within the documentation topic about persistent volumes, |
| 215 | +explains how to use this feature in your cluster. |
| 216 | + |
| 217 | +Please get involved by joining the Kubernetes storage SIG to help us enhance this |
| 218 | +feature. There are a lot of good ideas already and we'd be thrilled to have more! |
| 219 | + |