Commit a89676d

Merge pull request kubernetes#1494 from bswartz/generic-data-populators
Add KEP for generic data populators
2 parents e7c1dbe + 933a044 commit a89676d

1 file changed

+282
-0
lines changed
@@ -0,0 +1,282 @@
1+
---
2+
title: Generic Data Populators
3+
authors:
4+
- "@bswartz"
5+
owning-sig: sig-storage
6+
participating-sigs:
7+
- sig-api-machinery
8+
reviewers:
9+
- "@thockin"
10+
- "@saad-ali"
11+
- "@smarterclayton"
12+
- "@j-griffith"
13+
approvers:
14+
- "@thockin"
15+
- "@saad-ali"
16+
creation-date: 2019-12-03
17+
last-updated: 2020-01-26
18+
status: provisional
19+
see-also:
20+
replaces:
21+
superseded-by:
22+
---
23+
24+
# Generic Data Populators
25+
26+
## Table of Contents
27+
28+
<!-- toc -->
29+
- [Release Signoff Checklist](#release-signoff-checklist)
30+
- [Summary](#summary)
31+
- [Motivation](#motivation)
32+
- [Goals](#goals)
33+
- [Non-Goals](#non-goals)
34+
- [Proposal](#proposal)
35+
- [User Stories](#user-stories)
36+
- [VM Images](#vm-images)
37+
- [Backup/Restore](#backuprestore)
38+
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
39+
- [Risks and Mitigations](#risks-and-mitigations)
40+
- [Design Details](#design-details)
41+
- [Test Plan](#test-plan)
42+
- [Graduation Criteria](#graduation-criteria)
43+
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
44+
- [Beta -> GA Graduation](#beta---ga-graduation)
45+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
46+
- [Version Skew Strategy](#version-skew-strategy)
47+
- [Implementation History](#implementation-history)
48+
<!-- /toc -->
49+
50+
## Release Signoff Checklist
51+
52+
**ACTION REQUIRED:** In order to merge code into a release, there must be an issue in [kubernetes/enhancements] referencing this KEP and targeting a release milestone **before [Enhancement Freeze](https://github.com/kubernetes/sig-release/tree/master/releases)
53+
of the targeted release**.
54+
55+
For enhancements that make changes to code or processes/procedures in core Kubernetes i.e., [kubernetes/kubernetes], we require the following Release Signoff checklist to be completed.
56+
57+
Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.
58+
59+
- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
60+
- [ ] KEP approvers have set the KEP status to `implementable`
61+
- [ ] Design details are appropriately documented
62+
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
63+
- [ ] Graduation criteria is in place
64+
- [ ] "Implementation History" section is up-to-date for milestone
65+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
66+
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
67+
68+
**Note:** Any PRs to move a KEP to `implementable` or significant changes once it is marked `implementable` should be approved by each of the KEP approvers. If any of those approvers is no longer appropriate then changes to that list should be approved by the remaining approvers and/or the owning SIG (or SIG-arch for cross cutting KEPs).
69+
70+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
71+
72+
[kubernetes.io]: https://kubernetes.io/
73+
[kubernetes/enhancements]: https://github.com/kubernetes/enhancements/issues
74+
[kubernetes/kubernetes]: https://github.com/kubernetes/kubernetes
75+
[kubernetes/website]: https://github.com/kubernetes/website
76+
77+
## Summary
78+
79+
In Kubernetes 1.12, we added the `DataSource` field to the PVC spec. The field was implemented
80+
as a `TypedLocalObjectReference` to give flexibility in the future about what objects could be
81+
data sources for new volumes.
82+
83+
Since then, we have allowed only two things to be the source of a
84+
new volume -- existing PVCs (indicating a user's intent to clone the volume) and snapshots
85+
(indicating the user's intent to restore a snapshot). Implementation of these two data sources
86+
relies on a CSI plugin to do the actual work, and no generic implementation exists for cloning
87+
volumes or restoring snapshots.
88+
89+
Since the original design of the `DataSource` API field, we have been aware of the desire to
90+
populate volumes with data from other sources, both CSI-specific and generic (compatible with
91+
any CSI plugin). For new CSI-specific data sources, the path forward is clear, but for other
92+
sources of data, which I call "Generic Data Populators", we don't have a mechanism. The main
93+
problem is that current API validation logic uses a white list of object types, driven by
94+
feature gates for each object type. This approach won't scale to generic populators, which
95+
will by their nature be too numerous and varied.
96+
97+
This proposal recommends that we relax validation on the `DataSource` field to allow
98+
arbitrary object types to be data sources, and leave the implementation to controllers. This
99+
flexibility will allow us to experiment with approaches for data populators.
100+
101+
## Motivation
102+
103+
### Goals
104+
105+
- Enable users to create pre-populated volumes in a manner consistent with current practice
106+
- Enable developers to innovate with new and interesting data sources
107+
- Avoid changing existing workflows and breaking existing tools with new behavior
108+
109+
### Non-Goals
110+
111+
- This proposal DOES NOT recommend any specific approach for data populators. The specific
112+
design of how a data populator should work will be handled in a separate KEP.
113+
- We do not propose any new object types. Specifically, no special "populator"
114+
object type. Populators should function based on their own object types.
115+
116+
## Proposal
117+
118+
The validation for `DataSource` field should be removed entirely, in the fullness of time.
119+
120+
Short term, we should add a new alpha feature gate, which allows arbitrary objects to be
121+
specified for the `DataSource` field, with the intention to eventually make this behavior
122+
the standard.
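
The gated relaxation described above can be sketched as follows. This is a minimal illustration, not the actual Kubernetes validation code: the `TypedLocalObjectReference` struct is redeclared locally so the example has no dependencies, and `validateDataSource`, the `anyDataSource` flag, and the `example.com`/`DiskImage` type are all hypothetical names chosen for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// TypedLocalObjectReference mirrors the shape of the core/v1 type of the
// same name. It is redeclared here so the sketch is self-contained.
type TypedLocalObjectReference struct {
	APIGroup *string
	Kind     string
	Name     string
}

// groupKind identifies an object type by API group and kind.
type groupKind struct {
	group, kind string
}

// allowedDataSources models the current allowlist: PVC clones and snapshots.
var allowedDataSources = map[groupKind]bool{
	{group: "", kind: "PersistentVolumeClaim"}:                 true,
	{group: "snapshot.storage.k8s.io", kind: "VolumeSnapshot"}: true,
}

// validateDataSource is a hypothetical helper. With anyDataSource disabled it
// enforces the allowlist; with it enabled it only checks that the reference
// is well formed and leaves interpretation to controllers.
func validateDataSource(ref *TypedLocalObjectReference, anyDataSource bool) error {
	if ref == nil {
		return nil // no data source: an ordinary empty volume
	}
	if ref.Kind == "" || ref.Name == "" {
		return errors.New("dataSource must set both kind and name")
	}
	if anyDataSource {
		return nil
	}
	group := ""
	if ref.APIGroup != nil {
		group = *ref.APIGroup
	}
	if !allowedDataSources[groupKind{group, ref.Kind}] {
		return fmt.Errorf("dataSource kind %q in group %q is not supported", ref.Kind, group)
	}
	return nil
}

func main() {
	group := "example.com" // hypothetical API group for a custom resource
	ref := &TypedLocalObjectReference{APIGroup: &group, Kind: "DiskImage", Name: "fedora"}

	fmt.Println("gate off:", validateDataSource(ref, false))
	fmt.Println("gate on: ", validateDataSource(ref, true))
}
```

In this sketch the gate only changes what the API accepts. Nothing about provisioning changes, which matches the intent that existing workflows stay untouched.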
123+
124+
Populators will work by responding to PVC objects with a data source they understand,
125+
and producing a PV with the expected data, such that ordinary Kubernetes workflows are
126+
not disrupted. In particular the PVC should be attachable to the end user's pod the
127+
moment it is bound, similar to a PVC created from currently supported data sources.
128+
129+
### User Stories
130+
131+
There is a long list of possible use cases around generic data populators, and I won't
132+
try to detail them all here. I will detail a few that illustrate the challenges faced by
133+
users and developers, but it's important to see these as a few examples among many.
134+
135+
#### VM Images
136+
137+
One use case was relevant to [KubeVirt](https://kubevirt.io/), where the Kubernetes
138+
`Pod` object is actually a
139+
VM running in a hypervisor, and the `PVC` is actually a virtual disk attached to a VM.
140+
It's common for virtualization systems to allow VMs to boot from disks, and for disks
141+
to be pre-populated with various OS images. OS images tend to be stored in external
142+
repositories dedicated to that purpose, often with various mechanisms for retrieving
143+
them efficiently that are external to Kubernetes.
144+
145+
One way to achieve this is to represent disk images as custom resources that point to
146+
the image repository, and to allow creation of PVCs from these custom resources such
147+
that the volumes come pre-populated with the correct data. Efficient population of the
148+
data could be left up to a purpose-built controller that knows how to get the bits
149+
where they need to be with minimal I/O.
150+
151+
#### Backup/Restore
152+
153+
Without getting into the details of how backup/restore should be implemented, it's
154+
clear that whatever design one chooses, a necessary step is to have the user
155+
(or higher level controller) create a PVC that points to the backup they want to
156+
restore, and have the data appear in that volume somehow.
157+
158+
One can imagine backups simply being a special case of snapshots, in which case the
159+
existing design is sufficient, but if you want anything more sophisticated, there
160+
will inevitably be a new object type that represents a backup. While it's arguable
161+
that backup should be something CSI plugins should be exclusively responsible for,
162+
one can also argue that generic backup tools should exist that can back up
163+
and restore all kinds of volumes. Those tools will be separate from CSI plugins and
164+
yet need a way to populate volumes with restored data.
165+
166+
It's also likely that multiple backup/restore implementations will be developed,
167+
and it's not a good idea to pick a winner at the Kubernetes API layer. It makes
168+
more sense to enable developers to try different approaches by making the API allow
169+
restoring from various kinds of things.
170+
171+
### Implementation Details/Notes/Constraints
172+
173+
As noted above, the proposal is extremely simple -- just remove the validation on
174+
the `DataSource` field. This raises the question of WHAT will happen when users
175+
put new things in that field, and HOW populators will actually work with so small
176+
a change.
177+
178+
It's important to note first that the only consumers of the `DataSource` field today
179+
are the various dynamic provisioners, most notably the external-provisioner CSI
180+
sidecar. If the external-provisioner sidecar sees a data source it doesn't
181+
understand, it simply ignores the request, which is both important for forward
182+
compatibility, and also perfect for the purposes of a data populator. This allows
183+
developers to add new types of data sources that the dynamic provisioners will
184+
simply ignore, enabling a different controller to see these objects and respond
185+
to them.
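
The division of labor described above can be sketched as two independent predicates, one per controller. This is an illustrative model only, not code from the external-provisioner: `DataSourceRef`, `provisionerHandles`, `populatorHandles`, and the `example.com`/`DiskImage` type are hypothetical names used for this sketch.

```go
package main

import "fmt"

// DataSourceRef is a simplified stand-in for the PVC data source reference.
type DataSourceRef struct {
	APIGroup string
	Kind     string
	Name     string
}

// provisionerHandles models a forward-compatible dynamic provisioner: it acts
// on claims with no data source or with a data source it understands, and
// ignores everything else instead of failing.
func provisionerHandles(ref *DataSourceRef) bool {
	if ref == nil {
		return true // ordinary empty volume
	}
	switch {
	case ref.APIGroup == "" && ref.Kind == "PersistentVolumeClaim":
		return true // clone
	case ref.APIGroup == "snapshot.storage.k8s.io" && ref.Kind == "VolumeSnapshot":
		return true // restore from snapshot
	}
	return false // unknown type: leave the claim for another controller
}

// populatorHandles models a hypothetical populator that only responds to
// claims whose data source is its own custom resource type.
func populatorHandles(ref *DataSourceRef) bool {
	return ref != nil && ref.APIGroup == "example.com" && ref.Kind == "DiskImage"
}

func main() {
	claims := []*DataSourceRef{
		nil,
		{Kind: "PersistentVolumeClaim", Name: "source-pvc"},
		{APIGroup: "example.com", Kind: "DiskImage", Name: "fedora"},
	}
	for _, ref := range claims {
		fmt.Printf("ref=%+v provisioner=%v populator=%v\n",
			ref, provisionerHandles(ref), populatorHandles(ref))
	}
}
```

In this model each claim is picked up by at most one of the two controllers, which is what lets a populator coexist with an unmodified provisioner.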
186+
187+
I will leave the details of how data populators will work for another KEP. There
188+
are a few possible implementations that are worth considering, and this change
189+
is a necessary step to enable prototyping those ideas and deciding which is
190+
the best approach.
191+
192+
### Risks and Mitigations
193+
194+
Clearly there is concern that bad things might happen if we don't restrict
195+
the contents of the `DataSource` field, otherwise the validation wouldn't
196+
have been added. The main risk that I'm aware of is that badly-coded dynamic
197+
provisioners might crash if they see something they don't understand.
198+
Fortunately, the external-provisioner sidecar correctly handles this case,
199+
and so would any other dynamic provisioner designed with forward compatibility
200+
in mind.
201+
202+
Removing validation of the field relinquishes control over what kind of
203+
data sources are okay, and gives developers the freedom to decide. The biggest
204+
problem this leads to is that users might attempt to use a data source that's
205+
not supported (on a particular cluster), and they won't get any feedback
206+
telling them that their request will never succeed. This is not unlike a
207+
situation where a storage class refers to a provisioner that doesn't exist,
208+
but it's something that will need to be solved eventually.
209+
210+
Security issues are hard to measure, because any security issues would be the
211+
result of badly designed data populators that failed to put appropriate
212+
limits on users' actions. Such security issues are going to be present with
213+
any new controller, though, so they don't seem relevant to this change. The
214+
main thing to realize is that the `DataSource` field is a "local" typed
215+
object reference, so no matter what, the object in that field has to either
216+
be in the same namespace as the PVC that references it, or it must be a
217+
non-namespaced object. This seems like an appropriate and desirable
218+
limitation for security reasons.
219+
220+
If we think about who can install populators, the RBAC required for a
221+
populator to operate requires, at minimum, the ability to either create or
222+
modify PVs. Also the CRD for the data source type needs to be installed.
223+
This means that populators will generally be installed by cluster admins
224+
or similarly-powerful users, and those users can be expected to understand
225+
the uses and implications of any populators they choose to install.
226+
227+
## Design Details
228+
229+
### Test Plan
230+
231+
The minimal test for this feature is to create a PVC with a data source
232+
that's not a PVC or snapshot, and verify that the data source reference
233+
becomes part of the PVC API object. Any very simple CRD would be okay
234+
for this purpose. We would expect such a PVC to be ignored by existing
235+
dynamic provisioners.
236+
237+
### Graduation Criteria
238+
239+
#### Alpha -> Beta Graduation
240+
241+
- Before going to beta, we need a clear notion of how data populators should
242+
work.
243+
- We will need a simple and lightweight implementation of a data populator
244+
that can be used by the E2E test suite to exercise the functionality.
245+
- Automated tests that create a volume from a data source handled by a
246+
populator, to validate that the data source functionality works, and that
247+
any other required functionality for data population is working.
248+
- We will need to see several implementations of working data populators that
249+
solve real world problems implemented in the community.
250+
251+
#### Beta -> GA Graduation
252+
253+
- Distributions including data populators as part of their distros (possibly
254+
a backup/restore implementation layered on top)
255+
- Allowing time for feedback
256+
257+
### Upgrade / Downgrade Strategy
258+
259+
Data sources are only considered at provisioning time -- once the PVC becomes
260+
bound, the `DataSource` field becomes merely a historical note.
261+
262+
On upgrade, there are no potential problems because this change merely
263+
relaxes an existing limitation.
264+
265+
On downgrade, there is a potential for unbound (not yet provisioned) PVCs to
266+
have data sources that never would have been allowed on the lower version. In
267+
this case we might want to revalidate the field and possibly wipe it out on
268+
downgrade.
269+
270+
### Version Skew Strategy
271+
272+
No issues
273+
274+
## Implementation History
275+
276+
- The idea of data populators has been discussed abstractly in SIG-storage
277+
since 2018 at least.
278+
- John Griffith did the original work to propose something like this, but
279+
that work got scaled down to just PVC clones.
280+
- Ben Swartzlander picked up the populator proposal and developed 2 prototypes
281+
in December 2019.
282+
- New KEP proposed January 2020
