-# Automatic conversion of docker images into thin format
+# Automatic conversion of docker images into the thin format

 This utility will automatically convert normal docker images into the thin
 format.

 ## Vocabulary

-There are several concept to keep track in this process, and none of them are
-very common, so before to dive in we can agree on a share vocabulary.
+There are several concepts to keep track of in this process, and none of them
+is very common, so before diving in we should agree on a shared vocabulary.

 **Registry** refers to the docker image registry, with protocol extensions;
 common examples are:

  * https://registry.hub.docker.com
  * https://gitlab-registry.cern.ch

-**Repository** This specify a containers of images, each image will be indexed,
+**Repository** specifies a class of images; each image in it is then indexed
 by tag or digest. Common examples are:

  * library/redis
@@ -26,16 +26,16 @@ and may change in a feature. Common examples are:
  * 4
  * 3-alpine

-**Digest** is another way to identify images inside a repository, digest are
-**immutable**, since they are the result of an hash function to the content of
-the image. Thanks to this technique the images are content addreassable.
+**Digest** is another way to identify images inside a repository; digests are
+**immutable**, since they are the result of a hash function applied to the
+content of the image. Thanks to this technique, images are content addressable.
 Common examples are:

  * sha256:2aa24e8248d5c6483c99b6ce5e905040474c424965ec866f7decd87cb316b541
  * sha256:d582aa10c3355604d4133d6ff3530a35571bd95f97aadc5623355e66d92b6d2c
 
 
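A minimal Go sketch of why a digest is an immutable, content-addressable identifier: it is just the SHA-256 of the bytes, printed in the `sha256:<hex>` form used in the examples above. The two manifests below are made-up placeholders; for a real image the hash is taken over the exact manifest bytes served by the registry.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOf returns the content-addressable identifier of a blob in the
// "sha256:<hex>" form used by the registry examples.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder manifests, not real registry data.
	m1 := []byte(`{"layers":["a"]}`)
	m2 := []byte(`{"layers":["b"]}`)

	fmt.Println(digestOf(m1) == digestOf(m1)) // true: same content, same digest
	fmt.Println(digestOf(m1) == digestOf(m2)) // false: different content, different digest
}
```

Because the identifier is derived from the content, changing a single byte of an image produces a new digest, whereas a tag can be moved to point at different content.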
-An **image** belong to a repository -- which in turns belongs to a registry --
+An **image** belongs to a repository, which in turn belongs to a registry,
 and it is identified by a tag, a digest, or both. If you can choose, it is always
 better to identify the image using at least the digest.
 
@@ -75,7 +75,7 @@ ideally specifying both the tag and the digest.
 On the other hand, you cannot be as specific about the output image, simply
 because it is impossible to know the digest before generating the image itself.

-Finally we use model the repository as an append only structure, deleting
+Finally, we model the repository as an append-only structure: deleting
 layers could break images that are currently running.

 ## Commands
@@ -92,7 +92,7 @@ add-desiderata --input-image $INPUT_IMAGE --output-image $OUTPUT_IMAGE --reposit
 This command adds a new `desiderata` to the internal database and then tries to
 convert the regular image into a thin image.

-The users are the one that will try tpo log into the registry, you can add
+The users are the accounts used to log into the registry; you can add
 users (username, password and registry) using the `add-user` command.

 ### add-image
@@ -128,8 +128,7 @@ migrate-database
 Apply all the migrations to the database, up to the newest version of the
 software.

-As first run is necessary to run this function and to run it as root since it
-will create the necessary directory for the database in `/var/lib/`
+It is necessary to run this command the first time the utility is used.

 ### download-manifest

@@ -156,7 +155,7 @@ This command will try to convert all the desiderata in the internal database.
 loop
 ```

-This command is equivalent to call `convert` in an infinite loop, usefull to
+This command is equivalent to calling `convert` in an infinite loop, which is useful to
 make sure that all the images are up to date.


@@ -166,7 +165,7 @@ This section will go into the detail of what happens when you try to add a
 desiderata.

 The very first step is parsing both the input and output images; if either of
-those parse fails the whole command fail and we immediately return an error.
+those parses fails, the whole command fails and we immediately return an error.

 Then we check whether the desiderata we are trying to add is already in the
 database; if it is, we do not add it again and simply return an
@@ -175,27 +174,27 @@ error.
 The next step is to download the input image manifest; if we are not
 able to access the input manifest, we return an error.

-Finally if every check completely successfully we add the desiderata to the
+Finally, if every check completed successfully, we add the desiderata to the
 internal database.
 
 ## convert workflow

 The goal of `convert` is to actually create the thin images starting from the
-regurlar one.
+regular ones.

 In order to convert, we iterate over every desiderata.

-In general some desiderata will be already converted while others will need to
+In general, some desiderata will already be converted while others will need to
 be converted from scratch.

 The first step is then to check whether the desiderata has already been converted.
-In order to do this check we download the input image manifest and check
+To do this check, we download the input image manifest and check
 against the internal database whether the input image digest has already been
 converted; if it has, we can safely skip the conversion.

-Then, every image is made of different layers, some of them could already been
+Then, every image is made of different layers, and some of them could already be
 in the repository.
-In order to avoid expensive CVMFS transaction, before to downloand and ingest
+To avoid expensive CVMFS transactions, before downloading and ingesting
 a layer we check whether it is already in the repository; if it is, we neither
 download nor ingest it.
 
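The two skip checks of `convert` can be sketched in Go with two sets. The `state` type and `layersToIngest` method are hypothetical; in-memory maps stand in for the internal database and for the content of the CVMFS repository, and nothing is actually downloaded or ingested.

```go
package main

import "fmt"

// state is a hypothetical in-memory stand-in for the internal database
// (converted input digests) and for the CVMFS repository (ingested layers).
type state struct {
	convertedInputs map[string]bool
	ingestedLayers  map[string]bool
}

// layersToIngest applies the two checks described above to one desiderata.
// inputDigest and layers would come from the downloaded input manifest.
// It returns the layers that still have to be downloaded and ingested.
func (s *state) layersToIngest(inputDigest string, layers []string) []string {
	if s.convertedInputs[inputDigest] {
		return nil // this input digest was already converted: skip it
	}
	var todo []string
	for _, l := range layers {
		if s.ingestedLayers[l] {
			continue // already in the repository: no download, no transaction
		}
		todo = append(todo, l)
		// The sketch only records the layer; the real tool would download
		// it and ingest it inside a CVMFS transaction at this point.
		s.ingestedLayers[l] = true
	}
	// A real implementation should record this only after the output
	// image has been created and uploaded successfully.
	s.convertedInputs[inputDigest] = true
	return todo
}

func main() {
	s := &state{convertedInputs: map[string]bool{}, ingestedLayers: map[string]bool{}}
	fmt.Println(s.layersToIngest("sha256:img1", []string{"sha256:base", "sha256:app"}))  // [sha256:base sha256:app]
	fmt.Println(s.layersToIngest("sha256:img2", []string{"sha256:base", "sha256:app2"})) // [sha256:app2]
	fmt.Println(s.layersToIngest("sha256:img1", []string{"sha256:base", "sha256:app"}))  // []
}
```

Because layers are identified by digest, two images that share a base layer only pay for that layer once.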
@@ -206,4 +205,113 @@ Such images can be used by docker with the plugins.

 ## General workflow

-TODO
+This section explains how this utility is intended to be used.
+
+Internally this utility invokes the `cvmfs_server` and `docker` commands, so it
+must run on a stratum0 that also has docker installed.
+
+The docker dependency could be dropped, but doing so would require some work,
+so for this first release, as long as it is not a big hurdle, we are going to
+keep it.
+
+The first time the utility is launched, it is necessary to create the SQLite
+database; to do so you can call the command `migrate-database` or its alias,
+`init`.
+
+This command creates an SQLite database called `docker2cvmfs_archive.sqlite`.
+The utility requires this file to always be in `.`, the directory from
+which you are calling the utility itself; this requirement will be dropped in
+future releases.
+
+Once the database has been created, we can start adding users, images and
+desideratas.
+
+The conversion is quite straightforward: we first download the input image,
+store each layer in the cvmfs repository, create the output image and
+finally upload the output image to the registry.
+
+Credentials may not be necessary to download an image, but they are
+mandatory to upload one.
+
+Also, you may want different users to upload different images to the same
+docker registry, possibly even one user per image.
+
+The first step is therefore to call `add-user`.
+
+```
+$ ./daemon init
+INFO[0000] Made migrations n=2
+$ ./daemon add-user --username foo --password secret --registry docker.foo.bar.com
+$ ./daemon list-users
++------+--------------------+
+| USER | REGISTRY |
++------+--------------------+
+| foo | docker.foo.bar.com |
++------+--------------------+
+```
+
+I wasn't able to figure out a reliable way to obtain authentication tokens that
+would avoid storing the password as clear text in the database. The suggestion
+at the moment is to use disposable users with very limited capabilities, so
+that if the database gets compromised (a third party gains access to it), we
+are able to limit the damage.
+
+The next step is to add a desiderata:
+
+```
+$ ./daemon add-desiderata \
+--input-image https://registry.hub.docker.com/library/redis:4 \
+--output-image https://gitlab-registry.cern.ch/smosciat/containerd/thin/redis:4 \
+--repository cd.cern.ch \
+--user-output smosciat
+WARN[0000] Unable to retrieve the password, trying to get the manifest anonymously. error="sql: no rows in result set"
+Auth to: Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
+https://auth.docker.io/token?scope=repository%3Alibrary%2Fredis%3Apull&service=registry.docker.io
+
+$ ./daemon list-desideratas
++----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+| ID | INPUT IMAGE ID | INPUT IMAGE NAME | CVMFS REPO | OUTPUT IMAGE ID | OUTPUT IMAGE NAME |
++----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+| 1 | 1 | https://registry.hub.docker.com/library/redis:4 | cd.cern.ch | 2 | https://gitlab-registry.cern.ch/smosciat/containerd/thin/redis:4 |
++----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+```
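The `Auth to:` lines in the log above suggest that anonymous downloads go through the registry's Bearer token flow: the registry answers with a `Bearer realm=...,service=...,scope=...` challenge, and the client requests a token from the `realm` URL, passing `service` and `scope` as query parameters. The Go sketch below rebuilds that second log line from the challenge. `parseBearerChallenge` and `tokenURL` are hypothetical helpers rather than the utility's code, and the parser assumes every value is quoted and contains no escaped quotes.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseBearerChallenge parses `Bearer key="value",key="value",...` into a map.
// Values are assumed to be quoted and free of escaped quotes; commas inside
// quotes (e.g. scope="repository:x:pull,push") are preserved.
func parseBearerChallenge(h string) (map[string]string, error) {
	if !strings.HasPrefix(h, "Bearer ") {
		return nil, errors.New("not a Bearer challenge")
	}
	rest := strings.TrimPrefix(h, "Bearer ")
	params := map[string]string{}
	for rest != "" {
		eq := strings.Index(rest, `="`)
		if eq < 0 {
			return nil, fmt.Errorf("malformed parameter in %q", rest)
		}
		key := strings.TrimSpace(rest[:eq])
		rest = rest[eq+2:]
		end := strings.Index(rest, `"`)
		if end < 0 {
			return nil, errors.New("unterminated quoted value")
		}
		params[key] = rest[:end]
		rest = strings.TrimPrefix(strings.TrimSpace(rest[end+1:]), ",")
	}
	return params, nil
}

// tokenURL builds the token request URL from the challenge parameters.
func tokenURL(params map[string]string) (string, error) {
	realm, ok := params["realm"]
	if !ok {
		return "", errors.New("challenge has no realm")
	}
	u, err := url.Parse(realm)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, k := range []string{"service", "scope"} {
		if v, ok := params[k]; ok {
			q.Set(k, v)
		}
	}
	u.RawQuery = q.Encode() // Encode sorts keys: scope before service
	return u.String(), nil
}

func main() {
	h := `Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"`
	p, err := parseBearerChallenge(h)
	if err != nil {
		panic(err)
	}
	u, err := tokenURL(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Running it prints `https://auth.docker.io/token?scope=repository%3Alibrary%2Fredis%3Apull&service=registry.docker.io`, the same URL shown in the log.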
+
+Of course, you can add as many desideratas as you wish.
+
+Now that all the desideratas are in place, you can start converting them:
+
+```
+$ ./daemon convert
+```
+
+The above command should log enough information to infer what is
+happening and to debug any error.
+
+Make sure that the user is able to start a cvmfs transaction and to
+communicate with docker; in any case, these errors should be self-evident
+in the logs.
+
+The above command is quite cheap: it skips images that have
+already been converted and avoids downloading layers that have already been
+downloaded. Command line flags can change this behaviour if necessary.
+
+You may want to keep the above command running in a loop, so that it will
+automatically pick up changes in the input images and start the conversion.
+
+We are basically polling the registries for changes in the input images; again,
+there was no reliable and easy way to get updates from the registry, not
+even from the one inside CERN that we manage.
+
+To run the conversion in a loop, you can simply use:
+
+```
+$ ./daemon loop
+```
+
+While the daemon is running in a loop, you should be able to interact with the
+utility without any issue: you can still add users, images and even
+desideratas.
+
+Just be careful not to leave the CVMFS repository in an inconsistent state
+(for example, by aborting the program with Ctrl-C while it is in a transaction).
+