Commit d44d70b

Merge pull request #45 from vsoch/master
Updates to singularity python to support advanced shub functions
2 parents 7a985e4 + 5ae8b60 commit d44d70b

File tree

191 files changed

+1178
-378
lines changed

MANIFEST.in

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,6 @@
11
recursive-include singularity/templates *
22
recursive-include singularity/static *
33
recursive-include singularity/build *
4+
recursive-include singularity/analysis *
5+
recursive-include singularity/views *
46
recursive-include singularity/testing *

README.md

Lines changed: 78 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -32,62 +32,70 @@ After installation, you should be able to run `shub` on the command line, withou
3232

3333
$ shub --help
3434
usage: shub [-h] [--image IMAGE] [--images IMAGES] [--debug]
35-
[--outfolder OUTFOLDER] [--package] [--tree] [--simtree]
36-
[--subtract] [--simcalc] [--size SIZE]
35+
[--outfolder OUTFOLDER] [--package] [--os] [--oscalc] [--tags]
36+
[--tree] [--simtree] [--subtract] [--simcalc] [--size SIZE]
3737

3838
Singularity Hub command line tool
3939

4040
optional arguments:
4141
-h, --help show this help message and exit
4242
--image IMAGE full path to singularity image (for use with --package
4343
and --tree)
44-
--images IMAGES images, separated by commas (for use with --simtree)
44+
--images IMAGES images, separated by commas (for use with --simtree
45+
and --subtract)
4546
--debug use verbose logging to debug.
4647
--outfolder OUTFOLDER
4748
full path to folder for output, stays in tmp (or pwd)
4849
if not specified
4950
--package package a singularity container for singularity hub
51+
--os estimate the operating system of your container.
52+
--oscalc calculate similarity score for your container vs.
53+
docker library OS.
54+
--tags retrieve list of software tags for an image, itself
55+
minus its base
5056
--tree view the guts of an singularity image (use --image)
5157
--simtree view common guts between two images (use --images)
52-
--subtract subtract one container image from the second to make
53-
a difference tree (use --images first,subtract)
58+
--subtract subtract one container image from the second to make a
59+
difference tree (use --images first,subtract)
5460
--simcalc calculate similarity (number) between images based on
5561
file contents.
5662
--size SIZE If using Docker or shub image, you can change size
5763
(default is 1024)
5864

5965

6066

61-
### Package your container
67+
### Classify your container
68+
Singularity python provides functions for quickly assessing the base operating system of your container, retrieving a list of software tags that are relevant when this base is subtracted, and getting similarity scores of your container to a library of base software.
6269

63-
A package is a zipped up file that contains the image, the singularity runscript as `runscript`, a `VERSION` file, and a list of files `files.txt` and folders `folders.txt` in the container.
70+
#### Estimate the OS
6471

65-
![img/singularity-package.png](img/singularity-package.png)
72+
You can do this on the command line as follows:
6673

67-
The example package can be [downloaded for inspection](http://www.vbmis.com/bmi/project/singularity/package_image/ubuntu:latest-2016-04-06.img.zip), as can the [image used to create it](http://www.vbmis.com/bmi/project/singularity/package_image/ubuntu:latest-2016-04-06.img). This is one of the drivers underlying [singularity hub](http://www.singularity-hub.org) (under development).
74+
shub --image docker://python:latest --os
75+
[sudo] password for vanessa
76+
Most similar OS found to be debian:7.11
77+
debian:7.11
6878

69-
- **files.txt** and **folders.txt**: are simple text file lists with paths in the container, and this choice is currently done to provide the rawest form of the container contents. These files also are used to generate interactive visualizations, and calculate similarity between containers.
70-
- **VERSION**: is a text file with one line, an md5 hash generated for the image when it was packaged.
71-
- **{{image}}.img**: is of course the original singularity container (usually a .img file)
79+
Or, to do this from within Python, see the [provided example](examples/classify_image/estimate_os.py). From within Python, you can export your sudo password as the environment variable `pancakes` so that you are not prompted for it. This is not ideal, but it is required for now because Singularity is used to export the image. This will likely change soon.
7280

73-
First, go to where you have some images:
7481

75-
ls
76-
ubuntu.img
77-
82+
#### Get software tags
83+
Singularity Hub uses a simple algorithm to obtain a likely list of software that is important to your image. It assumes that (most) core installed software is in a folder called `bin`, and returns the list of these files with the estimated base image subtracted. You can do this as follows:
7884

79-
You can now use the `shub` command line tool to package your image. Note that you must have [singularity installed](https://singularityware.lbl.gov/install-linux), and depending on the function you use, you will likely need to use sudo. We can use the `--package` argument to package our image:
8085

81-
shub --image ubuntu.img --package
86+
shub --image docker://python:latest --tags
87+
8288

89+
We also provide an [example for Python](examples/classify_image/derive_tags.py). If you do this programmatically, you can change the folder(s) that are included, meaning that you could get a custom list of software (e.g., libraries in `lib`, or Python packages in `site-packages`).
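As a rough illustration of the idea described above (not the library's actual implementation), the sketch below subtracts a base image's file list from an image's file list and keeps the names of the remaining files whose parent folder is one of the search folders. The function name `derive_tags_sketch` and the sample paths are hypothetical.

```python
import os


def derive_tags_sketch(image_files, base_files, search_folders=("bin",)):
    """Return sorted names of files unique to the image that sit in a search folder."""
    # Files present in the image but absent from the estimated base OS
    custom = set(image_files) - set(base_files)
    tags = set()
    for path in custom:
        # Name of the folder that directly contains the file, e.g. "bin"
        folder = os.path.basename(os.path.dirname(path))
        if folder in search_folders:
            tags.add(os.path.basename(path))
    return sorted(tags)


# Hypothetical file lists standing in for an image and its estimated base
image_files = ["/bin/ls", "/usr/bin/python3", "/usr/local/bin/pip", "/etc/hosts"]
base_files = ["/bin/ls", "/etc/hosts"]

print(derive_tags_sketch(image_files, base_files))
```

Passing a different `search_folders` value (for example `("lib",)`) changes which folders contribute tags, which mirrors the customization described in the paragraph above.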
8390

84-
If no output folder is specified, the resulting image (named in the format `ubuntu.img.zip` will be output in the present working directory. You can also specify an output folder:
8591

86-
shub --image ubuntu.img --package --outfolder /tmp
92+
#### Compare to base OS
93+
If you want to get a complete list of scores for your image against a core set of latest [docker-os](singularity/analysis/packages/docker-os) images:
8794

88-
For the package command, you will need to put in your password to grant sudo priviledges, as packaging requires using the singularity `export` functionality.
95+
shub --image docker://python:latest --oscalc
8996

90-
For more details, and a walkthrough with sample data, please see [examples/package_image](examples/package_image)
97+
98+
or again see [this example](examples/classify_image/estimate_os.py) for doing this from within python.
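To make the idea of scoring an image against a set of base OS images concrete, here is a minimal sketch that ranks bases by the Jaccard index of their file lists. The Jaccard index is an assumption chosen for illustration; the metric that `shub --oscalc` actually uses may differ, and the names `jaccard`, `rank_bases`, and the sample file lists are hypothetical.

```python
def jaccard(files_a, files_b):
    """Jaccard index of two file lists: |intersection| / |union|."""
    a, b = set(files_a), set(files_b)
    union = a | b
    if not union:
        return 0.0
    return float(len(a & b)) / len(union)


def rank_bases(image_files, bases):
    """Return (base name, score) pairs sorted from most to least similar."""
    scores = {name: jaccard(image_files, files) for name, files in bases.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, heavily truncated file lists
bases = {
    "debian:7.11": ["/bin/ls", "/etc/debian_version", "/usr/bin/dpkg"],
    "centos:7": ["/bin/ls", "/etc/centos-release", "/usr/bin/rpm"],
}
image_files = ["/bin/ls", "/etc/debian_version", "/usr/bin/dpkg", "/usr/bin/python3"]

ranked = rank_bases(image_files, bases)
print(ranked[0][0])
```

Taking the first entry of the ranked list corresponds to the "top match" idea behind `--os`, while the full list corresponds to the complete set of scores.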
9199

92100

93101
### View the inside of a container
@@ -114,6 +122,14 @@ An [interactive demo](https://singularityware.github.io/singularity-python/examp
114122

115123
### Visualize Containers
116124

125+
#### Container Similarity Clustering
126+
Do you have sets of containers or packages, and want to cluster them based on similarities?
127+
128+
![examples/package_tree/docker-os.png](examples/package_tree/docker-os.png)
129+
130+
We have examples for both deriving scores and producing plots like the above; see [examples/package_tree](examples/package_tree).
131+
132+
117133
#### Container Similarity Tree
118134

119135
![examples/similar_tree/simtree.png](examples/similar_tree/simtree.png)
@@ -146,7 +162,7 @@ An [interactive demo](https://singularityware.github.io/singularity-python/examp
146162
What files and folders differ between two containers? What does it look like if I subtract one image from the second? `shub` provides a command line tool to generate a visualization to do exactly this.
147163

148164

149-
shub --subtract --images docker://ubuntu:latest.docker://centos:latest
165+
shub --subtract --images docker://ubuntu:latest,docker://centos:latest
150166

151167
As with `simtree`, this function supports both docker and singularity images as inputs.
152168

@@ -159,7 +175,7 @@ An [interactive demo](https://singularityware.github.io/singularity-python/examp
159175
The same functions above can be used to show the exact similarities (intersect) and differences (files and/or folders unique to two images) between two images. You can get a data structure with this information as follows:
160176

161177

162-
from shub.views import compare_containers
178+
from singularity.analysis.compare import compare_containers
163179

164180
image1 = 'ubuntu.img'
165181
image2 = 'centos.img'
@@ -168,6 +184,12 @@ The same functions above can be used to show the exact similarities (intersect)
168184
comparison = compare_containers(image1,image2,by=by)
169185

170186

187+
Note that you can also compare packages, or a container to a package:
188+
189+
190+
def compare_containers(container1=None,container2=None,by=None,
191+
image_package1=None,image_package2=None)
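The intersect and difference described above can be pictured with plain set operations. The sketch below is an illustration only: the function name `compare_file_lists` and the result keys `intersect`, `unique1`, and `unique2` are hypothetical, and the structure returned by `compare_containers` may be organized differently.

```python
def compare_file_lists(files1, files2):
    """Return shared and unique paths for two lists of container paths."""
    set1, set2 = set(files1), set(files2)
    return {
        "intersect": sorted(set1 & set2),  # paths present in both
        "unique1": sorted(set1 - set2),    # paths only in the first
        "unique2": sorted(set2 - set1),    # paths only in the second
    }


# Hypothetical, heavily truncated file lists for two images
ubuntu_files = ["/bin/ls", "/etc/hosts", "/usr/bin/apt-get"]
centos_files = ["/bin/ls", "/etc/hosts", "/usr/bin/yum"]

comparison = compare_file_lists(ubuntu_files, centos_files)
print(comparison["intersect"])
print(comparison["unique1"])
print(comparison["unique2"])
```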
192+
171193

172194
#### Calculate similarity of images
173195

@@ -179,6 +201,38 @@ and the same applies for specification of Docker images, as in the previous exam
179201

180202

181203

204+
### Package your container
205+
The driver of much of the above is the simple container package. A package is a zipped up file that contains the image, the singularity runscript as `runscript`, a `VERSION` file, and a list of files `files.txt` and folders `folders.txt` in the container.
206+
207+
![img/singularity-package.png](img/singularity-package.png)
208+
209+
The example package can be [downloaded for inspection](http://www.vbmis.com/bmi/project/singularity/package_image/ubuntu:latest-2016-04-06.img.zip), as can the [image used to create it](http://www.vbmis.com/bmi/project/singularity/package_image/ubuntu:latest-2016-04-06.img). This is one of the drivers underlying [singularity hub](http://www.singularity-hub.org) (under development).
210+
211+
- **files.txt** and **folders.txt**: are simple text file lists with paths in the container, and this choice is currently done to provide the rawest form of the container contents. These files also are used to generate interactive visualizations, and calculate similarity between containers.
212+
- **VERSION**: is a text file with one line, an md5 hash generated for the image when it was packaged.
213+
- **{{image}}.img**: is of course the original singularity container (usually a .img file)
214+
215+
First, go to where you have some images:
216+
217+
ls
218+
ubuntu.img
219+
220+
221+
You can now use the `shub` command line tool to package your image. Note that you must have [singularity installed](https://singularityware.lbl.gov/install-linux), and depending on the function you use, you will likely need to use sudo. We can use the `--package` argument to package our image:
222+
223+
shub --image ubuntu.img --package
224+
225+
226+
If no output folder is specified, the resulting package (named in the format `ubuntu.img.zip`) will be written to the present working directory. You can also specify an output folder:
227+
228+
shub --image ubuntu.img --package --outfolder /tmp
229+
230+
For the package command, you will need to enter your password to grant sudo privileges, as packaging requires the singularity `export` functionality.
231+
232+
For more details, and a walkthrough with sample data, please see [examples/package_image](examples/package_image)
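Because a package is an ordinary zip file, its text members can be inspected with the Python standard library alone. The sketch below assumes that `files.txt`, `folders.txt`, and `VERSION` sit at the top level of the archive, which is not confirmed here; the helper name `read_package_listing` and the stand-in package contents (including the placeholder hash) are hypothetical.

```python
import io
import zipfile


def read_package_listing(package, member="files.txt"):
    """Return the non-empty lines of a text member inside a package zip."""
    with zipfile.ZipFile(package) as zf:
        text = zf.read(member).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Build a tiny stand-in package in memory so the sketch runs without Singularity
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("files.txt", "/bin/ls\n/etc/hosts\n")
    zf.writestr("folders.txt", "/bin\n/etc\n")
    zf.writestr("VERSION", "0123456789abcdef0123456789abcdef\n")  # placeholder hash

buffer.seek(0)
print(read_package_listing(buffer))
buffer.seek(0)
print(read_package_listing(buffer, member="folders.txt"))
```

For a real package, pass the path to the zip (for example the `ubuntu.img.zip` produced above) instead of the in-memory buffer.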
233+
234+
235+
182236
### Build your container
183237
More information coming soon.
184238

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
#!/usr/bin/env python
2+
3+
# This is an example of counting files using an image diff
4+
5+
from singularity.analysis.classify import (
6+
get_diff,
7+
file_counts,
8+
extension_counts
9+
)
10+
11+
image_package = "python:3.6.0.img.zip"
12+
13+
# The diff is a dict of folders --> files that differ between
14+
# image and its closest OS
15+
diff = get_diff(image_package=image_package)
16+
17+
# Now we might be interested in counting different things
18+
readme_count = file_counts(diff=diff)
19+
copyright_count = file_counts(diff=diff,patterns=['copyright'])
20+
authors_count = file_counts(diff=diff,patterns=['authors','thanks','credit'])
21+
todo_count = file_counts(diff=diff,patterns=['todo'])
22+
23+
# Or getting a complete dict of extensions
24+
extensions = extension_counts(diff=diff)
25+
26+
# Return files instead of counts
27+
extensions = extension_counts(diff=diff,return_counts=False)
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
#!/usr/bin/env python
2+
3+
# This is an example of deriving software tags for an image package from within python
4+
5+
import os
6+
os.environ['MESSAGELEVEL'] = 'CRITICAL'
7+
8+
from singularity.analysis.classify import get_tags
9+
10+
image_package = "python:3.6.0.img.zip"
11+
12+
# The algorithm works as follows:
13+
# 1) first compare package to set of base OS (provided with shub)
14+
# 2) subtract the most similar os from image, leaving "custom" files
15+
# 3) organize custom files into dict based on folder name
16+
# 4) return search_folders as tags
17+
18+
# Default tags will be returned as software in "bin"
19+
tags = get_tags(image_package=image_package)
20+
21+
# Most similar OS found to be debian:7.11
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
#!/usr/bin/env python
2+
3+
# This is an example of estimating the base OS of an image package from within python
4+
5+
from singularity.analysis.classify import estimate_os
6+
7+
image_package = "python:3.6.0.img.zip"
8+
9+
# We can obtain the estimated os (top match)
10+
estimated_os = estimate_os(image_package=image_package)
11+
# Most similar OS found to be debian:7.11
12+
13+
# We can also get the whole list and values
14+
os_similarity = estimate_os(image_package=image_package,return_top=False)
134 KB
Binary file not shown.
Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
#!/usr/bin/env python
2+
3+
# This is an example of generating image packages from within python. The docker image packages (top) will
4+
# retrieve all of docker official library (as of 1/2017), and the os specific images (bottom) will
5+
# retrieve base os (both included with singularity-python)
6+
7+
import os
8+
import shutil
9+
from singularity.cli import get_image
10+
from singularity.package import package
11+
12+
# Save packages here
13+
output_base = '/home/vanessa/Documents/Dropbox/Code/singularity/singularity-python/examples/package_image/packages'
14+
15+
## DOCKER LIBRARY IMAGES
16+
# https://github.com/docker-library/official-images/tree/master/library
17+
images = ['aerospike:3.10.1.1','alpine:3.5','amazonlinux:2016.09','arangodb:3.1.7','backdrop:1.5.2',
18+
'bash:4.4.5','bonita:7.3.3','buildpack-deps:jessie',
19+
'busybox:1.26.1-glibc','cassandra:3.9','celery:3.1.3','centos:7',
20+
'chronograf:0.13.0','cirros:0.3.4','clearlinux:base','clojure:lein-2.7.1',
21+
'composer:1.3.0','consul:0.7.2','couchbase:enterprise-4.5.1','couchdb:1.6.1','crate:1.0.1',
22+
'debian:8.6','docker:1.13.0-rc5','drupal:8.2.5-apache','eclipse-mosquitto:1.4.10','eggdrop:1.8.0',
23+
'elasticsearch:5.1.1','elixir:1.4.0','erlang:19.2','fedora:25','fsharp:4.0.0.4','gazebo:libgazebo7',
24+
'gcc:4.9.4','ghost:0.11.3','golang:1.6.4','haproxy:1.7.1','haskell:8.0.1','hello-world:latest',
25+
'httpd:2.2.31','hylang:0.11.1','ibmjava:8-jre','influxdb:1.1.1','iojs:3.3.0','irssi:1.0.0',
26+
'jenkins:2.32.1','jetty:9.3.15','joomla:3.6.5','jruby:9.1.6.0-jre','julia:0.5.0',
27+
'kaazing-gateway:5.3.2','kapacitor:1.1.1','kibana:5.1.1','known:0.9.2','kong:0.9.7','lightstreamer:6.0.3',
28+
'logstash:5.1.1','mageia:5','mariadb:10.1.20','maven:3.3.9-jdk-8','memcached:1.4.33','mongo:3.4.1',
29+
'mongo-express:0.34.0','mono:4.6.2.16','mysql:5.7.17','nats:0.9.6',
30+
'nats-streaming:0.3.6','neo4j:3.1.0','neurodebian:nd70','nextcloud:10.0.0','nginx:1.11.8',
31+
'node:7.4.0','notary:server-0.5.0','notary:signer-0.5.0','nuxeo:LTS-2016','odoo:10.0','openjdk:8u111-jdk',
32+
'opensuse:42.2','oraclelinux:7.3','orientdb:2.2.14','owncloud:9.1.3-apache','percona:5.7.16',
33+
'perl:5.24.0','photon:1.0','php:7.1.0-cli','php-zendserver:9.0.1-php7',
34+
'piwik:3.0.0','plone:5.0.6','postgres:9.6.1','pypy:3-5.5.0-alpha','python:3.6.0',
35+
'r-base:3.3.2','rabbitmq:3.6.6','rakudo-star:2016.11','redis:3.0.7','redmine:3.1.7','registry:2.5.1',
36+
'rethinkdb:2.3.5','rocket.chat:0.48.2','ros:kinetic-ros-base','ruby:2.4.0','sentry:8.12.0',
37+
'solr:6.3.0','sonarqube:6.2','sourcemage:0.62','spiped:1.5.0','storm:1.0.2','swarm:1.2.5',
38+
'telegraf:1.1.2','thrift:0.9.3','tomcat:8.0.39-jre7',
39+
'tomee:8-jdk-7.0.1-webprofile','traefik:v1.1.2','ubuntu:16.04','vault:0.6.4',
40+
'websphere-liberty:javaee7','wordpress:4.7.0','zookeeper:3.4.9']
41+
42+
image_type = 'docker'
43+
44+
45+
## OS IMAGES (provided via official docker library)
46+
images = ['alpine:3.1','alpine:3.2','alpine:3.3','alpine:3.4','alpine:3.5',
47+
'busybox:1.26.1-glibc','busybox:1.26.1-musl','busybox:1.26.1-uclibc',
48+
'amazonlinux:2016.09',
49+
'centos:7','centos:6','centos:5',
50+
'cirros:0.3.4','cirros:0.3.3',
51+
'clearlinux:latest',
52+
'crux:3.1',
53+
'debian:8.6','debian:sid','debian:stretch','debian:7.11', # 8.6 is jessie, 7.11 is wheezy
54+
'fedora:25','fedora:24','fedora:23','fedora:22','fedora:21','fedora:20',
55+
'mageia:5',
56+
'opensuse:42.2','opensuse:42.1','opensuse:13.2','opensuse:tumbleweed', # 42.2 is leap, 13.2 harlequin
57+
'oraclelinux:7.3','oraclelinux:7.2','oraclelinux:7.1','oraclelinux:7.0','oraclelinux:6.8',
58+
'oraclelinux:6.7','oraclelinux:6.6','oraclelinux:5.11',
59+
'photon:1.0',
60+
'sourcemage:0.62',
61+
'swarm:1.2.6-rc1',
62+
'ubuntu:12.04.5','ubuntu:14.04.5','ubuntu:16.04','ubuntu:16.10','ubuntu:17.04']
63+
64+
image_type = 'os'
65+
66+
# You will need to export your sudopw to an environment variable called pancakes for it to not ask you :)
67+
os.environ['pancakes'] = 'yoursecretpass'
68+
69+
# We will make subdirectories in package folder
70+
output_folder = "%s/%s" %(output_base,image_type)
71+
72+
if not os.path.exists(output_folder):
73+
    os.mkdir(output_folder)
74+
75+
for name in images:
76+
    docker_image = "docker://%s" %(name)
77+
    image = get_image(docker_image)
78+
    package_name = "%s/%s.zip" %(output_folder,os.path.basename(image))
79+
    if not os.path.exists(package_name):
80+
        image_package = package(image_path=image,
81+
                                output_folder=output_folder,
82+
                                remove_image=True,
83+
                                runscript=True,
84+
                                software=True)
85+
    tmpfolder = os.path.dirname(image)
86+
    shutil.rmtree(tmpfolder)
87+
