
Commit d08f112 (1 parent: ed64915)

adding function to visualize os sims to close #46

11 files changed: +289 / -33 lines

README.md

Lines changed: 7 additions & 1 deletion
```diff
@@ -51,6 +51,8 @@ After installation, you should be able to run `shub` on the command line, withou
 --os estimate the operating system of your container.
 --oscalc calculate similarity score for your container vs.
 docker library OS.
+--osplot plot similarity scores for your container vs. docker
+library OS.
 --tags retrieve list of software tags for an image, itself
 minus it's base
 --tree view the guts of an singularity image (use --image)
@@ -94,9 +96,13 @@ If you want to get a complete list of scores for your image against a core set o
 
 shub --image docker://python:latest --oscalc
 
-
 or again see [this example](examples/classify_image/estimate_os.py) for doing this from within python.
 
+You can also generate a [dynamic plot](https://singularityware.github.io/singularity-python/examples/classify_image/) for this data:
+
+shub --image docker://python:latest --osplot
+
+
 
 ### View the inside of a container
 What's inside that container? Right now, the main way to answer this question is to do some equivalent of ssh. shub provides a command line function for rendering a view to (immediately) show the contents of an image (folders and files) in your web browser. **Important** the browser will open, but you will need to use your password to use Singularity on the command line:
```
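The README documents `--osplot` as a sibling of `--os` and `--oscalc`, each used together with `--image`. A minimal argparse sketch of how such independent on/off flags could be declared and checked. The `build_parser` helper and the dispatch message are hypothetical illustrations, not the actual `shub` implementation, which this commit does not show.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a few shub flags listed in the README."""
    parser = argparse.ArgumentParser(prog="shub")
    parser.add_argument("--image", help="image URI, e.g. docker://python:latest")
    # Each OS-related flag is an independent boolean switch.
    parser.add_argument("--os", action="store_true",
                        help="estimate the operating system of your container")
    parser.add_argument("--oscalc", action="store_true",
                        help="calculate similarity scores vs. docker library OS")
    parser.add_argument("--osplot", action="store_true",
                        help="plot similarity scores vs. docker library OS")
    return parser


args = build_parser().parse_args(["--image", "docker://python:latest", "--osplot"])
if args.osplot:
    # Hypothetical dispatch: compute the scores, then render the plot page.
    print("would compute scores for %s and render a plot" % args.image)
```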

examples/classify_image/derive_tags.py

Lines changed: 12 additions & 1 deletion
```diff
@@ -5,7 +5,10 @@
 import os
 os.environ['MESSAGELEVEL'] = 'CRITICAL'
 
-from singularity.analysis.classify import get_tags
+from singularity.analysis.classify import (
+    get_tags,
+    get_diff
+)
 
 image_package = "python:3.6.0.img.zip"
 
@@ -18,4 +21,12 @@
 # Default tags will be returned as software in "bin"
 tags = get_tags(image_package=image_package)
 
+# We can also get the raw "diff" between the image and its base,
+# which is usable in other functions (so we don't have to
+# calculate it again)
+diff = get_diff(image_package=image_package)
+
+# We can specify other folders of interest
+folders = ['init','init.d','bin','systemd']
+tags = get_tags(search_folders=folders,diff=diff)
 # Most similar OS found to be %s debian:7.11
```
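The updated example computes the diff once with `get_diff` and hands it to `get_tags` through `diff=`, so the expensive comparison against the base OS is not repeated for each folder query. A self-contained sketch of that compute-once, filter-many pattern, assuming (from the `get_diff` docstring) that the diff is a dict mapping folder names to file paths, and assuming tags are file basenames. `fake_get_diff` and `tags_from_diff` are hypothetical stand-ins, not the library's functions.

```python
import os


def fake_get_diff():
    """Hypothetical stand-in for get_diff: custom files grouped by folder name."""
    return {
        "bin": ["/usr/local/bin/python3.6", "/usr/local/bin/pip"],
        "init.d": ["/etc/init.d/procps"],
        "lib": ["/usr/local/lib/libpython3.6m.so"],
    }


def tags_from_diff(diff, search_folders=None):
    """Return sorted unique basenames of files found in the requested folders."""
    if search_folders is None:
        search_folders = ["bin"]  # mirrors the documented default of "bin"
    tags = set()
    for folder in search_folders:
        # Folders absent from the diff simply contribute nothing.
        for path in diff.get(folder, []):
            tags.add(os.path.basename(path))
    return sorted(tags)


diff = fake_get_diff()  # the expensive step, done once
print(tags_from_diff(diff))
print(tags_from_diff(diff, ["init", "init.d", "bin", "systemd"]))
```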

examples/classify_image/index.html

Lines changed: 141 additions & 0 deletions
```diff
@@ -0,0 +1,141 @@
+<meta charset="utf-8">
+<style> /* set the CSS */
+
+.bar { fill: steelblue; }
+
+body {
+  font: 10px sans-serif;
+}
+
+.axis path,
+.axis line {
+  fill: none;
+  stroke: #000;
+  shape-rendering: crispEdges;
+}
+
+.bar {
+  fill: orange;
+}
+
+.bar:hover {
+  fill: orangered;
+}
+
+.x.axis path {
+  display: none;
+}
+
+.d3-tip {
+  line-height: 1;
+  padding: 6px;
+  background: rgba(0, 0, 0, 0.8);
+  color: #fff;
+  border-radius: 4px;
+  font-size: 12px;
+}
+
+/* Creates a small triangle extender for the tooltip */
+.d3-tip:after {
+  box-sizing: border-box;
+  display: inline;
+  font-size: 10px;
+  width: 100%;
+  line-height: 1;
+  color: rgba(0, 0, 0, 0.8);
+  content: "\25BC";
+  position: absolute;
+  text-align: center;
+}
+
+/* Style northward tooltips specifically */
+.d3-tip.n:after {
+  margin: -2px 0 0 0;
+  top: 100%;
+  left: 0;
+}
+</style>
+<body>
+<h2>python:latest Similarity to Docker OS</h2>
+<!-- load the d3.js library -->
+<script src="//d3js.org/d3.v4.min.js"></script>
+<script src="https://rawgit.com/VACLab/d3-tip/master/d3-tip.js"></script>
+<script>
+
+// set the dimensions and margins of the graph
+var margin = {top: 20, right: 20, bottom: 30, left: 40},
+    width = 960 - margin.left - margin.right,
+    height = 500 - margin.top - margin.bottom,
+    axis_height = height - 100;
+
+// set the ranges
+var x = d3.scaleBand()
+          .range([0, width])
+          .padding(0.1);
+
+var y = d3.scaleLinear()
+          .range([height, 0]);
+
+// Setup the tool tip. Note that this is just one example, and that many styling options are available.
+// See original documentation for more details on styling: http://labratrevenge.com/d3-tip/
+var tool_tip = d3.tip()
+  .attr("class", "d3-tip")
+  .offset([-8, 0])
+  .html(function(d) { return "Similarity Score: " + d.score; });
+
+// append the svg object to the body of the page
+// append a 'group' element to 'svg'
+// moves the 'group' element to the top left margin
+var svg = d3.select("body").append("svg")
+    .attr("width", width + margin.left + margin.right)
+    .attr("height", height + margin.top + margin.bottom + 200)
+  .append("g")
+    .attr("transform",
+          "translate(" + margin.left + "," + margin.top + ")");
+
+svg.call(tool_tip);
+raw = {'oraclelinux:6.7': 0.11972789115646258, 'debian:8.6': 0.42616903547792123, 'cirros:0.3,4': 0.0, 'ubuntu:16.10': 0.2969308157760443, 'sourcemage:0.62': 0.07611631138729384, 'busybox:1.26.1-uclibc': 0.0011023839051949843, 'oraclelinux:7.2': 0.11877551020408163, 'ubuntu:14.04.5': 0.2969308157760443, 'fedora:24': 0.11368015414258188, 'ubuntu:12.04.5': 0.2969308157760443, 'alpine:3.1': 0.002474907190980338, 'mageia:5': 0.07694284350794758, 'alpine:3.5': 0.002474907190980338, 'photon:1.0': 0.045019905495404994, 'cirros:0.3,3': 0.0, 'fedora:22': 0.11368015414258188, 'clearlinux:latest': 0.026932331461859936, 'centos:7': 0.11905674178828939, 'crux:3.1': 0.11909443289702826, 'debian:stretch': 0.42616903547792123, 'fedora:20': 0.11368015414258188, 'centos:5': 0.1213556724372148, 'oraclelinux:6.6': 0.11952380952380952, 'busybox:1.26.1-glibc': 0.0011942492306278995, 'oraclelinux:7.1': 0.11884353741496599, 'alpine:3.2': 0.002474907190980338, 'oraclelinux:7.0': 0.11850340136054421, 'swarm:1.2.6-rc1': 9.190754101374018e-05, 'amazonlinux:2016:09': 0.1416462744263574, 'opensuse:42.1': 0.0306866436984216, 'ubuntu:16.04': 0.2969308157760443, 'fedora:25': 0.11368015414258188, 'debian:sid': 0.4263136957072077, 'oraclelinux:6.8': 0.12061224489795919, 'opensuse:42.2': 0.0306866436984216, 'opensuse:13.2': 0.0306866436984216, 'ubuntu:17.04': 0.2969308157760443, 'centos:6': 0.12149500156745272, 'oraclelinux:5.11': 0.12027210884353741, 'debian:7.11': 0.4263136957072077, 'opensuse:tumbleweedoraclelinux:7.3': 0.0306866436984216, 'fedora:23': 0.11368015414258188, 'alpine:3.3': 0.002474907190980338, 'fedora:21': 0.11368015414258188, 'busybox:1.26.1-musl': 0.0011942492306278995, 'alpine:3.4': 0.002474907190980338};
+data = [];
+
+// Format the data
+for (var os in raw) {
+  if (raw.hasOwnProperty(os)) {
+    var new_os = {dist:os, score: parseFloat(raw[os]).toFixed(8)};
+    data.push(new_os);
+  }
+}
+
+// Scale the range of the data in the domains
+x.domain(data.map(function(d) { return d.dist; }));
+y.domain([0.0, 1.0]);
+
+// append the rectangles for the bar chart
+svg.selectAll(".bar")
+    .data(data)
+  .enter().append("rect")
+    .attr("class", "bar")
+    .attr("x", function(d) { return x(d.dist); })
+    .attr("width", 15)
+    .attr("y", function(d) { return y(d.score); })
+    .attr("height", function(d) { return height - y(d.score); })
+    .on('mouseover', tool_tip.show)
+    .on('mouseout', tool_tip.hide);
+
+// add the x Axis
+svg.append("g")
+    .classed('axis',true)
+    .attr("transform", "translate(0," + height + ")")
+    .call(d3.axisBottom(x))
+    .selectAll("text")
+    .style("text-anchor", "end")
+    .attr("dx", "-.8em")
+    .attr("dy", ".15em")
+    .attr("transform", "rotate(-65)");
+
+// add the y Axis
+svg.append("g")
+    .call(d3.axisLeft(y));
+
+
+</script>
+</body>
```
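The committed `index.html` hard-codes the scores as a JavaScript object literal (`raw = {...}`), and its script turns each entry into a `{dist, score}` row with `toFixed(8)`. This commit does not show how the page is produced. One plausible approach is to serialize a Python score dict with `json.dumps` (whose output is a valid JS object literal) and substitute it into a template. A sketch under that assumption; the `{{TITLE}}`/`{{SCORES}}` placeholders and the `render_plot`/`format_rows` helpers are hypothetical.

```python
import json
from html import escape

# Hypothetical, heavily trimmed template; placeholders are assumptions.
TEMPLATE = """<h2>{{TITLE}} Similarity to Docker OS</h2>
<script>
var raw = {{SCORES}};
</script>"""


def render_plot(title, scores):
    """Fill the placeholders with an escaped title and a JSON score object."""
    # Sorting keys keeps the generated page stable across runs.
    payload = json.dumps(scores, sort_keys=True)
    return TEMPLATE.replace("{{TITLE}}", escape(title)).replace("{{SCORES}}", payload)


def format_rows(scores):
    """Python analogue of the page's JS loop: one {dist, score} row per OS,
    with the score rounded to 8 decimals like toFixed(8)."""
    return [{"dist": os_name, "score": "%.8f" % value}
            for os_name, value in scores.items()]


scores = {"debian:8.6": 0.42616903547792123, "alpine:3.5": 0.002474907190980338}
page = render_plot("python:latest", scores)
print(page)
print(format_rows(scores))
```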

examples/package_tree/calculate_similarity.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -59,7 +59,7 @@
 
 # Use your own input arguments...
 comparisons = compare_packages(packages_set1=package_set1,
-                               packages_set2=packge_set2,
+                               packages_set2=package_set2,
                                by="folders.txt")
 
 # Or use defaults
```

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@
     name="singularity",
 
     # Version number:
-    version="0.77",
+    version="0.82",
 
     # Application author details:
     author="Vanessa Sochat",
```

singularity/analysis/classify.py

Lines changed: 25 additions & 8 deletions
```diff
@@ -14,8 +14,10 @@
 from singularity.logman import bot
 from singularity.analysis.compare import (
     compare_packages,
-    compare_containers
+    compare_containers,
+    container_similarity_vector
 )
+
 from singularity.analysis.utils import get_package_base
 from singularity.package import package as make_package
 from singularity.utils import (
@@ -56,11 +58,9 @@ def get_diff(container=None,image_package=None,sudopw=None):
     3) organize custom files into dict based on folder name
 
     '''
-    if image_package == None:
-        image_package = make_package(container,remove_image=True,sudopw=sudopw)
 
     # Find the most similar os
-    most_similar = estimate_os(image_package=image_package,sudopw=sudopw)
+    most_similar = estimate_os(image_package=image_package,container=container,sudopw=sudopw)
     similar_package = "%s/docker-os/%s.img.zip" %(get_package_base(),most_similar)
 
     comparison = compare_containers(image_package1=image_package,
@@ -87,7 +87,7 @@ def get_diff(container=None,image_package=None,sudopw=None):
 
 
 ###################################################################################
-# TAGGING #########################################################################
+# OPERATING SYSTEMS ###############################################################
 ###################################################################################
 
 
@@ -96,11 +96,22 @@ def estimate_os(container=None,image_package=None,sudopw=None,return_top=True):
     operating system images, and return the docker image most similar
     :param return_top: return only the most similar (estimated os) default True
     :param image_package: the package created from the image to estimate.
+    FIGURE OUT WHAT DATA WE NEED
     '''
     if image_package == None:
-        image_package = make_package(container,remove_image=True,sudopw=sudopw)
-
-    comparison = compare_packages(packages_set1=[image_package])['files.txt'].transpose()
+
+        SINGULARITY_HUB = os.environ.get('SINGULARITY_HUB',"False")
+
+        # Visualization deployed locally or elsewhere
+        if SINGULARITY_HUB == "False":
+            image_package = make_package(container,remove_image=True,sudopw=sudopw)
+            comparison = compare_packages(packages_set1=[image_package])['files.txt'].transpose()
+        else:
+            comparison = container_similarity_vector(container1=container)['files.txt'].transpose()
+
+    else:
+        comparison = compare_packages(packages_set1=[image_package])['files.txt'].transpose()
+
     comparison.columns = ['SCORE']
     most_similar = comparison['SCORE'].idxmax()
     print("Most similar OS found to be ", most_similar)
@@ -109,6 +120,12 @@ def estimate_os(container=None,image_package=None,sudopw=None,return_top=True):
     return comparison
 
 
+
+###################################################################################
+# TAGGING #########################################################################
+###################################################################################
+
+
 def get_tags(container=None,image_package=None,sudopw=None,search_folders=None,diff=None,
              return_unique=True):
     '''get tags will return a list of tags that describe the software in an image,
```
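When no `image_package` is supplied, `estimate_os` now branches on the `SINGULARITY_HUB` environment variable: locally it builds a package and calls `compare_packages`; otherwise it calls `container_similarity_vector` directly. Either way it picks the highest-scoring column with pandas `idxmax`. A dependency-free sketch of that control flow, assuming the two backends can be modeled as zero-argument callables returning `{os_name: score}` dicts (hypothetical stand-ins, not the library's functions), with `max(..., key=...)` standing in for `idxmax`.

```python
import os


def estimate_os_sketch(local_scores, hub_scores, return_top=True):
    """Pick a scoring backend from SINGULARITY_HUB, then return the best match.

    local_scores / hub_scores: hypothetical callables returning a dict that
    maps OS name -> similarity score.
    """
    # Like the commit, any value other than the string "False" means "hub".
    on_hub = os.environ.get("SINGULARITY_HUB", "False") != "False"
    scores = hub_scores() if on_hub else local_scores()
    if not return_top:
        return scores
    # Equivalent of comparison['SCORE'].idxmax(): the key with the largest
    # score; ties resolve to the first key encountered, as idxmax does.
    most_similar = max(scores, key=scores.get)
    print("Most similar OS found to be ", most_similar)
    return most_similar


def local():
    return {"debian:8.6": 0.43, "ubuntu:16.04": 0.30}


def hub():
    return {"alpine:3.5": 0.9}


os.environ.pop("SINGULARITY_HUB", None)
print(estimate_os_sketch(local, hub))   # local backend is used
os.environ["SINGULARITY_HUB"] = "True"
print(estimate_os_sketch(local, hub))   # hub backend is used
```

Injecting the backends keeps the environment-variable branch testable without building an image or contacting Singularity Hub.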

singularity/analysis/compare.py

Lines changed: 47 additions & 3 deletions
```diff
@@ -32,6 +32,49 @@
 # CONTAINER COMPARISONS ###########################################################
 ###################################################################################
 
+def container_similarity_vector(container1=None,packages_set=None,by=None,custom_set=None):
+    '''container_similarity_vector is similar to compare_packages, but intended
+    to compare a container object (singularity image or singularity hub container)
+    to a list of packages. If packages_set is not provided, the default used is
+    'docker-os'. This can be changed to 'docker-library', or, if the user wants a custom
+    list, they should define custom_set.
+    :param container1: singularity image or singularity hub container.
+    :param packages_set: a name of a package set, provided are docker-os and docker-library
+    :param custom_set: a list of package files, used first if provided.
+    :by: metrics to compare by (files.txt and or folders.txt)
+    '''
+    if custom_set == None:
+        if packages_set == None:
+            packages_set = get_packages('docker-os')
+    else:
+        packages_set = custom_set
+
+    if by == None:
+        by = ['files.txt']
+
+    if not isinstance(by,list):
+        by = [by]
+    if not isinstance(packages_set,list):
+        packages_set = [packages_set]
+
+    comparisons = dict()
+
+    for b in by:
+        bot.logger.debug("Starting comparisons for %s",b)
+        df = pandas.DataFrame(columns=packages_set)
+        for package1 in packages_set:
+            sim = calculate_similarity(container1=container1,
+                                       image_package2=package1,
+                                       by=b)[b]
+
+            name1 = os.path.basename(package1).replace('.img.zip','')
+            bot.logger.debug("container vs. %s: %s" %(name1,sim))
+            df.loc["container",package1] = sim
+        df.columns = [os.path.basename(x).replace('.img.zip','') for x in df.columns.tolist()]
+        comparisons[b] = df
+    return comparisons
+
+
 def compare_containers(container1=None,container2=None,by=None,
                        image_package1=None,image_package2=None):
     '''compare_containers will generate a data structure with common and unique files to
@@ -48,7 +91,7 @@ def compare_containers(container1=None,container2=None,by=None,
         by = ["files.txt"]
     if not isinstance(by,list):
         by = [by]
-
+
     # Get files and folders for each
     container1_guts = get_container_contents(gets=by,
                                              split_delim="\n",
@@ -111,11 +154,12 @@ def calculate_similarity(container1=None,container2=None,image_package1=None,
 # PACKAGE COMPARISONS #############################################################
 ###################################################################################
 
-
 def compare_packages(packages_set1=None,packages_set2=None,by=None):
     '''compare_packages will compare one image or package to one image or package. If
     the folder isn't specified, the default singularity packages (included with install)
-    will be used (os vs. docker library)
+    will be used (os vs. docker library). Images will take preference over packages.
+    :param packages_set1: a list of package files; if not defined, uses docker-library
+    :param packages_set2: a list of package files; if not defined, uses docker-os
     :by: metrics to compare by (files.txt and or folders.txt)
     '''
     if packages_set1 == None:
```
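`container_similarity_vector` fills a one-row table per metric: the row is labeled `"container"`, and each column is a package from the chosen set, renamed to its basename without the `.img.zip` suffix and holding the similarity score. `estimate_os` then indexes `['files.txt']` and transposes, so that row becomes its `SCORE` column. A stdlib sketch of the same shape, with nested dicts in place of a pandas DataFrame and a hypothetical `score_fn` standing in for `calculate_similarity`.

```python
import os


def similarity_vector_sketch(packages, score_fn, by=None):
    """Return {metric: {"container": {os_name: score}}} for each metric in `by`.

    packages: package file paths such as ".../debian:8.6.img.zip".
    score_fn: hypothetical callable (package_path, metric) -> float.
    """
    # Normalize arguments the same way the committed function does.
    if by is None:
        by = ["files.txt"]
    if not isinstance(by, list):
        by = [by]
    if not isinstance(packages, list):
        packages = [packages]

    comparisons = {}
    for metric in by:
        row = {}
        for package in packages:
            # Column label: basename with the package suffix stripped.
            name = os.path.basename(package).replace(".img.zip", "")
            row[name] = score_fn(package, metric)
        comparisons[metric] = {"container": row}
    return comparisons


packages = ["/pkgs/docker-os/debian:8.6.img.zip", "/pkgs/docker-os/alpine:3.5.img.zip"]
fake_scores = {"debian:8.6.img.zip": 0.43, "alpine:3.5.img.zip": 0.002}
result = similarity_vector_sketch(
    packages, lambda pkg, metric: fake_scores[os.path.basename(pkg)], by="files.txt")
print(result)
```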
