# Pipelines

[![Build Status](https://travis-ci.com/InformaticsMatters/pipelines.svg?branch=master)](https://travis-ci.com/InformaticsMatters/pipelines)
![GitHub release (latest SemVer including pre-releases)](https://img.shields.io/github/v/release/informaticsmatters/pipelines?include_prereleases)

The project experiments with ways to generate data processing pipelines.
The aim is to generate some re-usable building blocks that can be piped
together into more functional pipelines. Their primary initial use is as executors
for the Squonk Computational Notebook (http://squonk.it), though it is expected
that they will have uses in other environments.

As well as being executable directly they can also be executed in Docker
containers (separately or as a single pipeline). Additionally, they can be
executed using Nextflow (http://nextflow.io) to allow running large jobs
on HPC-like environments.

Currently it has some Python scripts using RDKit (http://rdkit.org) to provide
basic cheminformatics and comp chem functionality, though other tools will
be coming soon, including some from the Java ecosystem.

* See [here](src/python/pipelines/rdkit/README.md) for more info on the RDKit components.

In Jan 2018 some of the core functionality from this repository was broken out into
the separate pipelines-utils repository.

### Modularity

Each component should be small but useful. Try to split complex tasks into
reusable steps. Think how the same steps could be used in other workflows.
Allow parts of one component to be used in another component where appropriate,
but avoid overuse. For example, see the use of functions in rdkit/conformers.py
to generate conformers in o3dAlign.py, as sketched below.
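
As a sketch of that pattern (function and module layout here are illustrative, not
the actual API), a shared conformer helper might be reused like this:

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def generate_conformers(mol, num_confs=10):
    # Shared helper in the spirit of conformers.py: embed multiple
    # 3D conformers on a molecule and return it.
    mol = Chem.AddHs(mol)
    AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=42)
    return mol


# A second component (in the spirit of o3dAlign.py) reuses the helper
# instead of re-implementing the embedding logic.
mol = generate_conformers(Chem.MolFromSmiles('OCC(O)CO'))
print(mol.GetNumConformers())
```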

### Consistency

Generally use consistent coding styles, e.g. PEP8 for Python.

## Input and output formats

We aim to provide consistent input and output formats to allow results to be
passed between different implementations. Currently all implementations handle
chemical structures, so SD file would typically be used as the lowest common
denominator interchange format, but implementations should also try to support
Squonk's JSON-based Dataset formats, which potentially allow richer representations
and can be used to describe data other than chemical structures.
The utils.py module provides helper methods to handle IO.
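
For orientation, a single record in Squonk's JSON Dataset format has roughly the
following shape (a sketch only; the uuid and values shown are made up, and the
exact schema is defined by Squonk, not here):

```
{
  "uuid": "0e8dbd49-ba42-4bdb-a3ec-2d0cbcbd8ea4",
  "format": "smiles",
  "source": "OCC(O)CO",
  "values": {
    "logp": -1.8
  }
}
```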

### Thin output

In addition, implementations are encouraged to support "thin" output formats
where this is appropriate. A "thin" representation is a minimal representation
containing only what is new or changed, and can significantly reduce the bandwidth
used and avoid the need for the consumer to interpret values it does not
need to understand. It is not always appropriate to support thin format output
(e.g. when the structure is changed by the process).

In the case of SDF, thin format involves using an empty molecule for the molecule
block plus all properties that were present in the input or were generated by the
process (the empty molecule is used so that the SDF syntax remains valid).
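
For illustration, a thin SDF record could look like this (the property names and
values are made up; the three header lines are blank and the counts line declares
zero atoms and bonds so the record stays syntactically valid):

```



  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <UUID>
0e8dbd49-ba42-4bdb-a3ec-2d0cbcbd8ea4

>  <LogP>
-1.8

$$$$
```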

In the case of Squonk JSON output, the thin output would be of type BasicObject
(i.e. containing no structure information) and include all properties that
were present in the input or were generated by the process.
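
Continuing the sketch above, the corresponding thin record would keep the uuid
and values but drop the structure:

```
{
  "uuid": "0e8dbd49-ba42-4bdb-a3ec-2d0cbcbd8ea4",
  "values": {
    "logp": -1.8
  }
}
```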

Implicit in this is that some identifier (usually an SD file property, or
the JSON UUID property) that is present in the input is included in the output so
that the full results can be "reassembled" by the consumer of the output.
The input would typically only contain additional information that is required
for execution of the process, e.g. the structure.

For consistency, implementations should try to honour these command line
switches relating to input and output:

-i and --input: For specifying the location of the single input. If not specified
then STDIN should be used. File names ending with .gz should be interpreted as
gzipped files. Input on STDIN should not be gzipped.

-if and --informat: For specifying the input format where it cannot be inferred
from the file name (e.g. when using STDIN). Values would be sdf or json.

-o and --output: For specifying the base name of the outputs (there could be multiple
output files, each using the same base name but with a different file extension).
If not specified then STDOUT should be used. Output file names ending with
.gz should be compressed using gzip. Output on STDOUT would not be gzipped.

-of and --outformat: For specifying the output format where it cannot be inferred
from the file name (e.g. when using STDOUT). Values would be sdf or json.

--meta: Write additional metadata and metrics (mostly relevant to Squonk's
JSON format - see below). Default is not to write.

--thin: Write output in thin format (only present where this makes sense).
Default is not to use thin format.
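
A minimal sketch of these switches using Python's argparse (the gzip/STDIN
handling follows the conventions above; everything else is illustrative):

```python
import argparse
import gzip
import sys


def create_parser():
    parser = argparse.ArgumentParser(description='Example pipeline component')
    parser.add_argument('-i', '--input', help='Input file; STDIN if not specified')
    parser.add_argument('-if', '--informat', choices=['sdf', 'json'],
                        help='Input format where it cannot be inferred')
    parser.add_argument('-o', '--output', help='Output base name; STDOUT if not specified')
    parser.add_argument('-of', '--outformat', choices=['sdf', 'json'],
                        help='Output format where it cannot be inferred')
    parser.add_argument('--meta', action='store_true',
                        help='Write additional metadata and metrics')
    parser.add_argument('--thin', action='store_true',
                        help='Write output in thin format')
    return parser


def open_input(name):
    if name is None:
        return sys.stdin.buffer       # input on STDIN is never gzipped
    if name.endswith('.gz'):
        return gzip.open(name, 'rb')  # .gz names are read as gzipped files
    return open(name, 'rb')


if __name__ == '__main__':
    args = create_parser().parse_args()
    with open_input(args.input) as source:
        for line in source:
            pass  # process records here
```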

### UUIDs

The JSON format for input and output makes heavy use of UUIDs that uniquely
identify each structure. Generally speaking, if the structure is not changed
(e.g. properties are just being added to input structures) then the existing
UUID should be retained so that UUIDs in the output match those from the input.
However, if new structures are being generated (e.g. in reaction enumeration
or conformer generation) then new UUIDs MUST be generated as there is no longer
a straight relationship between the input and output structures. Instead you
probably want to store the UUID of the source structure(s) as a field (or fields) in
the output to allow correlation of the outputs to the inputs (e.g. for conformer
generation, output the source molecule UUID as a field so that each conformer
identifies which source molecule it was derived from).
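
A sketch of that convention (field names are illustrative): each derived structure
gets a fresh UUID and records its parent's UUID so outputs can be correlated with
inputs:

```python
import uuid


def conformer_records(parent, conformers):
    for conf in conformers:
        yield {
            'uuid': str(uuid.uuid4()),  # new structure, so a new UUID
            'source': conf,
            'values': {
                # illustrative field recording the source molecule
                'SourceMolUUID': parent['uuid'],
            },
        }


parent = {'uuid': str(uuid.uuid4()), 'source': 'OCC(O)CO'}
for record in conformer_records(parent, ['conf1', 'conf2']):
    assert record['values']['SourceMolUUID'] == parent['uuid']
```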

When not using JSON format the need to handle UUIDs does not necessarily apply
(though if there is a field named 'uuid' in the input it will be respected accordingly).
To accommodate this you are recommended to ALSO specify the input molecule number
(1-based index) as an output field, independent of whether UUIDs are being handled,
as a "poor man's" approach to correlating the outputs to the inputs.

### Filtering

When a service filters molecules, special attention is needed to ensure
that the molecules are output in the same order as the input (obviously skipping
structures that are filtered out). Also, the service descriptor (.dsd.json) file needs
special care. For instance, take a look at the "thinDescriptors" section of
src/pipelines/rdkit/screen.dsd.json.

When using multi-threaded execution this is especially important as results
will usually not come back in exactly the same order as the input.
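
One simple way to keep output aligned with input order under multi-threaded
execution is to let the pool preserve ordering for you; for instance, Python's
executor.map yields results in submission order (the predicate here is a
stand-in for a real screen):

```python
from concurrent.futures import ThreadPoolExecutor


def passes_filter(record):
    # Stand-in predicate; a real service would score a molecule here.
    return record % 2 == 0


records = list(range(10))

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, regardless of which worker
    # finishes first, so the filtered output stays aligned with the input.
    flags = pool.map(passes_filter, records)
    kept = [r for r, keep in zip(records, flags) if keep]

print(kept)  # [0, 2, 4, 6, 8] - input order preserved
```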

### Metrics

To provide information about what happened you are strongly recommended to generate
a metrics output file (e.g. output_metrics.txt). This file allows you to provide
feedback about what happened. The contents of this file are fairly simple,
each line having a

`key=value`

syntax. Keys beginning and ending with __ (2 underscores) have magical meaning.
All other keys are treated as metrics that are recorded against that execution.
The current magical values that are recognised are:

```
__InputCount__=1000
__OutputCount__=300
PLI=360
```

It defines the input and output counts and specifies that 360 PLI 'units'
should be recorded as being consumed during execution.

The purpose of the metrics is primarily to be able to charge for utilisation, but
even if not charging (which is often the case) it is still good practice
to record the utilisation.
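
A sketch of writing such a file (the two magic keys shown follow the example
above; check what your deployment actually recognises):

```python
def write_metrics(path, metrics):
    # One key=value pair per line, as described above.
    with open(path, 'w') as metrics_file:
        for key, value in metrics.items():
            metrics_file.write('{}={}\n'.format(key, value))


write_metrics('output_metrics.txt', {
    '__InputCount__': 1000,   # magic key for the input count
    '__OutputCount__': 300,   # magic key for the output count
    'PLI': 360,               # ordinary metric, recorded as consumed units
})
```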

### Metadata

Squonk's JSON format requires additional metadata to allow proper handling
of the JSON. Writing detailed metadata is optional, but recommended. If
not present then Squonk will use a minimal representation of metadata, but
it's recommended to provide this directly so that additional information can
be added.

At the very minimum Squonk needs to know the type of dataset (e.g. MoleculeObject
or BasicObject), but this should be handled for you automatically if you use
the utils.default_open_output* methods. Better still, also specify metadata for
the field types when you do this. See e.g. conformers.py for an example of
how to do this.
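
As a rough sketch of the kind of information involved (field names here are
illustrative rather than the exact Squonk schema), the metadata identifies the
dataset type and the types of the fields:

```
{
  "type": "MoleculeObject",
  "size": 300,
  "valueClassMappings": {
    "logp": "java.lang.Float"
  }
}
```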

## Deployment to Squonk

The service descriptors need to be POSTed to the Squonk coreservices REST API.

### Docker

A shell script can be used to deploy the pipelines to a running
containerised Squonk deployment:

    $ ./post-service-descriptors.sh
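
The script wraps something like the following (the endpoint URL is a placeholder;
the script knows the real one for your deployment):

```
curl -X POST \
    -H "Content-Type: application/json" \
    --data-binary @src/pipelines/rdkit/screen.dsd.json \
    http://localhost:8080/coreservices/rest/v1/services
```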

### OpenShift/OKD

The pipelines and service-descriptor container images are built using gradle
in this project. They are deployed from the Squonk project using Ansible
playbooks.

> A discussion about the deployment of pipelines can be found in the
`Posting Squonk pipelines` section of Squonk's OpenShift Ansible

Set your `PYTHONPATH` environment variable to include the `pipelines-utils` and
`pipelines-utils-rdkit` source directories
(adjusting `/path/to/` to whatever is needed):
```
export PYTHONPATH=/path/to/pipelines-utils/src/python:/path/to/pipelines-utils-rdkit/src/python
```

Run tests:
```
```

## Contact

Any questions contact:

Tim Dudgeon