
Commit 994c365

doc update
1 parent 49f2faa commit 994c365

25 files changed (+402, -12 lines)

.gitignore

Lines changed: 0 additions & 4 deletions
```diff
@@ -18,10 +18,6 @@ gromacs_mdp.html
 NCI*
 error_*.txt
 validation_*.txt
-cwl_dirs.txt
-yml_dirs.txt
-inference_rules.txt
-renaming_conventions.txt
 .hypothesis/
 .env
 node_modules/
```

BPS_poster.svg

Lines changed: 1 addition & 0 deletions

README.md

Lines changed: 62 additions & 1 deletion
@@ -2,8 +2,62 @@

This repository contains workflows for various molecular modeling tasks. The workflows can be compiled and executed using the [Workflow Inference Compiler](https://github.com/PolusAI/workflow-inference-compiler) (WIC).

![BPS Poster](BPS_poster.svg)

## Quick Start
```diff
-Follow the [installation instructions](https://github.com/PolusAI/workflow-inference-compiler#quick-start) here.
+First, follow the [installation instructions](https://github.com/PolusAI/workflow-inference-compiler#quick-start) for WIC.
```

Then, clone this repository and run the following commands:
```
cd install
./install_biobb_adapters.sh
cd ..
mm-workflows --generate_schemas
wic --generate_schemas
```

Some of the workflows require an NVIDIA CUDA GPU. Please see the NVIDIA [installation guides](https://docs.nvidia.com/cuda/#installation-guides) for more information.
Moreover, you will also need to configure the GPU to work with Docker, which can be tricky.

If all goes well, you can try running the tutorial, which is based on this [GROMACS tutorial](https://mmb.irbbarcelona.org/biobb/availability/tutorials/cwl):
```
wic --yaml ../mm-workflows/examples/gromacs/tutorial.wic --graphviz --run_local --quiet
```

This command will infer edges, compile to CWL, generate a GraphViz diagram of the root workflow, and run it locally.

```yaml
label: Conjugate Gradient
steps:
  - grompp:
      in:
        config: !ii
          mdp:
            integrator: cg
            nsteps: 1000
  - mdrun:
      in:
        # The GPU is used by default; run the bonded and PME terms on the CPU instead
        bonded_terms: !ii cpu
        pme_terms: !ii cpu
  - gmx_energy:
      in:
        config: !ii
          terms: [Potential]
        output_xvg_path: !ii energy_min_cg.xvg
```
The subworkflow [`examples/gromacs/cg.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/cg.wic) in `mm-workflows` is shown above, and the GraphViz diagram of the root workflow [`examples/gromacs/tutorial.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/tutorial.wic) in `mm-workflows` is shown below.

![Workflow](examples/gromacs/tutorial.wic.gv.png)
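The tutorial command infers the edges between steps, such as `grompp` → `mdrun` → `gmx_energy` in `cg.wic`. The following Python sketch illustrates one simple way such inference could work; it is not WIC's actual algorithm. It assumes, hypothetically, that each input is connected to the closest earlier step whose output has the same file format. The `Step` class, the format labels, and the parameter names that do not appear in the YAML in this commit (e.g. `input_gro_path`, `input_tpr_path`, `output_gro_path`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Step:
    """Hypothetical, simplified step: parameter name -> file format label."""
    name: str
    inputs: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)


def infer_edges(steps: List[Step]) -> List[Tuple[str, str, str, str]]:
    """Connect each input to the closest earlier output with the same format.

    Returns (src_step, src_output, dst_step, dst_input) tuples. Inputs with no
    match stay unconnected (they would have to be supplied explicitly).
    """
    edges = []
    for i, step in enumerate(steps):
        for in_name, in_fmt in step.inputs.items():
            for prev in reversed(steps[:i]):  # nearest upstream producer first
                match = next((out for out, fmt in prev.outputs.items() if fmt == in_fmt), None)
                if match is not None:
                    edges.append((prev.name, match, step.name, in_name))
                    break
    return edges


# The three steps of cg.wic, with assumed parameter names and format labels.
cg = [
    Step("grompp", inputs={"input_gro_path": "gro"}, outputs={"output_tpr_path": "tpr"}),
    Step("mdrun", inputs={"input_tpr_path": "tpr"},
         outputs={"output_gro_path": "gro", "output_edr_path": "edr"}),
    Step("gmx_energy", inputs={"input_energy_path": "edr"}, outputs={"output_xvg_path": "xvg"}),
]

for edge in infer_edges(cg):
    print(edge)
```

Searching backwards means that, when several earlier steps produce the same format, the most recent one wins, which matches the intuition of a linear pipeline.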

If you add the `--parallel` flag to the `wic` command above, you can view the plots in real time from another terminal:
```
conda activate wic
cd install && ./install_timeseriesplots.sh && cd ..
timeseriesplots
```

![Plots](examples/gromacs/plots.png)

## Jupyter notebook visualization

````diff
@@ -18,3 +72,10 @@ pip install -e ".[all]"
 ```
 
 ![Plots](docs/tree_viewer.png)
+
+
+## Visualizing the results
+
+This particular workflow creates files that represent 3D coordinates, so we can view them in the Jupyter notebook `src/vis/viewer.ipynb`. Make sure you are using the `vis` conda environment, as mentioned in the installation guide.
+
+![Multistep](protein.png)
````

cwl_adapters/file_format_conversions/biosimspace/conversion_amb_gro_zip.cwl renamed to cwl_adapters/file_format_conversions/biosimspace/insert_steps_automatically_amb_gro_zip.cwl

File renamed without changes.

cwl_adapters/file_format_conversions/biosimspace/conversion_amb_gro_zip.wic renamed to cwl_adapters/file_format_conversions/biosimspace/insert_steps_automatically_amb_gro_zip.wic

File renamed without changes.

cwl_adapters/file_format_conversions/biosimspace/conversion_unzip_gro_amb.cwl renamed to cwl_adapters/file_format_conversions/biosimspace/insert_steps_automatically_gro_amb.cwl

File renamed without changes.

cwl_adapters/file_format_conversions/biosimspace/conversion_unzip_gro_amb.wic renamed to cwl_adapters/file_format_conversions/biosimspace/insert_steps_automatically_gro_amb.wic

File renamed without changes.

docs/advanced.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# Advanced Features

## Static dispatch

Here is an example that shows how to swap out constant pressure implementations.

```yaml
wic:
  default_backend: gromacs
  backends:
    gromacs:
      steps:
        - npt_gromacs.wic:
    amber:
      steps:
        - npt_amber.wic:
  graphviz:
    label: Constant Pressure
```

Then you just need to choose a specific implementation at the call site:

```yaml
steps:
  - nvt.wic:
  - npt.wic:

wic:
  graphviz:
    label: Equilibration
  steps:
    (2, npt.wic):
      wic:
        backend: amber
```

This overrides the default `gromacs` implementation with `amber`; in other words, `npt_amber.wic` is called instead of `npt_gromacs.wic`. (If `--insert_steps_automatically` is enabled, the compiler will also attempt to insert the necessary file format conversions automatically.)
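Conceptually, static dispatch is a lookup performed at compile time: the call site's `backend:` (if any) selects one entry of `backends:`, and otherwise `default_backend:` is used. The Python sketch below models the dispatching YAML above (presumably the contents of `npt.wic`) as a plain dictionary; the `select_backend` helper is hypothetical and only illustrates the idea, it is not part of WIC.

```python
import copy
from typing import Any, Dict, List, Optional

# The dispatching YAML above as a dict; `- npt_gromacs.wic:` parses to {"npt_gromacs.wic": None}.
NPT: Dict[str, Any] = {
    "wic": {
        "default_backend": "gromacs",
        "backends": {
            "gromacs": {"steps": [{"npt_gromacs.wic": None}]},
            "amber": {"steps": [{"npt_amber.wic": None}]},
        },
        "graphviz": {"label": "Constant Pressure"},
    }
}


def select_backend(workflow: Dict[str, Any], override: Optional[str] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: return the steps of the backend chosen at the call site.

    `override` plays the role of `wic: backend:` at the call site; without it,
    `default_backend` is used.
    """
    meta = workflow["wic"]
    backend = override or meta["default_backend"]
    if backend not in meta["backends"]:
        raise KeyError(f"unknown backend {backend!r}; choose from {sorted(meta['backends'])}")
    # Deep-copy so that one call site cannot mutate the shared definition.
    return copy.deepcopy(meta["backends"][backend]["steps"])


print(select_backend(NPT))           # [{'npt_gromacs.wic': None}]
print(select_backend(NPT, "amber"))  # [{'npt_amber.wic': None}]
```

Because the choice is made when the workflow is compiled, the resulting CWL contains only the selected implementation.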

## Subinterpreters

A portion of [`examples/gromacs/nvt.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/nvt.wic) in `mm-workflows` is shown below. Note that the `in:` tag of `gmx_energy` is identical to the `config:` tag of `cwl_watcher`. This currently needs to be copied, pasted, and re-indented manually, but it should be possible to automate this in the future.

```yaml
...
  - mdrun:
      out:
        - output_edr_path: !& nvt.edr # Explicit edge reference / anchor
          # (This edge can be inferred, but made explicit for demonstration purposes.)
  - gmx_energy:
      in:
        input_energy_path: !* nvt.edr # Explicit edge dereference / alias
        config: !ii
          terms: [Temperature]
        output_xvg_path: temperature.xvg
  # NOTE: explicit edges are not supported with cwl_watcher, and all filenames
  # must be globally unique!
  - cwl_watcher:
      in:
        #cachedir_path: /absolute/path/to/cachedir/ (automatically filled in by wic)
        file_pattern: '*nvt.edr' # Any strings that start with & or * must be quoted
        cwl_tool: gmx_energy # This can also be an arbitrary subworkflow!
        max_times: '5'
        config: !ii
          in:
            input_energy_path: '*nvt.edr' # This * is automatically removed.
            config: !ii
              terms: [Temperature]
            output_xvg_path: temperature.xvg
...
```

Note that although `gmx_energy` appears before `cwl_watcher` in the YAML file, `gmx_energy` is independent of `cwl_watcher` in the DAG and thus is not considered a previous step. We include `gmx_energy` simply to guarantee that the analysis runs one more time in the main workflow, when all of the files are known to be in their final state.
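The `!&` (anchor) and `!*` (alias) tags in the `nvt.wic` excerpt define an explicit edge from `mdrun`'s `output_edr_path` to `gmx_energy`'s `input_energy_path`. As a rough sketch of how such tags could be resolved, the Python below represents them with two hypothetical classes, `Anchor` and `Alias`. It assumes that an alias must refer to an anchor defined by an earlier step, and that each anchor name may be defined only once (mirroring the duplicate-definition error described under Overloading / Parameter Passing). This is an illustration, not WIC's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Anchor:
    """Stands in for a `!& name` tag on a step output (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class Alias:
    """Stands in for a `!* name` tag on a step input (hypothetical representation)."""
    name: str


def resolve_explicit_edges(steps: List[Tuple[str, Dict]]) -> List[Tuple[str, str, str, str]]:
    """Turn anchor/alias pairs into (src_step, src_param, dst_step, dst_param) edges."""
    anchors: Dict[str, Tuple[str, str]] = {}
    edges = []
    for step, params in steps:
        # Resolve this step's aliases against anchors from earlier steps only.
        for param, value in params.items():
            if isinstance(value, Alias):
                if value.name not in anchors:
                    raise KeyError(f"{step}.{param}: no earlier anchor named {value.name!r}")
                edges.append((*anchors[value.name], step, param))
        # Then register this step's anchors, rejecting duplicate definitions.
        for param, value in params.items():
            if isinstance(value, Anchor):
                if value.name in anchors:
                    raise ValueError(f"{step}.{param}: duplicate definition of {value.name!r}")
                anchors[value.name] = (step, param)
    return edges


nvt = [
    ("mdrun", {"output_edr_path": Anchor("nvt.edr")}),
    ("gmx_energy", {"input_energy_path": Alias("nvt.edr"), "output_xvg_path": "temperature.xvg"}),
]
print(resolve_explicit_edges(nvt))
# [('mdrun', 'output_edr_path', 'gmx_energy', 'input_energy_path')]
```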

### Known Issues

Since the two runtimes are not linked, there is currently no reliable way to determine whether the previous steps have finished. Thus, to guarantee that the second runtime terminates, we simply execute `cwl_tool` up to `max_times` times. We also waive any guarantees about the files, so the subworkflow in the second runtime may of course fail for any number of reasons. For this reason, we do not propagate these speculative failures up to the main workflow.

The runtime system intentionally hides the working sub-directories of each step. Thus, we are forced to use a file watcher (hence the name `cwl_watcher`) that searches recursively starting from `cachedir_path`. This is why all filenames used with `cwl_watcher` must be globally unique. (Actually, for technical reasons we cannot use a file watching library; we simply use a good old-fashioned polling loop.)
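A minimal sketch of such a polling loop follows, assuming the behavior described above: search the cache directory recursively for `file_pattern`, run the tool whenever matches exist, stop after `max_times` attempts so the loop always terminates, and never propagate speculative failures. The `poll_and_run` helper and its `interval_s` parameter are hypothetical; this is not `cwl_watcher`'s actual code.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def poll_and_run(cachedir: str, file_pattern: str,
                 run: Callable[[List[Path]], None],
                 max_times: int = 5, interval_s: float = 1.0) -> int:
    """Poll `cachedir` recursively for `file_pattern`; return how many runs succeeded."""
    successes = 0
    for attempt in range(max_times):  # bounded, so the loop always terminates
        matches = sorted(Path(cachedir).rglob(file_pattern))
        if matches:
            try:
                run(matches)
                successes += 1
            except Exception as exc:  # speculative failure: report it, never re-raise
                print(f"attempt {attempt + 1}: speculative run failed: {exc}")
        time.sleep(interval_s)
    return successes


# Demo: one output hidden inside a step's working sub-directory.
with tempfile.TemporaryDirectory() as cachedir:
    hidden = Path(cachedir, "step_workdir")
    hidden.mkdir()
    (hidden / "nvt.edr").write_bytes(b"")
    seen: List[str] = []
    runs = poll_and_run(cachedir, "*nvt.edr",
                        lambda files: seen.extend(f.name for f in files),
                        max_times=3, interval_s=0.0)
    # A tool that always fails is tolerated; it just never counts as a success.
    failures = poll_and_run(cachedir, "*nvt.edr", lambda files: 1 / 0,
                            max_times=2, interval_s=0.0)
    print(runs, seen, failures)
```

Because the search is recursive, two steps writing files with the same name would both match, which is why the filenames must be globally unique.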

## Real-time plots

It is assumed that the real-time analysis takes care of the complex log file parsing, etc., and produces simple tabular data files (i.e. like CSV files, but separated by whitespace instead of commas). We use the same file watching / polling trick as above to locate these tabular data files. The first argument to the following command is the directory in which to look for the files. (By default it is `cachedir`, because that is the default value of the `--cachedir` wic command line argument.) You can also optionally supply file patterns, which by default are `*.xvg` and `*.dat`.

```
timeseriesplots cachedir <pat1> <pat2> <...>
```
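For reference, here is a minimal Python sketch of reading such a whitespace-separated file. It assumes GROMACS `.xvg` conventions, where lines starting with `#` are comments and lines starting with `@` are plotting directives. The `read_whitespace_table` and `columns` helpers and the sample data are made up for illustration and are not taken from `timeseriesplots`.

```python
from typing import List


def read_whitespace_table(text: str) -> List[List[float]]:
    """Parse whitespace-separated numeric rows, skipping blank, '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue  # comment (#) or xvg plotting directive (@)
        rows.append([float(token) for token in stripped.split()])
    return rows


def columns(rows: List[List[float]]) -> List[List[float]]:
    """Transpose rows into one list per column (time first, then each series)."""
    return [list(col) for col in zip(*rows)]


# Made-up sample in the style of a gmx energy .xvg file.
sample = """\
# This is a made-up example
@    title "Temperature"
@    xaxis  label "Time (ps)"
    0.000   298.1
    2.000   300.4
    4.000   299.7
"""

time_ps, temperature = columns(read_whitespace_table(sample))
print(time_ps)      # [0.0, 2.0, 4.0]
print(temperature)  # [298.1, 300.4, 299.7]
```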

## YAML Metadata Annotations

### Overloading / Parameter Passing

This example shows how we can recursively pass in parameters / recursively overload metadata.

Suppose we want to do a very careful minimization, first in vacuum and then in solvent (i.e. [`examples/gromacs/setup_vac_min.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/setup_vac_min.wic) in `mm-workflows`). We would like to reuse the abstract minimization protocol from `min.wic`. However, our stability analysis requires an explicit edge definition from the final minimized coordinates (i.e. in solvent). If we simply add `- output_tpr_path: !& min.tpr` directly to `min.wic`, there will be duplicate definitions, which are not allowed (they generate an exception).

The solution is to pass this parameter in to only the second instance of `min.wic`.

A portion of [`examples/gromacs/basic.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/basic.wic) in `mm-workflows` is shown below.

```yaml
...
# Put everything under one top-level wic: tag to facilitate easy merging and removal.
wic:
  graphviz:
    label: Molecular Dynamics
  steps:
    (1, min.wic):
      wic:
        steps:
          (2, cg.wic):
            wic:
              steps:
                (1, grompp):
                  out:
                    - output_tpr_path: !& min.tpr
...
```
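To make the recursion concrete, the Python sketch below pushes nested `wic: steps:` metadata down a simplified in-memory tree, using the 1-based `(index, name)` keys to select a step at each level. The tree layout, the `apply_overrides` and `parse_step_key` helpers, and the step names other than `min.wic`, `cg.wic`, and `grompp` are hypothetical. The point is only that the override lands on this one call site, while the shared `min.wic` definition stays untouched.

```python
import copy
from typing import Any, Dict, Tuple


def parse_step_key(key: str) -> Tuple[int, str]:
    """Split a '(2, cg.wic)' style key into a 1-based index and a step name."""
    index, name = key.strip().strip("()").split(",", 1)
    return int(index), name.strip()


def apply_overrides(workflow: Dict[str, Any], wic_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `workflow` with `wic: steps:` metadata pushed down recursively."""
    result = copy.deepcopy(workflow)
    for key, override in wic_meta.get("steps", {}).items():
        index, name = parse_step_key(key)
        step = result["steps"][index - 1]
        if name not in step:
            raise KeyError(f"step {index} is {next(iter(step))!r}, not {name!r}")
        body = step[name] or {}
        if "wic" in override:
            # A nested wic: tag means "recurse into this subworkflow".
            body = apply_overrides(body, override["wic"])
        # Any other keys (e.g. out:) are attached to this step only.
        body.update({k: v for k, v in override.items() if k != "wic"})
        step[name] = body
    return result


# Hypothetical, heavily simplified tree for basic.wic.
basic = {"steps": [
    {"min.wic": {"steps": [
        {"steepest.wic": None},
        {"cg.wic": {"steps": [{"grompp": {}}, {"mdrun": {}}, {"gmx_energy": {}}]}},
    ]}},
    {"equil.wic": None},
]}

# Same nesting as the YAML above; the anchor is kept as a plain string here.
meta = {
    "steps": {
        "(1, min.wic)": {"wic": {"steps": {
            "(2, cg.wic)": {"wic": {"steps": {
                "(1, grompp)": {"out": [{"output_tpr_path": "!& min.tpr"}]},
            }}},
        }}},
    },
}

patched = apply_overrides(basic, meta)
grompp = patched["steps"][0]["min.wic"]["steps"][1]["cg.wic"]["steps"][0]["grompp"]
print(grompp)  # {'out': [{'output_tpr_path': '!& min.tpr'}]}
```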

docs/index.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ Molecular Modeling Workflows documentation
    installguide.md
    tutorials/tutorials.rst
    userguide.md
+   advanced.md
    dev/api.rst
 
 Indices and tables
```

docs/overview.md

Lines changed: 20 additions & 2 deletions
@@ -1,3 +1,21 @@
```diff
-# Overview
+# Main Features / Design Overview
 
-TODO
```

* Subworkflows

Subworkflows are used extensively to create reusable building blocks.

For example, a basic molecular dynamics workflow is composed of minimization, equilibration, and production steps; the equilibration step is composed of constant volume (NVT) and constant pressure (NPT) steps; and each of those is composed of primitive, backend-specific steps. If we then want to do a stability analysis, we should be able to incorporate the molecular dynamics workflow as a black box and only have to append the stability analysis subworkflow. See [`examples/gromacs/tutorial.wic`](https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/tutorial.wic) for an example.

* Multiple backends

There are often several backend engines that implement the same algorithm (e.g. amber / gromacs / namd). In principle, backends ought to be exactly interchangeable, but in practice a backend may crash or otherwise fail unexpectedly. For this reason, we want the ability to switch backends at any step. Moreover, different users may be familiar with different backends, but we still want to compose their workflows together.

For example, we should be able to compose system setup using amber/tleap, equilibration using namd, and metadynamics using gromacs, with file format conversions inserted automatically between steps. This is not possible with other 'backend-independent' software packages, which require a single backend to be fixed at the beginning and used throughout.

See [static dispatch](advanced.md#static-dispatch).
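One way to think about automatically inserting file format conversions is as a shortest-path search over a table of available conversion steps. The Python sketch below uses breadth-first search. The format labels, the step names `setup_tleap`, `equil_namd`, and `metad_gromacs`, and the two namd conversion steps are hypothetical, while the two `insert_steps_automatically_*` names echo the adapters renamed in this commit. It is an illustration of the idea, not WIC's actual algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical conversion table: (from_format, to_format) -> conversion step.
CONVERSIONS: Dict[Tuple[str, str], str] = {
    ("amber", "gromacs"): "insert_steps_automatically_amb_gro_zip",
    ("gromacs", "amber"): "insert_steps_automatically_gro_amb",
    ("amber", "namd"): "convert_amber_to_namd",  # hypothetical
    ("namd", "amber"): "convert_namd_to_amber",  # hypothetical
}


def conversion_path(src: str, dst: str) -> Optional[List[str]]:
    """Shortest chain of conversion steps from format `src` to `dst`, via breadth-first search."""
    if src == dst:
        return []
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for (a, b), step in CONVERSIONS.items():
        graph.setdefault(a, []).append((b, step))
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, path = queue.popleft()
        for nxt, step in graph.get(fmt, []):
            if nxt == dst:
                return path + [step]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [step]))
    return None  # no chain of conversions exists


def insert_conversions(steps: List[Tuple[str, str, str]]) -> List[str]:
    """`steps` holds (name, input_format, output_format); returns names with conversions spliced in."""
    result = [steps[0][0]]
    for (_, _, produced), (name, needed, _) in zip(steps, steps[1:]):
        path = conversion_path(produced, needed)
        if path is None:
            raise ValueError(f"cannot convert {produced!r} to {needed!r} before {name!r}")
        result += path + [name]
    return result


pipeline = [
    ("setup_tleap", "pdb", "amber"),
    ("equil_namd", "namd", "namd"),
    ("metad_gromacs", "gromacs", "gromacs"),
]
print(insert_conversions(pipeline))
```

Breadth-first search returns a chain with the fewest conversion steps, which keeps the number of lossy format round-trips small.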

* Automated Real-time Analysis & Plots

For quality control purposes, it is highly desirable to have fully automated analyses. This is particularly important for virtual screening, where the number of simulations is so large that a user cannot possibly inspect each simulation manually and intervene. For example, a stability analysis ought to be done before an expensive binding free energy calculation is performed.

We support iteratively running an arbitrary analysis workflow in real time (i.e. while the simulation is still running) and plotting the results. Timeseries data can be automatically segmented and clustered into statistically separate probability distributions, which are beautifully reflected in the associated histograms.
