Commit bf60ee4

Refactoring docs and internals to improve readability (PolusAI#339)
1 parent ff91b31 commit bf60ee4

5 files changed: +66 -52 lines changed

docs/userguide.md

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@ See [overview](overview.md)

 Many software packages have a way of automatically discovering files which they can use. (examples: [pytest](https://docs.pytest.org/en/latest/explanation/goodpractices.html#conventions-for-python-test-discovery) [pylint](https://pylint.pycqa.org/en/latest/user_guide/usage/run.html))

-By default, wic will recursively search for tools / workflows within the directories (and subdirectories) listed in the config file's json tags `search_paths_cwl` and `search_paths_wic`. The paths listed can be absolute or relative. The default `config.json` is shown.
+By default, sophios will recursively search for tools / workflows within the directories (and subdirectories) listed in the config file's json tags `search_paths_cwl` and `search_paths_wic`. The paths listed can be absolute or relative. The default `config.json` is shown.

 ***`We strongly recommend placing all repositories of tools / workflows in the same parent directory.`***

@@ -39,7 +39,7 @@ By default, wic will recursively search for tools / workflows within the directo
 .....
 ```

-If you do not specify config file using the command line argument `--config`, it will be automatically created for you the first time you run wic in `~/wic/global_config.json`. (Because of this, the first time you run wic you should be in the root directory of any one of your repos.) Then you can manually edit this file with additional sources of tools / workflows.
+If you do not specify config file using the command line argument `--config`, it will be automatically created for you the first time you run sophios in `~/wic/global_config.json`. (Because of this, the first time you run sophios you should be in the root directory of any one of your repos.) Then you can manually edit this file with additional sources of tools / workflows.

 To avoid dealing with relative file paths in YAML files, by default

@@ -74,7 +74,7 @@ steps:
 message: !ii Hello World
 ```

-Note that this is one key difference between WIC and CWL. In CWL, all inputs must be given in a separate file. In WIC, inputs can be given inline with !ii and after compilation they will be automatically extracted into the separate file.
+Note that this is one key difference between sophios and CWL. In CWL, all inputs must be given in a separate file. In sophios, inputs can be given inline with !ii and after compilation they will be automatically extracted into the separate file.

 (NOTE: raw CWL is still supported with the --allow_raw_cwl flag.)

@@ -288,7 +288,7 @@ Similar to `scatter`, `when` is a **special (and optional)** attribute to any st
 The `when` attribute of a step object exposes the exact same js embedded syntax of `when` tag of the YAML/CWL syntax. One has to be careful about appropriate escaping in the string input of `when` in Python API. In the above case the comparison is between two strings so "" is around the literal 27 (i.e. value after `toString` step).
 ## Partial Failures

-In running workflows at scale, sometimes it is the case that one of the workflow steps may crash due to a bug causing the entire workflow to crash. In this case can use `--partial_failure_enable` flag. For special cases when the exit status of a workflow step isn't 1, and a different error code is returned (for example 142), then the user can supply the error code to wic as a success code to prevent workflow from crashing with `--partial_failure_success_codes 0 1 142`. By default partial failure flag will consider only 0 and 1 as success codes. An example line snippet of the error code being printed is shown below.
+In running workflows at scale, sometimes it is the case that one of the workflow steps may crash due to a bug causing the entire workflow to crash. In this case can use `--partial_failure_enable` flag. For special cases when the exit status of a workflow step isn't 1, and a different error code is returned (for example 142), then the user can supply the error code to sophios as a success code to prevent workflow from crashing with `--partial_failure_success_codes 0 1 142`. By default partial failure flag will consider only 0 and 1 as success codes. An example line snippet of the error code being printed is shown below.
 ```
 [1;30mWARNING[0m [33m[job compare_extract_protein_pdbbind__step__4__topology_check] exited with status: 139[0m
 ```
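
The success-code semantics described in the Partial Failures paragraph can be illustrated with a small sketch. This is not how sophios or its CWL runner implements the flag; `parse_exit_status`, `is_tolerated`, and `DEFAULT_SUCCESS_CODES` are hypothetical names, and the log line is the one from the docs with the terminal color codes left out.

```python
import re
from typing import Iterable, Optional

# Hypothetical default, mirroring the documented behaviour:
# with partial failures enabled, only 0 and 1 count as success codes.
DEFAULT_SUCCESS_CODES = (0, 1)

# Matches the "exited with status: N" tail of a job warning line.
STATUS_RE = re.compile(r"exited with status: (-?\d+)")


def parse_exit_status(log_line: str) -> Optional[int]:
    """Extract the exit status from a job warning line, or None if absent."""
    found = STATUS_RE.search(log_line)
    return int(found.group(1)) if found else None


def is_tolerated(status: int,
                 success_codes: Iterable[int] = DEFAULT_SUCCESS_CODES) -> bool:
    """Return True if this exit status should not abort the whole workflow."""
    return status in set(success_codes)


line = ("WARNING [job compare_extract_protein_pdbbind__step__4__topology_check] "
        "exited with status: 139")
status = parse_exit_status(line)
```

Under these assumptions, status 139 is not tolerated by default, so it would have to be listed explicitly, in the same way the docs list 142 in `--partial_failure_success_codes 0 1 142`.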

docs/validation.md

Lines changed: 12 additions & 12 deletions
@@ -2,7 +2,7 @@

 # Formal Schemas

-What is validation? Validation is the process of checking whether the contents of a file agree with a formal specification. A popular modern choice for formal specification language is jsonschema, which is what wic uses. (Don't worry, you do not need to learn jsonschema. See below.)
+What is validation? Validation is the process of checking whether the contents of a file agree with a formal specification. A popular modern choice for formal specification language is jsonschema, which is what sophios uses. (Don't worry, you do not need to learn jsonschema. See below.)

 ## Different levels of strictness
 Notice that I said `a` formal specification, not `the` formal specification. That's because there are many different kinds of schemas, which may be more or less strict.
@@ -19,33 +19,33 @@ Next, there are various basic schemas for simply checking that a yml file is syn

 Since CWL files are just special yml files, there are [validators](https://github.com/common-workflow-language/cwl-utils/blob/main/cwl_utils/parser/cwl_v1_0.py) for these special yml files.

-Getting better, but wic workflows aren't just any yml or cwl files, they are even more special. So how can we create a special schema that will only accept wic workflows?
+Getting better, but sophios workflows aren't just any yml or cwl files, they are even more special. So how can we create a special schema that will only accept sophios workflows?

-## The WIC schema
+## The sophios schema

 At the opposite end of the spectrum (from `{}`), we could create the most strict schema possible. This would be a schema which only accepts inputs from ***`all known tools and subworkflows`***. In other words, we can use a ***`whitelist`*** instead of a blacklist. We can assume a closed world instead of an open world. Where does this whitelist come from? It comes from the [auto-discovery](userguide.md#auto-discovery) mechanism!

-So the command `wic --generate_schemas`
+So the command `sophios --generate_schemas`
 * makes a list of every single tool and workflow available
 * generates a separate sub-schema for each tool / workflow individually (this is why the flag says schemas, plural)
 * combines all of the sub-schemas into a single giant disjoint union!

-In other words, each step in a WIC workflow had better be chosen from a list of valid steps!
+In other words, each step in a sophios workflow had better be chosen from a list of valid steps!

 ### Stale schemas ###

-A direct consequence of using the most strict possible schema is that, as you add and/or modify tools and workflows, you have to keep re-generating the WIC schema so it doesn't become stale. Otherwise you will get validation errors, because you will be attempting to validate against an old schema which no longer reflects the tool and workflows that are currently available.
+A direct consequence of using the most strict possible schema is that, as you add and/or modify tools and workflows, you have to keep re-generating the sophios schema so it doesn't become stale. Otherwise you will get validation errors, because you will be attempting to validate against an old schema which no longer reflects the tool and workflows that are currently available.

 The exact same thing happens in Python when you are in the middle of editing a file. Until you push the save button, VSCode will keep attempting to validate your python code against the functions and methods that were available the last time you saved. It isn't a surprise that you get a red squiggly line while you are still typing. Makes sense, right?

-That said, for technical and performance reasons, (for now) we do not automatically generate a new wic schema.
+That said, for technical and performance reasons, (for now) we do not automatically generate a new sophios schema.

 ## TL;DR ##

 You must periodically run this command!

 ```
-rm -rf autogenerated/schemas/ && wic --generate_schemas
+rm -rf autogenerated/schemas/ && sophios --generate_schemas
 ```

 If you are getting validation errors, try re-running this command!
@@ -54,15 +54,15 @@ If you are getting validation errors, try re-running this command!

 So why are we going through all this trouble to create the most strict possible schema?

-The answer is that thanks to the excellent hypothesis-jsonschema library, we can use the WIC schema to perform property-based integration testing
+The answer is that thanks to the excellent hypothesis-jsonschema library, we can use the sophios schema to perform property-based integration testing

-***`on the entire WIC language, for every single tool and workflow simultaneously!`***
+***`on the entire sophios language, for every single tool and workflow simultaneously!`***

 We can randomly generate ***`entire synthetic workflows!`***

-This has [already been implemented](https://github.com/PolusAI/workflow-inference-compiler/blob/master/tests/test_fuzzy_compile.py), and it has been incredibly useful for finding a few (and thankfully only a few!) bugs. It also revealed a few subtle design issues (which have long ago been fixed).
+This has [already been implemented](https://github.com/PolusAI/sophios/blob/master/tests/test_fuzzy_compile.py), and it has been incredibly useful for finding a few (and thankfully only a few!) bugs. It also revealed a few subtle design issues (which have long ago been fixed).

-The [Fuzzy Compile CI logs](https://github.com/PolusAI/workflow-inference-compiler/actions/workflows/fuzzy_compile_weekly.wic) should speak for themselves.
+The [Fuzzy Compile CI logs](https://github.com/PolusAI/sophios/actions/workflows/fuzzy_compile_weekly.wic) should speak for themselves.

 ## Well what about ... !?!
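
The whitelist idea in validation.md (one sub-schema per discovered tool, combined into a union, which then goes stale when tools are added) can be sketched in plain Python. The schema layout below is an assumption made for illustration only: the real generated schemas are not shown in this commit, the use of `anyOf` is a guess at how the union is expressed, and the tool names and helper functions are hypothetical.

```python
from typing import Any, Dict, List


def step_subschema(tool_name: str) -> Dict[str, Any]:
    """Sub-schema for one step: a mapping whose only key is tool_name."""
    return {
        "type": "object",
        "properties": {tool_name: {"type": ["object", "null"]}},
        "required": [tool_name],
        "additionalProperties": False,
    }


def combined_schema(tool_names: List[str]) -> Dict[str, Any]:
    """Union of all sub-schemas: every step must match one known tool."""
    return {
        "type": "object",
        "properties": {
            "steps": {
                "type": "array",
                "items": {"anyOf": [step_subschema(n) for n in sorted(tool_names)]},
            }
        },
    }


def unknown_steps(workflow: Dict[str, Any], tool_names: List[str]) -> List[str]:
    """Closed-world check: list step names that are not on the whitelist."""
    known = set(tool_names)
    return [name for step in workflow.get("steps", [])
            for name in step if name not in known]


# Hypothetical tool names; "cat" was added after the schema was generated.
stale_whitelist = ["echo", "toString"]
fresh_whitelist = ["cat", "echo", "toString"]
workflow = {"steps": [{"echo": None}, {"cat": None}]}
schema = combined_schema(stale_whitelist)
```

With the stale whitelist the new `cat` step is rejected even though the tool exists on disk, which is the situation the re-generation command in the TL;DR section addresses.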

src/sophios/post_compile.py

Lines changed: 19 additions & 12 deletions
@@ -18,17 +18,22 @@ def find_output_dirs(data: Union[RoseTree, Dict, list]) -> list:
         list: A list of location values.
     """
     results = []
-    if isinstance(data, Dict):
-        if "class" in data and data["class"] == "Directory" and "location" in data:
-            if isinstance(data["location"], dict) and "wic_inline_input" in data["location"]:
-                results.append(data["location"]["wic_inline_input"])
-            else:
-                results.append(data["location"])
-        for value in data.values():
-            results.extend(find_output_dirs(value))
-    elif isinstance(data, list):
-        for item in data:
-            results.extend(find_output_dirs(item))
+    match data:
+        case dict() as data_dict:
+            match data_dict:
+                case {"class": "Directory", "location": {"wic_inline_input": val}, **rest_data_dict}:
+                    results.append(val)
+                case {"class": "Directory", "location": dl, **rest_data_dict}:
+                    results.append(dl)
+                case _:
+                    pass
+            for value in data_dict.values():
+                results.extend(find_output_dirs(value))
+        case list(l):
+            for item in l:
+                results.extend(find_output_dirs(item))
+        case _:
+            pass

     return results

@@ -79,6 +84,8 @@ def remove_entrypoints(container_engine: str, rose_tree: RoseTree) -> RoseTree:
     # Requires root, so guard behind CLI option
     if container_engine == 'docker':
         plugins.remove_entrypoints_docker()
-    if container_engine == 'podman':
+    elif container_engine == 'podman':
         plugins.remove_entrypoints_podman()
+    else:
+        pass
     return plugins.dockerPull_append_noentrypoint_rosetree(rose_tree)

src/sophios/run_local.py

Lines changed: 18 additions & 15 deletions
@@ -9,7 +9,7 @@
 import shutil
 import platform
 import traceback
-from typing import Dict, List, Optional
+from typing import List, Optional
 from datetime import datetime

 try:
@@ -413,20 +413,23 @@ def stage_input_files(yml_inputs: Yaml, root_yml_dir_abs: Path,
         FileNotFoundError: If throw and it any of the input files do not exist.
     """
     for key, val in yml_inputs.items():
-        if isinstance(val, Dict) and val.get('class', '') == 'File':
-            path = root_yml_dir_abs / Path(val['path'])
-            if not path.exists() and throw:
-                # raise FileNotFoundError(f'Error! {path} does not exist!')
-                print(f'Error! {path} does not exist!')
-                sys.exit(1)
-
-            relpath = Path('autogenerated/') if relative_run_path else Path('.')
-            pathauto = relpath / Path(val['path'])  # .name # NOTE: Use .name ?
-            pathauto.parent.mkdir(parents=True, exist_ok=True)
-
-            if path != pathauto:
-                cmd = ['cp', str(path), str(pathauto)]
-                proc = sub.run(cmd, check=False)
+        match val:
+            case {'class': 'File', **rest_of_val}:
+                path = root_yml_dir_abs / Path(val['path'])
+                if not path.exists() and throw:
+                    # raise FileNotFoundError(f'Error! {path} does not exist!')
+                    print(f'Error! {path} does not exist!')
+                    sys.exit(1)
+
+                relpath = Path('autogenerated/') if relative_run_path else Path('.')
+                pathauto = relpath / Path(val['path'])  # .name # NOTE: Use .name ?
+                pathauto.parent.mkdir(parents=True, exist_ok=True)
+
+                if path != pathauto:
+                    cmd = ['cp', str(path), str(pathauto)]
+                    _ = sub.run(cmd, check=False)
+            case _:
+                pass


 def cwltool_main() -> int:

src/sophios/utils_cwl.py

Lines changed: 13 additions & 9 deletions
@@ -229,15 +229,19 @@ def canonicalize_type(type_obj: Any) -> Any:
     Returns:
         Any: The JSON canonical normal form associated with type_obj
     """
-    if isinstance(type_obj, str):
-        if len(type_obj) >= 1 and type_obj[-1:] == '?':
-            return ['null', canonicalize_type(type_obj[:-1])]
-        if len(type_obj) >= 2 and type_obj[-2:] == '[]':
-            return {'type': 'array', 'items': canonicalize_type(type_obj[:-2])}
-    if isinstance(type_obj, Dict):
-        if type_obj.get('type') == 'array':
-            return {**type_obj, 'items': canonicalize_type(type_obj['items'])}
-    return type_obj
+    match type_obj:
+        case str() as str_obj:
+            if len(str_obj) >= 1 and str_obj[-1:] == '?':
+                return ['null', canonicalize_type(str_obj[:-1])]
+            if len(str_obj) >= 2 and str_obj[-2:] == '[]':
+                return {'type': 'array', 'items': canonicalize_type(str_obj[:-2])}
+            return str_obj
+        case dict() as dict_obj:
+            if dict_obj.get('type') == 'array':
+                return {**dict_obj, 'items': canonicalize_type(dict_obj['items'])}
+            return dict_obj
+        case _:
+            return type_obj


 def canonicalize_steps_list(steps: Yaml) -> List[Yaml]:
