
Commit 2d2d30e
Author: Alan Christie
docs: Significant level of module documentation
1 parent 53b858b

File tree: 5 files changed (+114, -21 lines)

tests/message_dispatcher.py

Lines changed: 3 additions & 3 deletions

@@ -1,17 +1,17 @@
 """The UnitTest Message Dispatcher.
 
-A very simple object that relies on an underlying message queue.
+A very simple object that relies on an underlying message queue and is designed
+to emulate the behaviour of the message queue used in the Data Manager.
 
 Here we offer a minimal implementation that simply sends a (protocol buffer) message
 to the queue.
 """
 
 from google.protobuf.message import Message
 
 from tests.message_queue import UnitTestMessageQueue
-from workflow.workflow_abc import MessageDispatcher
 
 
-class UnitTestMessageDispatcher(MessageDispatcher):
+class UnitTestMessageDispatcher:
     """A minimal Message dispatcher to support testing."""
 
     def __init__(self, msg_queue: UnitTestMessageQueue):
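The slimmed-down dispatcher pattern above can be sketched as follows. This is an illustrative stand-in, not the repository code: `FakeMessageQueue` and its `put()` method are assumptions standing in for `tests.message_queue.UnitTestMessageQueue`, whose real API is not shown in the diff.

```python
class FakeMessageQueue:
    """A stand-in queue that records every message it is given."""

    def __init__(self) -> None:
        self.messages: list = []

    def put(self, message) -> None:
        self.messages.append(message)


class UnitTestMessageDispatcher:
    """A minimal Message dispatcher to support testing.

    After the change above it no longer inherits from a MessageDispatcher ABC -
    it simply forwards each message to the underlying queue.
    """

    def __init__(self, msg_queue: FakeMessageQueue):
        self._msg_queue = msg_queue

    def send(self, message) -> None:
        # Simply forward the message to the underlying queue.
        self._msg_queue.put(message)


queue = FakeMessageQueue()
dispatcher = UnitTestMessageDispatcher(queue)
dispatcher.send({"event": "test"})
```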

workflow/decoder.py

Lines changed: 16 additions & 3 deletions

@@ -1,6 +1,19 @@
-"""A module to validate and decode workflow definitions.
-
-This is typically used by the Data Manager's Workflow Engine.
+"""A module to validate and decode workflow (and step) definitions.
+
+Module philosophy
+-----------------
+The _main_ purpose of this module is to provide a 'validate_schema()' function
+to check that a workflow definition (a dictionary) complies with the
+'workflow-schema.yaml' schema. This function returns a string (an error) if there's
+a problem with the definition.
+
+The decoder module also provides a number of additional functions based on the needs
+of the engine. As a developer you are 'encouraged' to place any logic that is expected
+to navigate the schema content in a function in this module. Any code that
+is supposed to know _where_ to get content should be encoded as a function here.
+For example, rather than external code navigating the 'plumbing' blocks
+we have a function 'get_workflow_variable_names()' that returns the names of workflow
+variables.
 """
 
 import os

workflow/workflow_abc.py

Lines changed: 51 additions & 11 deletions

@@ -1,13 +1,61 @@
 """Workflow abstract base classes.
-Interface definitions of class instances that must be made available to the Engine.
+
+A module that provides interface definitions for classes that must be made available
+to the engine.
+
+Before going any further it is important to understand that a Workflow 'Step' is
+realised by the execution of a Data Manager 'Job'. A 'Step' is simply the definition
+of a Job's execution within the context of a 'Workflow'. We also talk about
+'Instances'. Instances are a Data Manager concept. They are an object (and database
+Table) representing the running state of a Job.
+
+When 'Steps' are run they are represented by 'Jobs' that run as an 'Instance'.
+
+To this end the workflow engine relies on two broad external services, encapsulated
+by the abstract class definitions we define here: -
+
+- An 'Instance Launcher' to facilitate the execution of Jobs
+- An API 'wrapper' providing access to an underlying database that stores
+  Workflows, RunningWorkflows, RunningWorkflowSteps, and Instances.
+
+Module philosophy
+-----------------
+The engine is responsible for orchestrating Step execution (executing Jobs) but does
+not contain the logic that is able to run them. This is because a) job execution
+(in Kubernetes) is a complex affair and b) the Data Manager already provides this
+logic. Instead the engine defines an ABC for an 'InstanceLauncher' and the
+DM provides the implementation. The engine simply has to create a 'LaunchParameters'
+object describing the Job to be launched (including variables etc.) and then
+relies on the Instance Launcher to effect the execution.
+
+The engine also does not possess any persistence capability and instead relies on the
+Data Manager's database to host suitable 'Workflow', 'RunningWorkflow',
+and 'RunningWorkflowStep' tables. The 'WorkflowAPIAdapter' defined here provides an
+interface that a concrete implementation uses to allow access to and modification
+of records within these tables.
+
+The engine does not create or remove records directly; they are created either by the
+Data Manager via its API or by the Instance Launcher as it starts Jobs (Steps).
+The DM API creates a Workflow record when the user creates a Workflow.
+It also creates RunningWorkflow records (while also validating them) when the
+user 'runs' a workflow. It also creates RunningWorkflowStep records to track the
+execution state of each step when the Instance Launcher is called upon
+to start a Step.
+
+The instance launcher is controlled by a complex set of 'parameters' (a
+'LaunchParameters' dataclass object) that comprehensively describe the Job -
+its variables, and inputs and outputs. The instance launcher provides just one
+method: 'launch()'. It takes a parameters object, and in return it yields a
+'LaunchResult' dataclass object that contains the record IDs of the instance
+created, and the corresponding RunningWorkflowStep. The result also describes any
+launch error. If there is a launch error the step can be assumed not to have
+started. If there is no error the step will (probably) start.
 """
 
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
 from typing import Any
 
-from google.protobuf.message import Message
-
 
 @dataclass
 class LaunchParameters:

@@ -358,11 +406,3 @@ def get_running_workflow_step_output_values_for_output(
 # {
 #     "output": ["dir/file1.sdf", "dir/file2.sdf"]
 # }
-
-
-class MessageDispatcher(ABC):
-    """The class handling the sending of messages (on the Data Manager message bus)."""
-
-    @abstractmethod
-    def send(self, message: Message) -> None:
-        """Send a message"""
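The launch contract the new docstring describes can be sketched as below. The names `LaunchParameters`, `LaunchResult`, `InstanceLauncher`, and `launch()` come from the docstring; the dataclass fields and the `DummyLauncher` are assumptions for illustration, not the repository's actual definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LaunchParameters:
    """Describes the Job (Step) to be launched - fields are illustrative."""
    job: str
    variables: dict[str, Any] = field(default_factory=dict)


@dataclass
class LaunchResult:
    """Record IDs of the created Instance and RunningWorkflowStep, plus any error."""
    error: Optional[str] = None
    instance_id: Optional[str] = None
    running_workflow_step_id: Optional[str] = None


class InstanceLauncher(ABC):
    """The one-method ABC the engine relies on to execute Jobs."""

    @abstractmethod
    def launch(self, parameters: LaunchParameters) -> LaunchResult:
        """Launch the Job described by the parameters."""


class DummyLauncher(InstanceLauncher):
    """A stand-in implementation that pretends every launch succeeds."""

    def launch(self, parameters: LaunchParameters) -> LaunchResult:
        return LaunchResult(
            instance_id="instance-0",
            running_workflow_step_id="step-0",
        )


result = DummyLauncher().launch(LaunchParameters(job="example-job"))
```

In the real system the Data Manager supplies the concrete launcher; the engine only builds the parameters object and inspects the result's `error`.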

workflow/workflow_engine.py

Lines changed: 7 additions & 3 deletions

@@ -1,8 +1,12 @@
 """The WorkflowEngine execution logic.
 
-It responds to Pod and Workflow protocol buffer messages received by its
-'handle_message()' function, messages delivered by the message handler in the PBC Pod.
-There are no other methods in this class.
+Module philosophy
+-----------------
+The module implements the workflow execution logic, which is driven by
+Pod and Workflow protocol buffer messages received by its 'handle_message()' function.
+Messages are delivered by the message handler in the PBC Pod.
+There are no other public methods in this class - its _entry point_ is
+'handle_message()'.
 
 Its role is to translate a pre-validated workflow definition into the ordered execution
 of step "Jobs" that manifest as Pod "Instances" that run in a project directory in the
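The single-entry-point pattern above can be sketched as follows. The real engine receives protocol buffer messages from the PBC Pod's message handler; here plain classes stand in for the Pod and Workflow message types, and the private handlers are hypothetical names for illustration.

```python
class PodMessage:
    """Stand-in for the Pod protocol buffer message."""


class WorkflowMessage:
    """Stand-in for the Workflow protocol buffer message."""


class WorkflowEngine:
    """An engine whose only public entry point is handle_message()."""

    def __init__(self) -> None:
        self.handled: list[str] = []

    def handle_message(self, msg) -> None:
        # Route on the message type - there are no other public methods.
        if isinstance(msg, PodMessage):
            self._handle_pod_message(msg)
        elif isinstance(msg, WorkflowMessage):
            self._handle_workflow_message(msg)

    def _handle_pod_message(self, msg: PodMessage) -> None:
        self.handled.append("pod")

    def _handle_workflow_message(self, msg: WorkflowMessage) -> None:
        self.handled.append("workflow")


engine = WorkflowEngine()
engine.handle_message(PodMessage())
engine.handle_message(WorkflowMessage())
```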

workflow/workflow_validator.py

Lines changed: 37 additions & 1 deletion

@@ -1,4 +1,40 @@
-"""The WorkflowEngine validation logic."""
+"""The WorkflowEngine validation logic.
+
+A module that provides workflow validation at levels beyond the schema. A workflow
+definition is a complex structure, and not all of its content can be checked using
+a JSON/YAML schema alone. This module provides various 'levels' of workflow
+inspection, in increasing levels of validation: -
+
+- CREATE
+- RUN
+- TAG
+
+CREATE level validation simply checks that the workflow complies with the schema.
+Workflows are permitted in the DM that do not comply with the schema. This is
+because the DM is also used as a persistent store for Workflows while editing - this
+allows a user to 'save' a workflow that is incomplete with the intention of adjusting
+it at a later date prior to execution.
+
+TAG level validation takes things a little further. In 'production' mode
+tagging is required prior to execution. TAG level validation ensures that a workflow
+_should_ run if it is run - for example variable names are all correctly defined
+and there are no duplicates.
+
+RUN level extends TAG level validation by ensuring, for example, that all the
+workflow variables are defined.
+
+Validation is designed to allow a more relaxed engine implementation, negating the
+need for the engine to 'check', for example, that variables exist - the validator
+ensures they do.
+
+Module philosophy
+-----------------
+Here we define the 'ValidationLevel' enumeration and the 'ValidationResult' dataclass
+used as a return object by the validation function. The module defines the
+'WorkflowValidator' class with one (class-level) function ... 'validate()'.
+The 'validate()' function ensures that the checks based on the validation level
+are executed and any breach is returned via a 'ValidationResult' instance.
+"""
 
 from dataclasses import dataclass
 from enum import Enum
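The shapes named in the new docstring can be sketched like this. The names `ValidationLevel`, `ValidationResult`, `WorkflowValidator`, and `validate()` come from the docstring; the enum values, the `error` field, and the individual checks are illustrative stand-ins for the real CREATE/TAG/RUN logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ValidationLevel(Enum):
    CREATE = 1
    TAG = 2
    RUN = 3


@dataclass
class ValidationResult:
    # None means the workflow passed validation at the requested level.
    error: Optional[str] = None


class WorkflowValidator:
    @classmethod
    def validate(
        cls, level: ValidationLevel, workflow: dict[str, Any]
    ) -> ValidationResult:
        # CREATE: schema-level check only (stand-in: require a 'name').
        if "name" not in workflow:
            return ValidationResult(error="No 'name' in the workflow definition")
        # TAG and RUN: deeper checks (stand-in: require unique step names).
        if level in (ValidationLevel.TAG, ValidationLevel.RUN):
            names = [s.get("name") for s in workflow.get("steps", [])]
            if len(names) != len(set(names)):
                return ValidationResult(error="Duplicate step names")
        return ValidationResult()


result = WorkflowValidator.validate(
    ValidationLevel.RUN, {"name": "example", "steps": [{"name": "step-1"}]}
)
```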
