Commit a3f76ed (1 parent: af44290)
Author: Alan Christie

docs: More doc and typo corrections

File tree

4 files changed (+91, -45 lines)


workflow/decoder.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 The _main_ purpose of this module is to provide a 'validate_schema()' function
 to check that a workflow definition (a dictionary) complies with
 the 'workflow-schema.yaml' schema. This function returns a string (an error) if there's
-a problem with the defintion.
+a problem with the definition.
 
 The decoder module also provides a number of additional functions based on the needs
 of the engine. As a developer you are 'encouraged' to place any logic that is expected
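The contract described by the docstring - a dictionary in, an error string out - can be sketched as follows. This is a hypothetical stand-alone illustration of the calling convention only; the real 'validate_schema()' validates against 'workflow-schema.yaml' and its checks are not reproduced here. Returning None on success is an assumption.

```python
from typing import Any, Optional


def validate_schema(workflow: dict[str, Any]) -> Optional[str]:
    """Return an error string if the definition is invalid, otherwise None.

    An illustrative stand-in: it only checks a couple of fields a schema
    would plausibly require, not the real 'workflow-schema.yaml' rules.
    """
    if not isinstance(workflow, dict):
        return "Workflow definition must be a dictionary"
    if "name" not in workflow:
        return "Workflow definition has no 'name'"
    if not workflow.get("steps"):
        return "Workflow definition has no 'steps'"
    return None


# A caller simply treats any non-None response as an error: -
error = validate_schema({"name": "example", "steps": [{"name": "step-1"}]})
assert error is None
```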

workflow/workflow_abc.py

Lines changed: 21 additions & 17 deletions
@@ -4,51 +4,55 @@
 to the engine.
 
 Before we go any further it is important to understand that a Workflow 'Step' is realised
-by the execution of a Data Manager 'Job'. A 'Step' is simnply the definition of
-a Job's execution withion the context of a 'Workflow'. We also talk about 'Instances'.
+by the execution of a Data Manager 'Job'. A 'Step' is simply the definition of
+a Job's execution within the context of a 'Workflow'. We also talk about 'Instances'.
 Instances are a Data Manager concept. They are an object (and database Table)
-represening the running state of a Job.
+representing the running state of a Job.
 
 When 'Steps' are run they are represented by 'Jobs' that run as 'Instances'.
 
 To this end the workflow engine relies on two broad external services, encapsulated
 by abstract class definitions we define here: -
 
-- An 'Instance Laucncher' to facilitate the execution of Jobs
-- An API 'wrapper' providing access to an underling database that stores
-  Workflows, RunningWorkflows, RunningWorkflowSteps, and Instances.
+- An 'Instance Launcher' to facilitate the execution of Jobs, and dataclass objects
+  encapsulating launch parameters and results
+- An API 'wrapper' providing access to an underlying database that stores
+  Workflows, RunningWorkflows, RunningWorkflowSteps, and Instances. API responses
+  are Python dictionaries, emulating the payload of a REST response body.
+  The engine _could_ use the DM REST API but we provide an _internal_ service
+  to avoid issues with authentication and user tokens needed by the REST API.
 
 Module philosophy
 -----------------
-The engine is responsible for orchestrating Step exection (executing Jobs) but does not
+The engine is responsible for orchestrating Step execution (executing Jobs) but does not
 contain the logic that is able to run them. This is because a) job execution
 (in Kubernetes) is a complex affair and b) the Data Manager already provides this
-logic. Instead the engine defines an ABC for an 'InstanceLaucnher' and the
+logic. Instead the engine defines an ABC for an 'InstanceLauncher' and the
 DM provides the implementation. The engine simply has to create a 'LaunchParameters'
-object describign the Job to be laucnhed (including variables etc.) and then
+object describing the Job to be launched (including variables etc.) and then
 relies on the Instance Launcher to effect the execution.
 
 The engine also contains no persistence capability and instead relies on the
 Data Manager's database to host suitable 'Workflow', 'RunningWorkflow',
 and 'RunningWorkflowStep' tables. The 'WorkflowAPIAdapter' defined here provides an
 interface that a concrete implementation uses to allow access to and modification
-of records withing these tables.
+of records within these tables.
 
 The engine does not create or remove records directly; they are created either by the
-Data Manager via its API or the Instance laucnher when as it starts Jobs (Steps).
+Data Manager via its API or the Instance launcher as it starts Jobs (Steps).
 The DM API creates a Workflow record when the user creates a Workflow.
-It also creates RunnignWorkflow records (while also validating them) when the
+It also creates RunningWorkflow records (while also validating them) when the
 user 'runs' a workflow. It also creates RunningWorkflowStep records to track the
-execution state of each step when the Instance lancher is called upon
+execution state of each step when the Instance launcher is called upon
 to start a Step.
 
 The instance launcher is controlled by a complex set of 'parameters' (a
-'LaunchPrameters' dataclass object) that comprehensively descibe the Job -
-it's variables, and inputs and outputs. The instance launcher provies just one method:
-'launch()'. It takes a paramters object, and in return the yields a 'LaunchResult'
+'LaunchParameters' dataclass object) that comprehensively describes the Job -
+its variables, and inputs and outputs. The instance launcher provides just one method:
+'launch()'. It takes a parameters object, and in return yields a 'LaunchResult'
 dataclass object that contains the record IDs of the instance created, and the
 corresponding RunningWorkflowStep. The result also describes any launch error.
-If there is a laucnh error the tep can assume to have not started. if there is
+If there is a launch error the step can be assumed not to have started. If there is
 no error the step will (probably) start.
 """
 
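The shape described above - a 'LaunchParameters' object in, a 'LaunchResult' object out, via a single abstract 'launch()' method - can be sketched as follows. All field names here are illustrative assumptions; the real dataclasses in workflow_abc.py are richer than this.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LaunchParameters:
    # Illustrative fields only - the real dataclass comprehensively
    # describes the Job: its variables, inputs and outputs.
    running_workflow_id: str
    step_name: str
    variables: dict[str, Any] = field(default_factory=dict)


@dataclass
class LaunchResult:
    # Record IDs of the Instance and RunningWorkflowStep created
    # (expected to be None when the launch fails).
    instance_id: Optional[str] = None
    running_workflow_step_id: Optional[str] = None
    error_msg: Optional[str] = None


class InstanceLauncher(ABC):
    """The single-method launcher contract a DM implementation satisfies."""

    @abstractmethod
    def launch(self, parameters: LaunchParameters) -> LaunchResult:
        """Launch an Instance of the Job described by the parameters."""
```

A concrete implementation (provided by the DM) subclasses 'InstanceLauncher', and the engine treats a non-None 'error_msg' as "the step did not start".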

workflow/workflow_engine.py

Lines changed: 64 additions & 22 deletions
@@ -1,29 +1,71 @@
 """The WorkflowEngine execution logic.
 
-Module philosophy
------------------
-The module implements the workflow execution logic, which is driven by
-Pod and Workflow protocol buffer messages received by its 'handle_message()' function.
-Messages are delivered by the message handler in the PBC Pod.
-There are no other publci methods in this class - it's _entry point_ is
-'handle_message()'.
-
-Its role is to translate a pre-validated workflow definition into the ordered execution
-of step "Jobs" that manifest as Pod "Instances" that run in a project directory in the
+This module realises workflow definitions, turning a definition into a controlled sequence
+of Job executions. The Data Manager is responsible for storing and validating Workflows,
+and this module is responsible for running them and reporting their state back to the
 DM.
 
-Workflow messages initiate (START) and terminate (STOP) workflows. Pod messages signal
-the end of individual workflow steps and carry the exit code of the executed Job.
-The engine used START messages to launch the first "step" in a workflow and the Pod
-messages to signal the success (or failure) of a prior step. A step's success is used,
-along with it's original workflow definition to determine the next action
-(run the next step or signal the end of the workflow).
-
-Before a START message is transmitted the author (typically the Workflow Validator)
-will have created a RunningWorkflow record in the DM. The ID of this record is passed
-in the START message that is sent. The engine uses this ID to find the running workflow
-and the workflow. The engine creates RunningWorkflowStep records for each step that
-is executed, and it uses thew InstanceLauncher to launch the Job (a Pod) for each step.
+The engine is event-driven, responding to two types of message
+(in the form of Protocol Buffers) - Workflow messages and Pod messages.
+These messages, sent from the DM Protocol Buffer Consumer (PBC), are delivered to the
+engine via its 'handle_message()' method. The engine must react to these messages
+appropriately by: -
+
+- Starting the execution of a new Workflow
+  (when it receives a Workflow 'START' message)
+- Stopping the execution of an existing Workflow
+  (when it receives a Workflow 'STOP' message)
+- Progressing an existing running workflow to its next Step
+  (when it receives a Pod message)
+
+When running a workflow, once the engine determines the action (the Step to run)
+its most complex logic lies in the preparation of a set of variables for the Step (Job).
+This logic is confined to '_prepare_step()', which returns a 'StepPreparationResponse'
+dataclass object. This object is used by the second key method in this module,
+'_launch()'. The launch method uses the prepared variables and launches (using
+a DM-provided 'InstanceLauncher' implementation) one or more Instances of a Step Job,
+providing each with an appropriate set of command variables.
+
+Module philosophy
+-----------------
+The module's role is to translate a pre-validated workflow definition into the ordered
+execution of Step "Jobs" that manifest as Pod "Instances" running in a project directory
+under the control of the DM.
+
+Workflow messages are used to initiate (START) and terminate (STOP) workflows.
+Pod messages signal the end of a previously launched step and carry the exit code
+of the executed Job.
+
+The engine uses START messages to launch the first "step" in a workflow, while Pod
+messages signal the success (or failure) of a prior step. A step's success is used,
+along with its original workflow definition, to determine the next action - either
+the execution of a new step or the conclusion of the Workflow.
+
+The engine has no persistence and does not create database records. Instead it relies
+on an API 'wrapper' to retrieve records and alter them.
+
+Objects that provide API and InstanceLauncher implementations are made available
+to the engine when the DM creates it, passing them through the class initialiser.
+
+The engine is designed not to retain any persistent state; it reacts to messages,
+reconstructing its state based on Workflow, RunningWorkflow, and RunningWorkflowStep
+records maintained by the DM. There's no real 'pattern' here - it's simply complex
+custom sequential logic, executed from the context of 'handle_message()',
+that has to translate a workflow definition into running Job Instances.
+
+If there is a pattern, its closest approximation is probably a State pattern, closely
+related to a Finite State Machine, with the function 'handle_message()' used to alter
+the engine's 'state'. The engine is in fact a complex running workflow 'state machine',
+hence the term 'Engine' (another term for machine) used in its class name.
+
+Only one instance of the engine is created by the DM, so it also essentially exists as a
+Singleton.
+
+There are no sub-classes or other modules. Today all the state logic is captured
+in this single module. There is no need to introduce levels of indirection that simply
+reduce the size of the file. There is a level of complexity that cannot be avoided -
+the need to understand how to move a workflow forward and how to prepare a set of
+variables for the next 'Step'.
 """
 
 import logging
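The event-driven dispatch described in the new docstring can be sketched as below. This is a condensed, hypothetical stand-in for the real engine: the message field names ('kind', 'action', 'exit_code', etc.) and the private handler names are assumptions, and the recorded action strings exist only for the demonstration.

```python
class WorkflowEngine:
    """A condensed sketch of the engine's 'handle_message()' dispatch.

    Field and method names here are illustrative assumptions,
    not the real API.
    """

    def __init__(self) -> None:
        self.actions: list[str] = []  # record of dispatched actions (demo only)

    def handle_message(self, msg: dict) -> None:
        """The engine's only public entry point."""
        if msg["kind"] == "WorkflowMessage":
            if msg["action"] == "START":
                self._start_workflow(msg["running_workflow_id"])
            elif msg["action"] == "STOP":
                self._stop_workflow(msg["running_workflow_id"])
        elif msg["kind"] == "PodMessage":
            # A Pod message marks the end of a previously launched Step;
            # the Job's exit code decides whether the next Step runs
            # or the workflow stops.
            self._step_finished(msg["instance_id"], msg["exit_code"])

    def _start_workflow(self, r_wfid: str) -> None:
        self.actions.append(f"START {r_wfid}")

    def _stop_workflow(self, r_wfid: str) -> None:
        self.actions.append(f"STOP {r_wfid}")

    def _step_finished(self, instance_id: str, exit_code: int) -> None:
        next_action = "next step" if exit_code == 0 else "stop (step failed)"
        self.actions.append(f"POD {instance_id} -> {next_action}")
```

In the real module the '_step_finished' equivalent consults the workflow definition (via the API wrapper) before calling '_prepare_step()' and '_launch()' for the next Step.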

workflow/workflow_validator.py

Lines changed: 5 additions & 5 deletions
@@ -8,22 +8,22 @@
 
 CREATE level validation simply checks that the workflow complies with the schema.
 Workflows are permitted in the DM that do not comply with the schema. This is
-becuase the DM is also used as a persistent store for Wwrfklows while editing - this
+because the DM is also used as a persistent store for Workflows while editing - this
 allows a user to 'save' a workflow that is incomplete with the intention of
 adjusting it at a later date prior to execution.
 
 TAG level validation takes things a little further. In 'production' mode
-tagging is required prior to exeution. TAG level validatioin ensures that a workflow
-_should_ run if it is run - for examplke variable names are all correctly defined
+tagging is required prior to execution. TAG level validation ensures that a workflow
+_should_ run if it is run - for example variable names are all correctly defined
 and there are no duplicates.
 
 RUN level extends TAG level validation by ensuring, for example, that all the
 workflow variables are defined.
 
 Validation is designed to allow a more relaxed engine implementation, negating the
 need for the engine to 'check', for example, that variables exist - the validator
-ensures they do so that the engine can concentrate on laucnhing steps rather than
-implementing swatchs of lines of logic to protect against mal-use.
+ensures they do, so that the engine can concentrate on launching steps rather than
+implementing swathes of logic to protect against improper use.
 
 It is the Data Manager that is responsible for invoking the validator. It does this
 prior to allowing a user to run a workflow. When the engine receives a 'Workflow Start'
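The three cumulative levels (CREATE, TAG, RUN) can be sketched as below. This is a minimal illustration of the layering only, under assumed names; the real validator's checks and return type are not reproduced here, and each level's single check stands in for a family of checks.

```python
from enum import Enum
from typing import Any, Optional


class ValidationLevel(Enum):
    CREATE = 1  # schema compliance only
    TAG = 2     # ...plus checks like 'no duplicate names'
    RUN = 3     # ...plus 'all workflow variables are defined'


def validate(level: ValidationLevel,
             workflow: dict[str, Any],
             variables: Optional[dict[str, Any]] = None) -> list[str]:
    """Return a list of error strings (empty when validation passes).

    Each level extends the previous one, so a RUN validation also
    performs the TAG and CREATE checks.
    """
    errors: list[str] = []
    # CREATE: schema compliance (condensed to a single check here).
    if "steps" not in workflow:
        errors.append("Does not comply with the schema (no 'steps')")
    # TAG: extends CREATE, e.g. no duplicate step names.
    if level.value >= ValidationLevel.TAG.value and "steps" in workflow:
        names = [s.get("name") for s in workflow["steps"]]
        if len(names) != len(set(names)):
            errors.append("Duplicate step names")
    # RUN: extends TAG, e.g. all workflow variables must be defined.
    if level.value >= ValidationLevel.RUN.value:
        for v in workflow.get("variables", []):
            if v not in (variables or {}):
                errors.append(f"Undefined variable: {v}")
    return errors
```

This layering is what lets the engine stay 'relaxed': anything that reaches it has already passed the RUN-level checks.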
