Derived Datastream Orchestration #368

The Automated Job Orchestration functional spec lists the following requirements for derived datastream orchestration. A derived datastream combines multiple source datastreams into one target (“virtual”) datastream.

- [ ] Must support tasks that extract multiple source datastreams, compute a virtual datastream from an arithmetic combination of them, and load the results into HydroServer.
- [ ] Virtual/derived calculations are a separate scheduled ETL task (not mixed into raw scraping in the same step).
- [ ] Source datastream data must already exist in HydroServer before the virtual calculation runs (that is, after scraping and aggregation).
- [ ] Each virtual destination datastream must define:
   1. A list of source datastreams.
   2. An equation/expression describing how sources are combined.

- [ ] For each timestep, the calculation requires values from all input source datastreams; if any value is missing, insert a “no data” value.
- [ ] One or more ETL tasks can be defined for virtual calculations; each task can include one or more virtual destination datastreams.
- [ ] Virtual calculations can run in parallel per destination datastream because each destination is independent.
- [ ] Assumption: all source datastreams share the same timestamps and aggregation interval; the destination datastream inherits both.
- [ ] WON'T IMPLEMENT: Under current DWRi behavior, the system recalculates the prior 28 days and overwrites existing calculated values. (I think the system will be easier to manage if derived datastreams assume their source datastreams are already fully quality controlled.)
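
To make the requirements above concrete, here is a minimal Python sketch of one way the per-timestep calculation could work. It is not the HydroServer implementation: `VirtualDatastream`, `compute`, `compute_all`, the `NO_DATA` sentinel of `-9999.0`, and the in-memory `series` layout (source name to `{timestamp: value}`) are all hypothetical names chosen for illustration. It also assumes source names are valid Python identifiers so they can appear directly in the expression, and it does not handle division by zero.

```python
import ast
import operator
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

NO_DATA = -9999.0  # hypothetical sentinel; the real "no data" value is not specified

# Only plain arithmetic is allowed in an expression.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


@dataclass(frozen=True)
class VirtualDatastream:
    """A destination datastream: a list of sources plus an expression."""
    name: str
    sources: tuple[str, ...]
    expression: str  # e.g. "inflow - outflow"


def _evaluate(node, values):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, values)
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left = _evaluate(node.left, values)
        right = _evaluate(node.right, values)
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand, values)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and node.id in values:
        return values[node.id]
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def compute(definition, series):
    """Compute one destination; emit NO_DATA when any source value is missing."""
    tree = ast.parse(definition.expression, mode="eval")
    keys = (series[s].keys() for s in definition.sources)
    timestamps = sorted(set().union(*keys))
    result = {}
    for ts in timestamps:
        values = {s: series[s].get(ts) for s in definition.sources}
        if any(v is None or v == NO_DATA for v in values.values()):
            result[ts] = NO_DATA
        else:
            result[ts] = _evaluate(tree, values)
    return result


def compute_all(definitions, series):
    """Destinations are independent, so each one can run in its own worker."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda d: compute(d, series), definitions)
        return {d.name: r for d, r in zip(definitions, results)}
```

For example, with `series = {"inflow": {"t1": 5.0, "t2": 7.0}, "outflow": {"t1": 2.0}}` and a definition `VirtualDatastream("net", ("inflow", "outflow"), "inflow - outflow")`, timestep `t1` has both inputs and is computed, while `t2` is missing `outflow` and receives `NO_DATA`. In a real task the extract and load steps would read from and write to HydroServer; a thread pool is used here only because that work would likely be I/O-bound.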
