Architecture Discussion: Source-Local First Plugin Execution #48

@rudolphpienaar

This issue tracks the architecture/governance side of the source-local first plugin execution model discussed in pfcon:

Context

The proposed pattern is to preserve the existing CUBE -> pfcon contract ("here is the input, run this plugin") while broadening what counts as valid input locality. Instead of assuming the first plugin must run only on data already ingested into CUBE storage, pfcon can resolve/access source-local data where permitted and run the first plugin directly there.

Feed-scoped import is one important consequence of this design, but it is not the only purpose and should not be the primary framing. If the first plugin is a copy/import plugin, this becomes a feed-scoped ingestion path. If the first plugin is an analysis plugin, this becomes compute-near-data with results registered back into ChRIS.

Why in CHRIS_docs

This touches cross-repo architecture decisions:

  • source-local first plugin execution semantics
  • feed-scoped import as one workflow built on that model
  • federation/data-source abstraction expectations
  • provenance requirements for materialized execution directives
  • policy boundaries for permitted source types and credentials

Relationship to pfcon high-fanout work

This should be viewed as complementary to FNNDSC/pfcon#155, not as a competing design direction.

  • pfcon#155 asks how pfcon should reliably get the data it needs for large fanout jobs.
  • this discussion asks what counts as valid input locality for the first plugin, and how that intent should be represented.

Control-plane principle

The execution intent should be materialized as data, not hidden in an env var or out-of-band runtime switch.

That suggests a source descriptor / directive file model where:

  • the input source is explicit
  • the access mode is explicit
  • the first plugin intent is explicit

This keeps the control plane aligned with a Data-State DAG style model and makes provenance/audit capture tractable.
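To make the three bullets above concrete, here is a minimal sketch of what a materialized directive could look like. Everything in it is an assumption for illustration only: the `ExecutionDirective` class, its field names, the example access-mode string, and the plugin name are hypothetical and are not part of any existing pfcon or CUBE contract.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExecutionDirective:
    """Hypothetical directive: execution intent materialized as data."""

    source_uri: str     # where the input lives (explicit input source)
    access_mode: str    # how it may be accessed (explicit access mode)
    first_plugin: str   # which plugin runs first (explicit plugin intent)

    def to_json(self) -> str:
        # Sorted keys give a stable, diff-friendly file for audit capture.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ExecutionDirective":
        return cls(**json.loads(text))


# Illustrative values only; none of these names come from a real deployment.
directive = ExecutionDirective(
    source_uri="posix:///data/study01",
    access_mode="read-only",
    first_plugin="pl-example-import",
)
restored = ExecutionDirective.from_json(directive.to_json())
```

Because the directive is a plain file rather than an env var, it can be stored alongside the job and read back later without knowing how the runtime was configured.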

Proposed outcome

Document a reference architecture and operating constraints for:

  1. Async staging / source resolution flow
  2. Source adapter model (e.g. cube://, posix://, s3://)
  3. Cache behavior expectations
  4. Directive/materialized manifest contract
  5. Provenance/audit metadata expectations
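As a rough illustration of items 2 and 5, the sketch below dispatches on the URI scheme and derives a stable digest of a directive for provenance records. The `ALLOWED_SCHEMES` set, the `resolve_source` and `directive_digest` helpers, and the returned dictionary shape are all hypothetical; only the three scheme names come from the list above, and a real adapter model would carry policy and credential checks that are not shown here.

```python
import hashlib
import json
from urllib.parse import urlsplit

# Scheme names taken from the list above; the policy set itself is an assumption.
ALLOWED_SCHEMES = {"cube", "posix", "s3"}


def resolve_source(uri: str) -> dict:
    """Split a source URI and reject schemes that policy does not permit."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"source type not permitted: {parts.scheme!r}")
    return {"scheme": parts.scheme, "location": parts.netloc + parts.path}


def directive_digest(directive: dict) -> str:
    """Hash a canonical JSON form so equal directives yield equal digests."""
    canonical = json.dumps(directive, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


source = resolve_source("s3://example-bucket/study01")
digest = directive_digest({"source_uri": "s3://example-bucket/study01",
                           "access_mode": "read-only"})
```

Recording the digest with the job output is one possible way to tie results back to the exact directive that produced them.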

Please use the pfcon issue for implementation detail and this issue for architecture-level alignment.
