Tracking file provenance #3711

@astro-friedel

Is your feature request related to a problem? Please describe.
I would like to be able to track provenance for each file used (input) or created (output) by an App. The provenance information should include:

  • File name
  • Creation date
  • File size
  • What App created it (if it doesn't already exist)
    • What arguments were given to the App
    • What environment was the App running in
  • What other Apps used the file
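
As an illustration of the fields listed above, here is a minimal sketch of what one provenance record could look like. It uses only the Python standard library; `FileProvenance`, `capture_environment`, and `record_for` are hypothetical names, not part of Parsl, and the file's modification time stands in for the creation date, which is only an approximation.

```python
import os
import platform
import sys
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileProvenance:
    """Hypothetical record holding the requested fields for one file."""
    file_name: str
    creation_date: datetime
    file_size: int
    created_by_app: Optional[str] = None  # None if the file already existed
    app_args: tuple = ()
    app_kwargs: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)
    used_by_apps: list = field(default_factory=list)


def capture_environment() -> dict:
    """Collect a small, assumed subset of environment details."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "hostname": platform.node(),
    }


def record_for(path, created_by_app=None, app_args=(), app_kwargs=None):
    """Build a provenance record for an existing file on disk."""
    st = os.stat(path)
    return FileProvenance(
        file_name=os.path.abspath(path),
        # st_mtime is used as an approximation of the creation date.
        creation_date=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        file_size=st.st_size,
        created_by_app=created_by_app,
        app_args=tuple(app_args),
        app_kwargs=dict(app_kwargs or {}),
        # Only capture the environment when an App is known to be the creator.
        environment=capture_environment() if created_by_app else {},
    )
```

A record for a pre-existing input file would leave `created_by_app` as `None`, and later consumers could be appended to `used_by_apps`.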

Describe the solution you'd like
The system would need to track files (which it already does), capture information about them when they are created or first used, and then track their usage through the rest of the workflow. Ideally, this should require minimal changes to existing workflows. The existing monitoring framework is a good candidate, as it can already log information to a database.
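
To make the "capture on creation or first use, then track usage" idea concrete, here is a rough sketch using an SQLite database as a stand-in for the monitoring database. Everything here is an assumption for illustration: `ProvenanceStore`, its methods, and the table layout are hypothetical and do not reflect the actual schema or API of the Parsl monitoring framework.

```python
import os
import sqlite3
from datetime import datetime, timezone


class ProvenanceStore:
    """Hypothetical store: one row per file, one row per use of a file."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS files (
                path TEXT PRIMARY KEY,
                size INTEGER,
                first_seen TEXT,
                created_by TEXT
            );
            CREATE TABLE IF NOT EXISTS usage (
                path TEXT,
                app TEXT,
                role TEXT,
                at TEXT
            );
            """
        )

    def _now(self):
        return datetime.now(timezone.utc).isoformat()

    def _register(self, path, created_by):
        """Insert the file row only the first time the file is seen."""
        path = os.path.abspath(path)
        self.conn.execute(
            "INSERT OR IGNORE INTO files VALUES (?, ?, ?, ?)",
            (path, os.path.getsize(path), self._now(), created_by),
        )
        return path

    def record_output(self, path, app):
        """Call after an App has written `path`."""
        path = self._register(path, created_by=app)
        self.conn.execute(
            "INSERT INTO usage VALUES (?, ?, 'output', ?)",
            (path, app, self._now()),
        )

    def record_input(self, path, app):
        """Call before an App reads `path`; creator stays unknown if unseen."""
        path = self._register(path, created_by=None)
        self.conn.execute(
            "INSERT INTO usage VALUES (?, ?, 'input', ?)",
            (path, app, self._now()),
        )

    def history(self, path):
        """Return (app, role) pairs for a file in the order they were logged."""
        return self.conn.execute(
            "SELECT app, role FROM usage WHERE path = ? ORDER BY rowid",
            (os.path.abspath(path),),
        ).fetchall()
```

In a real integration, calls like `record_output` and `record_input` would be made by the framework around each App invocation rather than by the workflow author, which is what would keep changes to existing workflows minimal.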

Describe alternatives you've considered
The only alternatives I have come up with are manually tracking the input and output files (not too bad for a small workflow, but very laborious, if not impossible, for large workflows) and adding code to each App to monitor the files it creates (but this would also require building infrastructure that the monitoring framework already provides).

Additional context
It would be nice to be able to easily access the provenance information, perhaps through the parsl-visualizer.
