Better to have multiple smaller Docker images or one larger Docker image? #138

@niemasd

I am creating a workflow that has many steps that use different tools, and I'm creating minimal Docker images for each of the tools. I was wondering which of the following approaches would be better for Reflow's scalability/performance/etc.:

  • Have each step of the pipeline use a different small Docker image that contains only the tools used for that specific step
  • Have all steps of the pipeline use a single larger Docker image that contains all of the tools used in the entire pipeline

Any guidance would be greatly appreciated, ideally with some explanation of why one approach may be better than the other, so I can get a better understanding of how Reflow works.

EDIT: Note that, for a single sample, each individual step of the workflow is actually quite fast (e.g. minutes of runtime). The scalability issue we're facing is that we have thousands of samples being run in parallel, so we have a lot of small pipeline step executions.
