The same image seems to be pulled in parallel causing disk exhaustion #141

@aartur

We have about 100 parameterized job definitions that use the same image config:

config {
  image = "username/backend:some_tag"
}

The problem is that disk space gets exhausted on the Nomad clients. The cause appears to be that the image is pulled separately for each job, even though every job specifies the exact same image and tag. This didn't happen with Nomad's docker driver: all jobs shared a single image that was pulled and extracted once.

I might be wrong about the explanation, but this is what I infer from hundreds of error messages like:

[ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=62ab19a7-4e67-c941-cc39-340394800fa1 task=main error="rpc error: code = Unknown desc = Error in pulling image username/backend:some_tag: failed to prepare extraction snapshot \"extract-138110298-tmpn sha256:bf868a0e662ae83512efeacb6deb2e0f0f1694e693fab8f53c110cb503c00b99\": context deadline exceeded"

I.e. it looks like each allocation has its own extraction snapshot. Is it possible to configure the driver (or containerd) so that all jobs share a single image snapshot?
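
To make the expected behavior concrete, here is a minimal Python sketch of "pull once, share across tasks". It is only an illustration of the idea, not the driver's or containerd's actual code: `SharedImagePuller`, `ensure`, `pull_fn`, and `simulate` are hypothetical names, and the real pull is replaced by a stub that records each call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class _Entry:
    """Per-image state: a lock plus a flag saying the pull has completed."""

    def __init__(self):
        self.lock = threading.Lock()
        self.done = False


class SharedImagePuller:
    """Hypothetical helper: each image reference is pulled at most once."""

    def __init__(self, pull_fn):
        self._pull_fn = pull_fn         # stand-in for the real pull + extract
        self._guard = threading.Lock()  # protects the _entries dict
        self._entries = {}              # image reference -> _Entry

    def ensure(self, ref):
        # Look up (or create) the entry for this image reference.
        with self._guard:
            entry = self._entries.setdefault(ref, _Entry())
        # Only one caller per reference pulls; the others wait, then reuse it.
        with entry.lock:
            if not entry.done:
                self._pull_fn(ref)
                entry.done = True


def simulate(n_tasks, ref="username/backend:some_tag"):
    """Start n_tasks concurrent 'allocations' and return how many pulls ran."""
    pulls = []
    puller = SharedImagePuller(pulls.append)  # stub pull: just record the ref
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Consume the iterator so any exception in a worker is raised here.
        list(pool.map(puller.ensure, [ref] * n_tasks))
    return len(pulls)


if __name__ == "__main__":
    print(simulate(100))  # prints 1
```

With 100 concurrent tasks that reference the same tag, the stub pull runs once. That is the behavior I would expect for the roughly 100 parameterized jobs described above.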
