remote: lost update when updating OCI referrers fallback tag under concurrent writers #2205

@1seal

hi maintainers,

this looks like a correctness/integrity bug in the OCI referrers fallback tag update path (used when the registry does not support the OCI 1.1 /referrers/ endpoint).

summary

when pushing OCI artifacts that include a subject to registries without the referrers endpoint, go-containerregistry updates the shared referrers fallback tag (distribution-spec “referrers tag schema”) via a GET → modify → PUT sequence without conditional requests or a retry/merge loop. under concurrent writers this can lead to a lost update (“last write wins”), and the fallback index can end up missing one of the referrers even though both uploads succeed.

this requires concurrent writers with legitimate push/attach permissions to the same repository (so it’s not “no-auth remote attacker”), but it can break supply-chain workflows where multiple jobs attach signatures/attestations/SBOMs to the same subject digest in parallel.
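for readers unfamiliar with the referrers tag schema: every referrer of a given subject is recorded in one index stored under a tag derived from the subject digest, which is why concurrent writers contend on the same manifest. a minimal sketch of that mapping, assuming a plain string digest (`fallbackTag` here is a hypothetical helper written for illustration, not the library's exported API):

```go
package main

import (
	"fmt"
	"strings"
)

// fallbackTag maps a subject digest such as "sha256:<hex>" to the tag name
// used by the referrers tag schema, "sha256-<hex>".
// Hypothetical helper for illustration only.
func fallbackTag(subjectDigest string) string {
	return strings.Replace(subjectDigest, ":", "-", 1)
}

func main() {
	// Every referrer of this subject is listed under the same tag.
	fmt.Println(fallbackTag("sha256:" + strings.Repeat("ab", 32)))
}
```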

affected code (as-of)

  • tested commit: 795787c558e1bee15319df39784c557c0d224681
  • callsite: pkg/v1/remote/write.go (*writer).commitSubjectReferrers (contains // TODO: use conditional requests to avoid race conditions)

why it happens

commitSubjectReferrers does:

  1. GET /v2/<repo>/manifests/<fallback-tag> (OCI index) or assumes empty on 404
  2. append descriptor
  3. PUT /v2/<repo>/manifests/<fallback-tag>

if two writers do this concurrently, both can read the same base index and then overwrite each other. the result is an index that contains only one of the concurrent referrer descriptors.
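the interleaving can be shown deterministically without a registry. the sketch below uses a hypothetical in-memory `store` as a stand-in for the fallback tag (none of these names come from go-containerregistry) and forces both writers to read before either writes:

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the fallback tag's index.
type store struct{ manifests []string }

// get returns a copy of the current index (the GET step).
func (s *store) get() []string { return append([]string(nil), s.manifests...) }

// put unconditionally replaces the index (the PUT step).
func (s *store) put(m []string) { s.manifests = append([]string(nil), m...) }

// lostUpdate interleaves two writers so that both GET before either PUTs.
func lostUpdate() []string {
	s := &store{}
	a := s.get()                   // writer A: GET (empty base)
	b := s.get()                   // writer B: GET (same empty base)
	s.put(append(a, "referrer-1")) // writer A: PUT
	s.put(append(b, "referrer-2")) // writer B: PUT, overwrites A's entry
	return s.get()
}

func main() {
	fmt.Println(lostUpdate()) // prints [referrer-2]
}
```

both PUTs succeed, yet only the last writer's descriptor survives.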

impact

  • referrers listing becomes unreliable under concurrency
  • downstream remote.Referrers(subjectDigest) (fallback mode) can miss expected artifacts, causing intermittent verification/policy failures or inconsistent behavior

repro (local, probabilistic)

this is a small repro against registry:2 (no /referrers/ support). it’s probabilistic; it usually triggers within a few hundred iterations on my machine.

# terminal 1
docker rm -f reg 2>/dev/null || true
docker run -d --name reg -p 5000:5000 registry:2

# terminal 2
tmpdir=$(mktemp -d)
cd "$tmpdir"
cat > main.go <<'GO'
package main

import (
  "context"
  "fmt"
  "sync"

  "github.com/google/go-containerregistry/pkg/name"
  v1 "github.com/google/go-containerregistry/pkg/v1"
  "github.com/google/go-containerregistry/pkg/v1/empty"
  "github.com/google/go-containerregistry/pkg/v1/mutate"
  "github.com/google/go-containerregistry/pkg/v1/remote"
)

func mustRef(s string) name.Reference {
  r, err := name.ParseReference(s)
  if err != nil {
    panic(err)
  }
  return r
}

func mustDigest(repo string, d v1.Hash) name.Digest {
  dg, err := name.NewDigest(fmt.Sprintf("%s@%s", repo, d.String()))
  if err != nil {
    panic(err)
  }
  return dg
}

func main() {
  ctx := context.Background()
  opts := []remote.Option{remote.WithContext(ctx)}

  // subject and referrers share one repository: the fallback tag is written
  // to the repository the referrer is pushed to, and remote.Referrers reads
  // it from the repository of the subject digest
  repo := "localhost:5000/test"

  const iters = 2000
  for i := 0; i < iters; i++ {
    // unique subject per iteration
    subj := mutate.Annotations(empty.Image, map[string]string{"iter": fmt.Sprintf("%d", i)}).(v1.Image)
    subjRef := mustRef(fmt.Sprintf("%s:subj-%d", repo, i))
    if err := remote.Write(subjRef, subj, opts...); err != nil {
      panic(err)
    }

    subjDesc, err := remote.Head(subjRef, opts...)
    if err != nil {
      panic(err)
    }
    subjDigest := mustDigest(repo, subjDesc.Digest)

    // two distinct referrers for the same subject
    r1 := mutate.Subject(
      mutate.Annotations(empty.Image, map[string]string{"iter": fmt.Sprintf("%d", i), "w": "1"}).(v1.Image),
      *subjDesc,
    ).(v1.Image)
    r2 := mutate.Subject(
      mutate.Annotations(empty.Image, map[string]string{"iter": fmt.Sprintf("%d", i), "w": "2"}).(v1.Image),
      *subjDesc,
    ).(v1.Image)

    r1Ref := mustRef(fmt.Sprintf("%s:r1-%d", repo, i))
    r2Ref := mustRef(fmt.Sprintf("%s:r2-%d", repo, i))

    // push both referrers concurrently; keep the errors so a failed push
    // is not mistaken for a lost update
    var wg sync.WaitGroup
    var err1, err2 error
    wg.Add(2)
    go func() { defer wg.Done(); err1 = remote.Write(r1Ref, r1, opts...) }()
    go func() { defer wg.Done(); err2 = remote.Write(r2Ref, r2, opts...) }()
    wg.Wait()
    if err1 != nil || err2 != nil {
      panic(fmt.Sprintf("referrer push failed: %v / %v", err1, err2))
    }

    idx, err := remote.Referrers(subjDigest, opts...)
    if err != nil {
      panic(err)
    }
    im, err := idx.IndexManifest()
    if err != nil {
      panic(err)
    }

    if len(im.Manifests) != 2 {
      fmt.Printf("lost update at iter=%d: expected 2 referrers, got %d\n", i, len(im.Manifests))
      return
    }
  }

  fmt.Printf("no lost update observed in %d iterations (try increasing iters)\n", iters)
}
GO

go mod init example.com/gcr-race
# pin to the tested commit
go get github.com/google/go-containerregistry@795787c558e1bee15319df39784c557c0d224681
# fetch transitive dependencies and fill in go.sum
go mod tidy

go run .

suggested fix

implement optimistic concurrency for fallback tag updates:

  • on GET, capture a version identifier (e.g. Docker-Content-Digest and/or ETag when present)
  • on PUT, send If-Match so concurrent updates fail with a precondition error
  • on precondition failure, refetch, merge, retry (bounded)
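the three steps above can be sketched as a bounded compare-and-swap loop. this is an illustration under stated assumptions, not a proposed patch: `versionedTag` is a hypothetical in-memory stand-in for a registry that honors If-Match, and its integer version plays the role of the ETag / Docker-Content-Digest captured on GET.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errPreconditionFailed = errors.New("precondition failed")

// versionedTag is a hypothetical stand-in for a registry honoring If-Match.
type versionedTag struct {
	mu        sync.Mutex
	version   int
	manifests []string
}

// get returns a copy of the index plus its version (the GET step).
func (t *versionedTag) get() ([]string, int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]string(nil), t.manifests...), t.version
}

// putIfMatch stores m only if version is still current (PUT with If-Match).
func (t *versionedTag) putIfMatch(m []string, version int) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if version != t.version {
		return errPreconditionFailed
	}
	t.manifests = append([]string(nil), m...)
	t.version++
	return nil
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// addReferrer refetches, merges and retries on precondition failure (bounded).
func addReferrer(t *versionedTag, desc string, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		current, version := t.get()
		if contains(current, desc) {
			return nil // already recorded
		}
		err := t.putIfMatch(append(current, desc), version)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errPreconditionFailed) {
			return err
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, errPreconditionFailed)
}

// runWriters starts n concurrent writers and returns the final index size.
func runWriters(n int) int {
	t := &versionedTag{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// n attempts suffice: each failure means another writer succeeded.
			if err := addReferrer(t, fmt.Sprintf("referrer-%d", i), n); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	m, _ := t.get()
	return len(m)
}

func main() {
	fmt.Println(runWriters(8)) // prints 8
}
```

a real implementation would also need a policy for registries that ignore the conditional header, since the PUT would then succeed unconditionally.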

if conditional headers are not consistently honored by all registries, a weaker but still helpful approach is to refetch after PUT and verify the inserted descriptor is present; if not present, refetch+merge+retry.
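a minimal sketch of that weaker verify-after-PUT approach, again against a hypothetical in-memory store with unconditional PUTs. the `afterPut` hook exists only to inject a competing write deterministically and is not part of any real API. note that this narrows the race window rather than closing it: an overwrite that lands after the verification read would still go unnoticed.

```go
package main

import "fmt"

// tagStore is a hypothetical last-write-wins stand-in for the fallback tag.
type tagStore struct{ manifests []string }

func (s *tagStore) get() []string  { return append([]string(nil), s.manifests...) }
func (s *tagStore) put(m []string) { s.manifests = append([]string(nil), m...) }

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// addAndVerify PUTs the merged index, refetches, and retries if desc is missing.
func addAndVerify(s *tagStore, desc string, maxAttempts int, afterPut func()) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		current := s.get()
		if !contains(current, desc) {
			s.put(append(current, desc))
		}
		if afterPut != nil {
			afterPut() // test-only hook simulating a concurrent writer
		}
		if contains(s.get(), desc) {
			return nil // verified present after the write
		}
	}
	return fmt.Errorf("descriptor %q still missing after %d attempts", desc, maxAttempts)
}

// demo lets a competing writer with a stale base overwrite the first PUT.
func demo() []string {
	s := &tagStore{}
	stale := s.get() // competing writer read the base before our PUT
	interfered := false
	err := addAndVerify(s, "referrer-1", 3, func() {
		if !interfered {
			interfered = true
			s.put(append(stale, "referrer-2")) // drops referrer-1
		}
	})
	if err != nil {
		panic(err)
	}
	return s.get()
}

func main() {
	fmt.Println(demo()) // prints [referrer-2 referrer-1]
}
```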

happy to help with a focused test case once you confirm the preferred place/style for it (unit test vs integration test).
