[auth] Introduce token renewals system #714

Open

Fricounet wants to merge 11 commits into containerd:main from DataDog:fricounet/token-renewals

Conversation

Fricounet (Contributor) commented Feb 19, 2026

Overview

Introduce a background credential renewal subsystem for nydus-snapshotter. When enabled, a goroutine periodically reconciles the set of active RAFS instances against an in-memory credential store, renewing credentials for images currently in use.

Renewal is scoped to providers that can re-fetch credentials autonomously: Docker config, kubelet credential provider plugins, and Kubernetes docker config secrets. CRI-based and label-based credentials are excluded. CRI caches its state globally and would shadow renewable providers at renewal time; labels are only present during pull.

The feature is disabled by default. No behavior change unless credential_renewal_interval is set.

This change is not yet wired to push refreshed credentials into nydusd. That is the next step, pending dragonflyoss/nydus#1864.

Related Issues

This is the next implementation step for #690.
The next step will be wiring this renewal store to update nydusd credentials as well, once dragonflyoss/nydus#1864 is merged.

Change Details

Credential store and reconciliation loop (pkg/auth/renewal.go):
On each tick the goroutine:

  • reads the live RAFS instance list to determine which image refs are currently in use
  • renews credentials for refs present in both RAFS and the store
  • adds and renews credentials for refs present in RAFS but not yet in the store (covers snapshotter restart)
  • evicts store entries absent from RAFS once their renewedAt age exceeds interval/2

The interval/2 grace period prevents a renewal tick from evicting entries that were just added during an in-progress pull (the RAFS entry is created after mount completes, not at first credential fetch).
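
A minimal sketch of one reconciliation tick under this design; the type and helper names here are illustrative, not the PR's actual identifiers:

```go
package auth

import (
	"sync"
	"time"
)

// Illustrative store types; the real entry also records provider details.
type storeEntry struct {
	renewedAt time.Time
}

type credentialStore struct {
	mu      sync.RWMutex
	entries map[string]*storeEntry
}

// reconcile performs one tick: renew refs backed by a live RAFS instance,
// seed refs seen in RAFS but not yet tracked (covers snapshotter restart),
// and evict stale entries after an interval/2 grace period.
func (s *credentialStore) reconcile(inUse map[string]struct{}, interval time.Duration, renew func(ref string)) {
	s.mu.Lock()
	defer s.mu.Unlock()

	for ref := range inUse {
		if _, ok := s.entries[ref]; !ok {
			s.entries[ref] = &storeEntry{} // present in RAFS, missing from store
		}
		renew(ref)
		s.entries[ref].renewedAt = time.Now()
	}

	for ref, e := range s.entries {
		if _, live := inUse[ref]; !live && time.Since(e.renewedAt) > interval/2 {
			delete(s.entries, ref) // absent from RAFS past the grace period
		}
	}
}
```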

Provider separation (pkg/auth/keychain.go):

  • buildProviders (full priority chain: labels → CRI → docker → kubelet → kubesecret) is used for regular credential lookups.
  • renewableProviders (docker → kubelet → kubesecret) is used exclusively by the renewal goroutine.
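
A sketch of how the two chains might compose, using empty stub types in place of the real providers (all identifiers here are illustrative):

```go
package auth

// PassKeyChain stands in for the real credential type in pkg/auth.
type PassKeyChain struct{ Username, Password string }

type Provider interface {
	GetCredentials(ref string) (*PassKeyChain, error)
}

// Empty stubs; only the chain ordering matters for this sketch.
type dockerProvider struct{}
type kubeletProvider struct{}
type kubeSecretProvider struct{}
type criProvider struct{}
type labelsProvider struct{ labels map[string]string }

func (dockerProvider) GetCredentials(string) (*PassKeyChain, error)     { return nil, nil }
func (kubeletProvider) GetCredentials(string) (*PassKeyChain, error)    { return nil, nil }
func (kubeSecretProvider) GetCredentials(string) (*PassKeyChain, error) { return nil, nil }
func (criProvider) GetCredentials(string) (*PassKeyChain, error)        { return nil, nil }
func (labelsProvider) GetCredentials(string) (*PassKeyChain, error)     { return nil, nil }

// renewableProviders: only providers that can re-fetch credentials autonomously.
func renewableProviders() []Provider {
	return []Provider{dockerProvider{}, kubeletProvider{}, kubeSecretProvider{}}
}

// buildProviders: full priority chain for regular lookups, composed from the
// renewable set so the two lists cannot drift apart.
func buildProviders(labels map[string]string) []Provider {
	return append([]Provider{labelsProvider{labels}, criProvider{}}, renewableProviders()...)
}
```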

Two Prometheus metrics were added to track the state of the renewal store:

  • snapshotter_credential_renewals_total{image_ref, result}: counter of renewal attempts by outcome
  • snapshotter_credential_store_entries{image_ref}: gauge of tracked entries
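
For reference, declaring and registering these two metrics with plain client_golang would look roughly like this; the PR routes them through the snapshotter's metrics registry, so treat this as a sketch:

```go
package registry

import "github.com/prometheus/client_golang/prometheus"

var (
	CredentialRenewals = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "snapshotter_credential_renewals_total",
			Help: "Total number of credential renewal attempts, labeled by image ref and result (success or failure).",
		},
		[]string{"image_ref", "result"},
	)

	CredentialStoreEntries = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "snapshotter_credential_store_entries",
			Help: "Number of credentials currently tracked in the renewal store per image ref.",
		},
		[]string{"image_ref"},
	)
)

func init() {
	// Register in the global registry so the metrics HTTP server exposes them.
	prometheus.MustRegister(CredentialRenewals, CredentialStoreEntries)
}
```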

Test Results

I've tested the feature in a live AWS environment with kubelet credential providers configured, using the datadog-agent image:

  • when the container is started, the logs show the agent image being added to the store:
INFO[2026-02-19T17:40:57.931529934+01:00] Prepare active Nydus snapshot k8s.io/320/60b3a2a5d181a6b6e8fba14390becea95695d3a4f219ea4fac23d28f79caed40  key=k8s.io/320/60b3a2a5d181a6b6e8fba14390becea95695d3a4f219ea4fac23d28f79caed40 parent="k8s.io/319/sha256:74ade07e9f4a897ccbb95a7b9687741bbaa7187857fa352af2b3f0edd2d32028"
DEBU[2026-02-19T17:40:57.931537789+01:00] Prepare remote snapshot 59                    key=k8s.io/320/60b3a2a5d181a6b6e8fba14390becea95695d3a4f219ea4fac23d28f79caed40 parent="k8s.io/319/sha256:74ade07e9f4a897ccbb95a7b9687741bbaa7187857fa352af2b3f0edd2d32028"
INFO[2026-02-19T17:40:57.936071063+01:00] Trying to get credentials from labels         ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:40:57.936299487+01:00] Trying to get credentials from cri            ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:40:57.936314856+01:00] Trying to get credentials from docker         ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:40:57.936336219+01:00] Trying to get credentials from kubelet        ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
DEBU[2026-02-19T17:41:01.544125064+01:00] Total credentials: 1, Matching registries: 1
DEBU[2026-02-19T17:41:01.544139755+01:00] Selected registry after sorting: 1111111111.dkr.ecr.us-east-1.amazonaws.com
DEBU[2026-02-19T17:41:01.544144390+01:00] adding credential entry to store              provider=kubelet ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
  • after a few minutes, we can see the image recorded in the metrics, and that it has been renewed:
$ curl -s http://localhost:9110/v1/metrics | grep credential
# HELP snapshotter_credential_renewals_total Total number of credential renewal attempts, labeled by image ref and result (success or failure).
# TYPE snapshotter_credential_renewals_total counter
snapshotter_credential_renewals_total{image_ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus",result="success"} 5
# HELP snapshotter_credential_store_entries Number of credentials currently tracked in the renewal store per image ref.
# TYPE snapshotter_credential_store_entries gauge
snapshotter_credential_store_entries{image_ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"} 1
  • to test the seeding logic, I simply restarted the snapshotter:
INFO[2026-02-19T17:46:54.117524885+01:00] Logger successfully set up. Proceeding to process nydus-snapshotter configurations
INFO[2026-02-19T17:46:54.117562587+01:00] Start nydus-snapshotter. Version: v0.15.11-13-gd985907.m, PID: 3371915, FsDriver: fusedev, DaemonMode: multiple
INFO[2026-02-19T17:46:54.122059163+01:00] Run daemons monitor...
INFO[2026-02-19T17:46:54.122719712+01:00] Started metrics HTTP server on ":9110"
DEBU[2026-02-19T17:46:54.122814215+01:00] found daemon states &daemon.ConfigState{ID:"d6bjq69jjoq6fi4i4mg0", ProcessID:3370110, APISocket:"/containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock", DaemonMode:"dedicated", FsDriver:"fusedev", LogDir:"/containerd-local/io.containerd.snapshotter.v1.nydus/logs/d6bjq69jjoq6fi4i4mg0", LogLevel:"debug", LogRotationSize:100, LogToStdout:true, Mountpoint:"/containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt", SupervisorPath:"", ThreadNum:0, FailoverPolicy:"resend", ConfigDir:"/containerd-local/io.containerd.snapshotter.v1.nydus/config/d6bjq69jjoq6fi4i4mg0"}
INFO[2026-02-19T17:46:54.122825827+01:00] Recovering daemon ID d6bjq69jjoq6fi4i4mg0
WARN[2026-02-19T17:47:09.149689624+01:00] Daemon d6bjq69jjoq6fi4i4mg0 died somehow. Clean up its vestige!, get daemon state: daemon socket /containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock: not found
DEBU[2026-02-19T17:47:09.149752484+01:00] found RAFS instance &rafs.Rafs{Seq:0x5, ImageID:"1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus", DaemonID:"d6bjq69jjoq6fi4i4mg0", FsDriver:"fusedev", SnapshotID:"59", SnapshotDir:"/containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59", Mountpoint:"/containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt", Annotations:map[string]string{}}
INFO[2026-02-19T17:47:09.150081786+01:00] Unmounting /containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt when clear vestige
WARN[2026-02-19T17:47:09.150111624+01:00] Can't umount /containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt, not mounted
WARN[2026-02-19T17:47:09.150123246+01:00] Can't delete residual unix socket /containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock, remove /containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock: no such file or directory
INFO[2026-02-19T17:47:09.150142105+01:00] nydusd command: /containerd-local/nydusd fuse --config /containerd-local/io.containerd.snapshotter.v1.nydus/config/d6bjq69jjoq6fi4i4mg0/config.json --bootstrap /containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/fs/image/image.boot --mountpoint /containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt --apisock /containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock --log-level debug --log-rotation-size 100 --failover-policy resend
[2026-02-19 17:47:09.152319 +01:00] INFO Program Version: v2.3.8-dd.1, Git Commit: "6a8a7312b2cba29adcc02a14f751061600eb447a", Build Time: "2026-02-18T09:43:39.019939167Z", Profile: "debug", Rustc Version: "rustc 1.84.0 (9fc6b4312 2025-01-07)"
[2026-02-19 17:47:09.152375 +01:00] INFO Set rlimit-nofile to 1000000, maximum 1000000
[2026-02-19 17:47:09.152742 +01:00] DEBUG [/fuse-backend-rs-0.13.1/src/api/pseudo_fs.rs:161] pseudo fs iterate "/"
[2026-02-19 17:47:09.153043 +01:00] INFO RAFS features: HASH_BLAKE3 | EXPLICIT_UID_GID | HAS_XATTR | COMPRESSION_ZSTD | INLINED_CHUNK_DIGEST | ENCRYPTION_NONE
[2026-02-19 17:47:09.153243 +01:00] INFO backend config: ConnectionConfig { proxy: ProxyConfig { url: "", ping_url: "", fallback: false, check_interval: 5, use_http: false, check_pause_elapsed: 300 }, skip_verify: false, timeout: 15, connect_timeout: 15, retry_limit: 5 }
[2026-02-19 17:47:09.159944 +01:00] INFO Refresh token thread started.
[2026-02-19 17:47:09.160412 +01:00] INFO RAFS filesystem imported
[2026-02-19 17:47:09.160475 +01:00] INFO Rafs filesystem mounted at /
[2026-02-19 17:47:09.160617 +01:00] INFO mount source rafs dest /containerd-local/io.containerd.snapshotter.v1.nydus/snapshots/59/mnt with fstype fuse opts default_permissions,fd=5,rootmode=40000,user_id=0,group_id=0,max_read=1052672,allow_other fd 5
[2026-02-19 17:47:09.160878 +01:00] INFO State machine(pid=3371946): from Init to Ready, input [Mount], output [None]
[2026-02-19 17:47:09.161009 +01:00] INFO State machine(pid=3371946): from Ready to Running, input [Start], output [Some(StartService)]
[2026-02-19 17:47:09.161031 +01:00] INFO start fuse servers with 4 worker threads
[2026-02-19 17:47:09.161335 +01:00] INFO FUSE INIT major 7 minor 39
 in_opts: ASYNC_READ | POSIX_LOCKS | ATOMIC_O_TRUNC | EXPORT_SUPPORT | BIG_WRITES | DONT_MASK| SPLICE_WRITE | SPLICE_MOVE | SPLICE_READ | FLOCK_LOCKS | HAS_IOCTL_DIR | AUTO_INVAL_DATA | DO_READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO | WRITEBACK_CACHE | ZERO_MESSAGE_OPEN | PARALLEL_DIROPS | HANDLE_KILLPRIV | POSIX_ACL | ABORT_ERROR | MAX_PAGES | CACHE_SYMLINKS | ZERO_MESSAGE_OPENDIR | EXPLICIT_INVAL_DATA | HANDLE_KILLPRIV_V2 | INIT_EXT | PERFILE_DAX
out_opts: ASYNC_READ | BIG_WRITES | HAS_IOCTL_DIR | AUTO_INVAL_DATA | DO_READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO | WRITEBACK_CACHE | ZERO_MESSAGE_OPEN | PARALLEL_DIROPS | MAX_PAGES | CACHE_SYMLINKS | ZERO_MESSAGE_OPENDIR | EXPLICIT_INVAL_DATA | PERFILE_DAX
[2026-02-19 17:47:09.161524 +01:00] INFO Fuse daemon started!
[2026-02-19 17:47:09.161636 +01:00] INFO HTTP API server running at /containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock
[2026-02-19 17:47:09.161710 +01:00] INFO http server started
INFO[2026-02-19T17:47:09.325207077+01:00] Subscribe daemon d6bjq69jjoq6fi4i4mg0 liveness event, path=/containerd-local/io.containerd.snapshotter.v1.nydus/socket/d6bjq69jjoq6fi4i4mg0/api.sock.
[2026-02-19 17:47:09.352211 +01:00] DEBUG [/src/http_handler.rs:182] <--- Get Uri { string: "/api/v1/daemon" }
[2026-02-19 17:47:09.352455 +01:00] DEBUG [/src/http_handler.rs:187] ---> Get Status Code: OK, Elapse: Ok(259.163µs), Body Size: 1575
[2026-02-19 17:47:09.352472 +01:00] DEBUG [/src/http_handler.rs:182] <--- Get Uri { string: "/api/v1/daemon" }
[2026-02-19 17:47:09.352519 +01:00] DEBUG [/src/http_handler.rs:187] ---> Get Status Code: OK, Elapse: Ok(47.013µs), Body Size: 1575
INFO[2026-02-19T17:47:09.352724119+01:00] Started system controller on "/containerd-local/sock/nydus-system.sock"
INFO[2026-02-19T17:47:09.352809748+01:00] Start system controller API server on /containerd-local/sock/nydus-system.sock
INFO[2026-02-19T17:47:09.353030835+01:00] registered kubelet credential provider plugin  name=ecr-test
INFO[2026-02-19T17:47:09.353043508+01:00] kubelet credential provider initialized
INFO[2026-02-19T17:47:09.353202712+01:00] Trying to get credentials from labels         ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:47:09.353225264+01:00] Trying to get credentials from cri            ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:47:09.353242931+01:00] Trying to get credentials from docker         ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:47:09.353295097+01:00] Trying to get credentials from kubelet        ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
DEBU[2026-02-19T17:47:12.904749335+01:00] Total credentials: 1, Matching registries: 1
DEBU[2026-02-19T17:47:12.904768609+01:00] Selected registry after sorting: 1111111111.dkr.ecr.us-east-1.amazonaws.com
DEBU[2026-02-19T17:47:12.904776946+01:00] adding credential entry to store              provider=kubelet ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
INFO[2026-02-19T17:47:12.904812873+01:00] credential renewal initialized                interval=1m0s seeded=1
DEBU[2026-02-19T17:48:16.319497068+01:00] Total credentials: 1, Matching registries: 1
DEBU[2026-02-19T17:48:16.319515005+01:00] Selected registry after sorting: 1111111111.dkr.ecr.us-east-1.amazonaws.com
DEBU[2026-02-19T17:48:16.319521016+01:00] updating credential entry in store            provider=kubelet ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"
  • and the metric does report the successful seeding:
$ curl -s http://localhost:9110/v1/metrics | grep credential
# HELP snapshotter_credential_store_entries Number of credentials currently tracked in the renewal store per image ref.
# TYPE snapshotter_credential_store_entries gauge
snapshotter_credential_store_entries{image_ref="1111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-nydus"} 1

Change Type

Feature Addition

Self-Checklist

  • I have run a code style check and addressed any warnings/errors.
  • I have added appropriate comments to my code (if applicable).
  • I have updated the documentation (if applicable).
  • I have written appropriate unit tests.

Commits

Add a RenewableProvider interface with a CanRenew() bool method so that
callers can determine at runtime whether a provider's credentials can be
refreshed without user interaction. Implement it on DockerProvider,
KubeletProvider and KubeSecretProvider (all return true); CRI and Labels
providers are excluded.
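
A sketch of the shape this takes; the provider type names are the real ones mentioned above, but their definitions here are empty stubs:

```go
package auth

// RenewableProvider is implemented by providers whose credentials can be
// re-fetched without user interaction.
type RenewableProvider interface {
	CanRenew() bool
}

type DockerProvider struct{}
type KubeletProvider struct{}
type KubeSecretProvider struct{}

func (DockerProvider) CanRenew() bool     { return true }
func (KubeletProvider) CanRenew() bool    { return true }
func (KubeSecretProvider) CanRenew() bool { return true }

// CRI and Labels providers deliberately do not implement the interface;
// callers can type-assert: if rp, ok := p.(RenewableProvider); ok && rp.CanRenew() { ... }
```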

Add String() to all providers for structured log fields.

Refactor GetRegistryKeyChain to iterate over a provider list built by a
replaceable buildProviders var, enabling unit-test injection without
global state manipulation.

Add CredentialRenewals (counter, by image_ref + result) and
CredentialStoreEntries (TTL gauge, by image_ref) to track renewal
health and store occupancy. Register both in the global Prometheus
registry.

Add credentialStore: an in-memory map of image ref -> PassKeyChain with
per-entry TTL-based expiration, protected by a sync.RWMutex.

Add InitCredentialRenewal, which creates the global store, seeds it
from a list of existing refs (used on snapshotter restart), and starts
a background goroutine that renews all live entries at a configurable
interval.
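
Roughly, the initialization described here, with placeholder helpers (the real function also wires up metrics and error handling):

```go
package auth

import "time"

// Placeholder store; see renewal.go for the real one.
var globalStore = map[string]struct{}{}

func InitCredentialRenewal(interval time.Duration, existingRefs []string) {
	for _, ref := range existingRefs {
		globalStore[ref] = struct{}{} // seed on snapshotter restart
	}
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for range ticker.C {
			for ref := range globalStore {
				renewCredentials(ref) // placeholder for the per-ref renewal path
			}
		}
	}()
}

func renewCredentials(ref string) { /* query renewable providers for ref */ }
```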

GetRegistryKeyChain now serves from the store on a cache hit and writes
back credentials obtained from a RenewableProvider when the store is
active.

Add CredentialRenewalInterval (time.Duration, default 0 = disabled) to
AuthConfig. When set to a positive value the snapshotter activates the
credential renewal subsystem.
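
The config surface is small. A sketch of the field; the toml tag is an assumption based on the documented option name credential_renewal_interval:

```go
package config

import "time"

type AuthConfig struct {
	// ... existing fields ...

	// CredentialRenewalInterval activates the renewal subsystem when set to
	// a positive duration; the zero default leaves it disabled.
	CredentialRenewalInterval time.Duration `toml:"credential_renewal_interval"`
}
```
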
Call auth.InitCredentialRenewal after service recovery when
CredentialRenewalInterval > 0. Seed the store from unique ImageIDs
present in the recovered RAFS cache so credentials for running
containers are renewed immediately after a snapshotter restart.

Call auth.RemoveCredentials in Filesystem.Umount so the renewal
goroutine stops refreshing tokens for images that are no longer
mounted, preventing unbounded store growth.

Move the authentication section out of configure_nydus.md into a new
docs/registry_authentication.md. Add documentation for the new
credential_renewal_interval option and update the cross-reference in
configure_nydus.md.
Fricounet force-pushed the fricounet/token-renewals branch from 6b58922 to 6e1889d on February 19, 2026 18:01
Signed-off-by: Baptiste Girard-Carrabin <baptiste.girardcarrabin@datadoghq.com>
Fricounet force-pushed the fricounet/token-renewals branch from 6e1889d to 50e895f on February 19, 2026 18:01
Fricounet marked this pull request as ready for review on February 19, 2026 18:02
codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 70.08547% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.57%. Comparing base (fc330cc) to head (9917271).
⚠️ Report is 11 commits behind head on main.

Files with missing lines                   Patch %   Lines
pkg/auth/renewal.go                        84.05%    8 Missing and 3 partials ⚠️
pkg/auth/keychain.go                       67.85%    9 Missing ⚠️
pkg/auth/kubelet.go                        50.00%    3 Missing ⚠️
cmd/containerd-nydus-grpc/snapshotter.go   0.00%     2 Missing ⚠️
pkg/auth/cri.go                            0.00%     2 Missing ⚠️
pkg/auth/docker.go                         33.33%    2 Missing ⚠️
pkg/auth/kubesecret.go                     33.33%    2 Missing ⚠️
pkg/auth/labels.go                         0.00%     2 Missing ⚠️
pkg/metrics/registry/registry.go           0.00%     2 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #714      +/-   ##
==========================================
+ Coverage   22.02%   22.57%   +0.55%     
==========================================
  Files         130      131       +1     
  Lines       11931    12016      +85     
==========================================
+ Hits         2628     2713      +85     
+ Misses       8960     8957       -3     
- Partials      343      346       +3     
Files with missing lines                   Coverage Δ
config/config.go                           38.33% <ø> (ø)
pkg/auth/provider.go                       45.45% <ø> (ø)
cmd/containerd-nydus-grpc/snapshotter.go   0.00% <0.00%> (ø)
pkg/auth/cri.go                            77.55% <0.00%> (-3.31%) ⬇️
pkg/auth/docker.go                         73.91% <33.33%> (-6.09%) ⬇️
pkg/auth/kubesecret.go                     35.59% <33.33%> (-0.06%) ⬇️
pkg/auth/labels.go                         64.70% <0.00%> (-8.63%) ⬇️
pkg/metrics/registry/registry.go           0.00% <0.00%> (ø)
pkg/auth/kubelet.go                        81.46% <50.00%> (-0.83%) ⬇️
pkg/auth/keychain.go                       53.22% <67.85%> (+38.94%) ⬆️
... and 1 more

... and 6 files with indirect coverage changes


Fricounet marked this pull request as draft on February 20, 2026 10:43
Fricounet (Contributor, Author) commented:

Putting this back in draft because I'm considering a slight architectural change for this:

  • the current design revolves around several steps that need to work together:
    1. When snapshotter starts, seed renewal store with existing rafs
    2. When image is pulled, if creds are returned by a renewable provider, add ref to store
    3. At regular interval, renew images in store
    4. When RAFS is torn down, remove ref from store

However, this approach leads to a few edge cases:

  • due to the credential provider order, the non-renewable providers (CRI, labels) are called first, which means that if they have creds for a ref, those creds won't ever be renewed even if a renewable provider could have handled it
  • not sure if that's really possible, but if there are multiple RAFS instances which use the same image: when the first one is unmounted, the creds are removed from the store and not renewed any more

The alternative would be that on each renewal run, we get the up-to-date list of RAFS instances and derive the images to renew from there. This ensures we renew all the creds currently in use. We still keep an internal list of refs in the renewal store. But now:

  • if image in rafs and missing in store -> add to store and start renewing
  • if image in rafs and in store -> nothing, keep renewing
  • if image not in rafs and not in store -> nothing
  • if image not in rafs and in store -> remove from store and stop renewing

My main point of concern is how to know which provider to call GetCredentials on in that case:

  • if we keep adding the creds in keychain.go, we risk having a race where the keychain adds the ref and the renewal runs before the RAFS entry is created, leading to creds being deleted: is that even an issue though? Is it even possible for the race to happen?
  • if we don't add the creds in keychain.go, how can renewCredentials know how to call entry.provider.GetCredentials?

Commits

Add renewableProviders alongside buildProviders. The renewal goroutine
must not use CRI (caches credentials globally after the pull) or Labels
(only available at pull time via snapshot labels). Restricting renewal
to Docker, Kubelet, and KubeSecret ensures the correct providers are
reached at renewal time regardless of what served credentials at pull
time. Refactor buildProviders to compose renewableProviders, eliminating
duplication.

Extract fetchFromProviders from getRegistryKeyChainFromProviders so the
renewal goroutine can call providers directly without going through the
store cache check. getRegistryKeyChainFromProviders checks the store
first, then delegates to fetchFromProviders; the renewal path calls
fetchFromProviders directly to obtain fresh credentials.
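
In outline, the split looks like this (signatures and the store shape are illustrative):

```go
package auth

type PassKeyChain struct{ Username, Password string }

type Provider interface {
	GetCredentials(ref string) (*PassKeyChain, error)
}

var store = map[string]*PassKeyChain{} // stand-in for the renewal store

// getRegistryKeyChainFromProviders checks the store first, then delegates.
func getRegistryKeyChainFromProviders(ref string, providers []Provider) *PassKeyChain {
	if kc, ok := store[ref]; ok {
		return kc // cache hit: serve from the renewal store
	}
	return fetchFromProviders(ref, providers)
}

// fetchFromProviders queries the providers directly and writes back on
// success; the renewal goroutine calls this to bypass the cache check.
func fetchFromProviders(ref string, providers []Provider) *PassKeyChain {
	for _, p := range providers {
		if kc, err := p.GetCredentials(ref); err == nil && kc != nil {
			store[ref] = kc
			return kc
		}
	}
	return nil
}
```
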
Replace the store-driven eviction model (explicit RemoveCredentials on
umount) with a reconciliation loop that derives the active set from the
live RAFS instance list on every tick. This fixes two edge cases:

- Multiple RAFS instances sharing the same image ref: the first umount
  no longer evicts the entry while other instances are still running.
- Non-renewable providers (CRI, Labels) blocking renewable ones at
  renewal time: the renewal goroutine now uses renewableProviders()
  exclusively, so CRI's cached global state never interferes.

On each tick, reconcile compares store entries against the live RAFS
set. Entries in RAFS are renewed; entries absent from RAFS are evicted
once their renewedAt age exceeds interval/2. The interval/2 grace period
prevents eviction of entries added during an in-progress pull (the RAFS
entry is created after the mount completes, so a tick arriving mid-pull
would otherwise evict a valid entry).

renewEntry is now a plain function that delegates to fetchFromProviders.
The store write on success happens inside fetchFromProviders, so
renewEntry only needs to record metrics.

InitCredentialRenewal drops the existingRefs parameter; the initial
reconcile pass seeds the store from the live RAFS cache directly.

Remove auth.RemoveCredentials from Filesystem.Umount; eviction is now
handled by the reconciliation loop in the renewal goroutine. Remove
the auth import from fs.go and the rafs import from snapshotter.go
which are no longer needed.

Simplify InitCredentialRenewal call site: no existingRefs slice to
build, just pass the interval.

Update the credential renewal section in docs/registry_authentication.md
to reflect that renewal re-queries providers in priority order each
tick rather than re-using the provider that originally issued the
credentials.
Fricounet (Contributor, Author) commented:

I've made the changes mentioned above. I think this approach is more robust.

The PR is ready for review.

Fricounet marked this pull request as ready for review on February 20, 2026 16:12