@@ -38,6 +38,7 @@ type Config struct {
InitRetryDelay time.Duration
BucketID string
DigestCacheTTL time.Duration
Enabled bool
}

func calculateRolloutBucket(apiKey string) string {
@@ -55,6 +56,7 @@ func NewConfig(cfg config.Component, rcClient RemoteConfigClient) Config {
MaxInitRetries: 5,
InitRetryDelay: 1 * time.Second,
BucketID: calculateRolloutBucket(cfg.GetString("api_key")),
- DigestCacheTTL: 1 * time.Hour, // DEV: Make this configurable
+ DigestCacheTTL: cfg.GetDuration("admission_controller.auto_instrumentation.gradual_rollout.cache_ttl_hours") * time.Hour,
Contributor

Are you sure you want to lock in to hours here? Is there a case where a user might want a finer grained window?

Before these changes, DigestCacheTTL was just of type time.Duration; with these changes, it will always be on the order of hours.

If it's unlikely a user will care, leave as is.

Contributor Author

Yeah good call out. I just specified this as hours so it was human-readable when you configured it 🤔 I figured setting a TTL by minutes wouldn't be a very common use case...?

Enabled: cfg.GetBool("admission_controller.auto_instrumentation.gradual_rollout.enabled"),
}
}
@@ -39,6 +39,7 @@ func TestNewConfig(t *testing.T) {
InitRetryDelay: 1 * time.Second,
BucketID: "2",
DigestCacheTTL: 1 * time.Hour,
Enabled: true,
},
},
{
@@ -57,6 +58,7 @@ func TestNewConfig(t *testing.T) {
InitRetryDelay: 1 * time.Second,
BucketID: "2",
DigestCacheTTL: 1 * time.Hour,
Enabled: true,
},
},
{
@@ -74,6 +76,7 @@ func TestNewConfig(t *testing.T) {
InitRetryDelay: 1 * time.Second,
BucketID: "2",
DigestCacheTTL: 1 * time.Hour,
Enabled: true,
},
},
{
@@ -92,6 +95,47 @@ func TestNewConfig(t *testing.T) {
InitRetryDelay: 1 * time.Second,
BucketID: "0",
DigestCacheTTL: 1 * time.Hour,
Enabled: true,
},
},
{
name: "gradual_rollout_disabled",
configFactory: func(t *testing.T) config.Component {
mockConfig := config.NewMock(t)
mockConfig.SetWithoutSource("site", "datadoghq.com")
mockConfig.SetWithoutSource("api_key", "1234567890abcdef")
mockConfig.SetWithoutSource("admission_controller.auto_instrumentation.gradual_rollout.enabled", false)
return mockConfig
},
expectedState: Config{
Site: "datadoghq.com",
DDRegistries: map[string]struct{}{"gcr.io/datadoghq": {}, "docker.io/datadog": {}, "public.ecr.aws/datadog": {}},
RCClient: nil,
MaxInitRetries: 5,
InitRetryDelay: 1 * time.Second,
BucketID: "0",
DigestCacheTTL: 1 * time.Hour,
Enabled: false,
},
},
{
name: "gradual_rollout_cache_ttl_hours_configured",
configFactory: func(t *testing.T) config.Component {
mockConfig := config.NewMock(t)
mockConfig.SetWithoutSource("site", "datadoghq.com")
mockConfig.SetWithoutSource("api_key", "1234567890abcdef")
mockConfig.SetWithoutSource("admission_controller.auto_instrumentation.gradual_rollout.cache_ttl_hours", 2)
return mockConfig
},
expectedState: Config{
Site: "datadoghq.com",
DDRegistries: map[string]struct{}{"gcr.io/datadoghq": {}, "docker.io/datadog": {}, "public.ecr.aws/datadog": {}},
RCClient: nil,
MaxInitRetries: 5,
InitRetryDelay: 1 * time.Second,
BucketID: "0",
DigestCacheTTL: 2 * time.Hour,
Enabled: true,
},
},
}
@@ -124,7 +124,6 @@ func (r *rcResolver) Resolve(registry string, repository string, tag string) (*R
defer r.mu.RUnlock()

if len(r.imageMappings) == 0 {
- log.Debugf("Cache empty, no resolution available")
Contributor Author

@erikayasuda, Feb 10, 2026

This is so disruptive in reading the failed test logs, I can only imagine it's just as disruptive for real services. Removing 😅

Contributor

Fair - this seems like valid, "expected" behavior, so there's probably no need to report it.

metrics.ImageResolutionAttempts.Inc(repository, tag, tag)
return nil, false
}
@@ -275,6 +274,9 @@ func newBucketTagResolver(cfg Config) *bucketTagResolver {
// New creates the appropriate Resolver based on whether
// a remote config client is available.
func New(cfg Config) Resolver {
if !cfg.Enabled {
return NewNoOpResolver()
}
if cfg.RCClient == nil || reflect.ValueOf(cfg.RCClient).IsNil() {
log.Debugf("No remote config client available")
Contributor

Nit: Not directly related to these changes, but would be nice to get more specific (about the impact) on this log as well.

Contributor Author

Yeah spoiler alert... I'm going to remove all of this remote config based code in a subsequent PR 😅 So it'll go away in a bit

return NewNoOpResolver()
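
The gate added to `New` in this hunk follows a common pattern: when the feature is disabled, hand back a resolver that never resolves, so callers need no extra branching. A minimal, self-contained sketch of that pattern, with assumptions stated up front: the `Resolver` interface here is simplified, and `noOpResolver`, `staticResolver`, and the `Digests` field are hypothetical stand-ins rather than the types in this PR.

```go
package main

import "fmt"

// Resolver is a simplified stand-in for the resolver interface in the diff.
type Resolver interface {
	Resolve(registry, repository, tag string) (digest string, ok bool)
}

// noOpResolver never resolves anything, so callers keep the original tag.
type noOpResolver struct{}

func (noOpResolver) Resolve(_, _, _ string) (string, bool) { return "", false }

// staticResolver is a hypothetical resolver backed by a fixed lookup table.
type staticResolver struct {
	digests map[string]string
}

func (s staticResolver) Resolve(registry, repository, tag string) (string, bool) {
	d, ok := s.digests[registry+"/"+repository+":"+tag]
	return d, ok
}

// Config carries only what this sketch needs; Digests is an assumption.
type Config struct {
	Enabled bool
	Digests map[string]string
}

// New returns the no-op resolver when the feature is disabled, and the
// table-backed resolver otherwise.
func New(cfg Config) Resolver {
	if !cfg.Enabled {
		return noOpResolver{}
	}
	return staticResolver{digests: cfg.Digests}
}

func main() {
	table := map[string]string{"gcr.io/datadoghq/lib:v1": "sha256:abc"}

	for _, enabled := range []bool{true, false} {
		r := New(Config{Enabled: enabled, Digests: table})
		digest, ok := r.Resolve("gcr.io/datadoghq", "lib", "v1")
		fmt.Printf("enabled=%t digest=%q ok=%t\n", enabled, digest, ok)
	}
}
```

Checking the flag first means a disabled rollout skips every later initialization step.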
2 changes: 2 additions & 0 deletions pkg/config/setup/config.go
@@ -962,6 +962,8 @@ func InitConfig(config pkgconfigmodel.Setup) {
"docker.io/datadog",
"public.ecr.aws/datadog",
})
config.BindEnvAndSetDefault("admission_controller.auto_instrumentation.gradual_rollout.enabled", true)
config.BindEnvAndSetDefault("admission_controller.auto_instrumentation.gradual_rollout.cache_ttl_hours", 1)
config.BindEnvAndSetDefault("admission_controller.auto_instrumentation.patcher.enabled", false)
config.BindEnvAndSetDefault("admission_controller.auto_instrumentation.patcher.fallback_to_file_provider", false) // to be enabled only in e2e tests
config.BindEnvAndSetDefault("admission_controller.auto_instrumentation.patcher.file_provider_path", "/etc/datadog-agent/patch/auto-instru.json") // to be used only in e2e tests
@@ -0,0 +1,16 @@
# Each section from every release note are combined when the
# CHANGELOG.rst is rendered. So the text needs to be worded so that
# it does not depend on any information only available in another
# section. This may mean repeating some details, but each section
# must be readable independently of the other.
#
# Each section note must be formatted as reStructuredText.
---
enhancements:
  - |
    Added support for two new configurations for tag-based gradual rollout in K8s SSI deployments.
    The gradual rollout can be configured using the following parameters:
    - ``DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_ENABLED``: Whether to enable gradual rollout (default: true)
    - ``DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_CACHE_TTL_HOURS``: The cache TTL in hours for the gradual rollout image cache (default: 1)
      - This cache is used to store the mapping of mutable tags -> image digest for the gradual rollout, and setting this TTL helps prevent the image resolution from becoming stale.