grokspawn
Contributor

Description of the change:
Changes the default lifetime of registry cache content from 5m to 30m.

Motivation for the change:
When cached content expires, new snapshots must be requested from the catalog sources. With a short cache lifetime combined with a large number of namespaces and subscriptions, this resulted in a high rate of snapshot requests.
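As a rough back-of-envelope illustration of why the lifetime matters, the sketch below assumes that every cache entry expires and is re-requested exactly once per lifetime. That model, the `refreshesPerHour` helper, and the entry count of 200 are assumptions made for illustration only; they are not measurements from a real cluster.

```go
package main

import (
	"fmt"
	"time"
)

// The old and new default lifetimes named in this PR.
const (
	oldLifetime = 5 * time.Minute
	newLifetime = 30 * time.Minute
)

// refreshesPerHour estimates the number of snapshot requests per hour,
// ASSUMING each of `entries` cache entries is re-requested once per lifetime.
// This is a simplified model, not the operator's actual behavior.
func refreshesPerHour(entries int, lifetime time.Duration) float64 {
	return float64(entries) * float64(time.Hour) / float64(lifetime)
}

func main() {
	const entries = 200 // illustrative figure only, not measured

	before := refreshesPerHour(entries, oldLifetime)
	after := refreshesPerHour(entries, newLifetime)

	fmt.Printf("5m lifetime:  %.0f requests/hour\n", before)
	fmt.Printf("30m lifetime: %.0f requests/hour\n", after)
	fmt.Printf("reduction:    %.0fx\n", before/after)
}
```

Under this model the request rate scales inversely with the lifetime, so the ratio between the two settings does not depend on the number of entries.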

This area in the code still suffers from (at least) two issues:

  1. a stampeding herd problem, where multiple threads issue snapshot requests while a request is already outstanding and unfulfilled;
  2. an overload problem, where a snapshot request that results in an error is immediately retried.
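For the first issue, one common approach is to coalesce concurrent requests so that only one snapshot request per source is in flight at a time. The following is a minimal sketch of that idea and is not code from this repository: the `coalescer` type, its methods, and the `"catalog-a"` key are hypothetical names, and the Go ecosystem's `golang.org/x/sync/singleflight` package provides a more complete version of the same pattern.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight request and its eventual result.
type call struct {
	done    chan struct{} // closed when the request finishes
	waiters int           // callers parked behind this request; guarded by coalescer.mu
	val     string
	err     error
}

// coalescer (hypothetical) allows at most one in-flight request per key.
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func newCoalescer() *coalescer {
	return &coalescer{inflight: make(map[string]*call)}
}

// Do runs fetch for key unless a request for key is already outstanding,
// in which case it waits for that request and shares its result.
func (c *coalescer) Do(key string, fetch func() (string, error)) (string, error) {
	c.mu.Lock()
	if existing, ok := c.inflight[key]; ok {
		existing.waiters++
		c.mu.Unlock()
		<-existing.done
		return existing.val, existing.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch()

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // publishes val and err to the waiters
	return cl.val, cl.err
}

// waiters reports how many callers are currently parked behind key's request.
func (c *coalescer) waiters(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.inflight[key]; ok {
		return cl.waiters
	}
	return 0
}

// countFetches starts `callers` concurrent requests for the same key and
// returns how many times the underlying fetch actually ran.
func countFetches(callers int) int64 {
	c := newCoalescer()
	var fetches int64
	release := make(chan struct{})
	fetch := func() (string, error) {
		atomic.AddInt64(&fetches, 1)
		<-release // simulate a slow snapshot request
		return "snapshot", nil
	}

	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.Do("catalog-a", fetch)
		}()
	}
	// Hold the first request open until every other caller is waiting on it.
	for c.waiters("catalog-a") < callers-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt64(&fetches)
}

func main() {
	fmt.Println("fetches for 8 concurrent callers:", countFetches(8))
}
```

The demo deliberately holds the first request open until all other callers have joined it, so the count it reports reflects the coalescing behavior rather than goroutine scheduling luck.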

However, this 6x reduction in expiry frequency has been enough to make the problem disappear for all practical purposes, and the two issues above can be pursued independently.
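For the second issue, a typical mitigation is to wait between retries with a capped exponential backoff instead of retrying immediately. The sketch below shows one possible shape for that; `backoffDelay`, `fetchWithBackoff`, `simulate`, and the 1s/30s bounds are all hypothetical and chosen only for illustration, not taken from this codebase.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative bounds only; suitable values would need tuning.
const (
	baseDelay = time.Second
	maxDelay  = 30 * time.Second
)

// backoffDelay returns the wait before the retry that follows failed attempt
// number `attempt` (0-based): base, 2*base, 4*base, ... capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d >= max/2 {
			return max // doubling again would reach or exceed the cap
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// fetchWithBackoff calls fetch up to maxAttempts times, sleeping between
// failed attempts. The sleep function is injected so callers can fake it.
func fetchWithBackoff(fetch func() error, maxAttempts int, sleep func(time.Duration)) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fetch(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			sleep(backoffDelay(attempt, baseDelay, maxDelay))
		}
	}
	return err
}

// simulate runs fetchWithBackoff against a fake fetch that fails `failures`
// times before succeeding, recording the delays instead of really sleeping.
func simulate(failures, maxAttempts int) (calls int, waited []time.Duration, err error) {
	fetch := func() error {
		calls++
		if calls <= failures {
			return errors.New("snapshot request failed")
		}
		return nil
	}
	sleep := func(d time.Duration) { waited = append(waited, d) }
	err = fetchWithBackoff(fetch, maxAttempts, sleep)
	return calls, waited, err
}

func main() {
	calls, waited, err := simulate(2, 5)
	fmt.Println("calls:", calls, "delays:", waited, "err:", err)
}
```

A production version would normally add jitter to the delays so that many failing callers do not retry in lockstep.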

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-ci openshift-ci bot requested review from ankitathomas and joelanford June 3, 2025 20:58
@grokspawn grokspawn changed the title from "reduce cache expiriry frequency" to "reduce cache expiry frequency" Jun 3, 2025
@grokspawn grokspawn requested a review from Copilot June 3, 2025 20:58
@Copilot Copilot AI left a comment

Pull Request Overview

This PR increases the default registry cache expiry from 5 minutes to 30 minutes to reduce load from frequent snapshot requests.

  • Introduce a defaultCacheLifetime duration set to 30 minutes.
  • Update sourceInvalidator.ttl to use the new default cache lifetime.
Comments suppressed due to low confidence (2)

pkg/controller/registry/resolver/source_registry.go:78

  • [nitpick] Consider renaming defaultCacheLifetime to defaultCacheTTL to align with the ttl field name and improve consistency.
var defaultCacheLifetime time.Duration = 30 * time.Minute

pkg/controller/registry/resolver/source_registry.go:87

  • Add or update unit tests to assert that sourceInvalidator.ttl is set to the new 30-minute default to prevent regressions if this value changes again.
ttl:        defaultCacheLifetime,
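
A regression check along the lines suggested above might look roughly like the following sketch. Only the `defaultCacheLifetime` declaration and the `ttl` field come from this PR; the reduced `sourceInvalidator` struct, `newSourceInvalidator`, `expired`, and the two minute-based helpers are simplified stand-ins assumed for illustration and do not reflect the real type in `source_registry.go`.

```go
package main

import (
	"fmt"
	"time"
)

// From the PR: the new default cache lifetime.
var defaultCacheLifetime time.Duration = 30 * time.Minute

// sourceInvalidator is a simplified stand-in; only the ttl field is taken
// from the PR, everything else about the real type is omitted.
type sourceInvalidator struct {
	ttl time.Duration
}

// newSourceInvalidator (hypothetical constructor) applies the default.
func newSourceInvalidator() *sourceInvalidator {
	return &sourceInvalidator{ttl: defaultCacheLifetime}
}

// expired (hypothetical helper) reports whether content fetched at `fetched`
// has outlived the ttl at time `now`.
func (s *sourceInvalidator) expired(fetched, now time.Time) bool {
	return now.Sub(fetched) >= s.ttl
}

// ttlMinutes returns the default ttl in whole minutes, for easy assertions.
func ttlMinutes() int {
	return int(newSourceInvalidator().ttl / time.Minute)
}

// expiredAfterMinutes reports whether content of the given age is stale.
func expiredAfterMinutes(age int) bool {
	fetched := time.Date(2025, time.June, 3, 0, 0, 0, 0, time.UTC)
	now := fetched.Add(time.Duration(age) * time.Minute)
	return newSourceInvalidator().expired(fetched, now)
}

func main() {
	fmt.Println("default ttl (minutes):", ttlMinutes())
	fmt.Println("stale after 5 minutes: ", expiredAfterMinutes(5))
	fmt.Println("stale after 30 minutes:", expiredAfterMinutes(30))
}
```

In a real test file the same assertions would live in a `TestXxx(t *testing.T)` function and exercise the actual constructor rather than a stand-in.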

@grokspawn grokspawn force-pushed the catalog-operator-cache-lifetime branch from 97b90aa to 255d9b5 Compare June 3, 2025 21:02
@grokspawn grokspawn added this pull request to the merge queue Jun 4, 2025
Merged via the queue into operator-framework:master with commit bf9ffe8 Jun 4, 2025
13 checks passed
@grokspawn grokspawn deleted the catalog-operator-cache-lifetime branch June 4, 2025 14:56