Skip to content

fix(flagd): configurable retry backoff after each sync cycle error (#756)#806

Merged
toddbaert merged 8 commits intoopen-feature:mainfrom
open-feature-forking:fix/retry-backoff
Jan 8, 2026
Merged

fix(flagd): configurable retry backoff after each sync cycle error (#756)#806
toddbaert merged 8 commits intoopen-feature:mainfrom
open-feature-forking:fix/retry-backoff

Conversation

@alexandraoberaigner
Copy link
Contributor

@alexandraoberaigner alexandraoberaigner commented Dec 1, 2025

This PR

adds configurable retry backoff options to the flagd in-process provider, which allows to configure the retry policy of gRPC connections.

Aims to make the implementation consistent with Java. See this change to prevent tight loops when the retry policy doesn't take effect

Related Issues

Details

Configuration and API Enhancements:

  • Added RetryBackoffMs and RetryBackoffMaxMs to ProviderConfiguration, with corresponding default values, environment variable support, and provider options (WithRetryBackoffMs, WithRetryBackoffMaxMs). These allow users to configure retry backoff durations for connection retries. [1] [2] [3] [4] [5] [6]
  • Updated the in-process sync and service layers to accept and use the new retry backoff configuration values, passing them through all relevant constructors and methods. [1] [2] [3] [4]

Retry Logic and Policy:

  • Modified the gRPC retry policy builder to use the configured backoff values for InitialBackoff and MaxBackoff instead of hardcoded durations, making retry behavior fully configurable.
  • Updated the sync retry logic to log and sleep for the configured maximum backoff duration after a failed sync cycle, improving observability and control.

Testing and Validation:

  • Added a new unit test (TestBuildRetryPolicy) for the retry policy builder, verifying that the generated JSON policy reflects the configured backoff values and other retry parameters.
  • Updated integration and unit tests to include the new retry backoff options, ensuring end-to-end coverage. [1] [2] [3] [4] [5]

@alexandraoberaigner

This comment was marked as outdated.

Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
…t behaviour to Java flagd provider

Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
@alexandraoberaigner alexandraoberaigner marked this pull request as ready for review December 9, 2025 14:17
@alexandraoberaigner alexandraoberaigner requested review from a team as code owners December 9, 2025 14:17
Copy link
Member

@aepfli aepfli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not the biggest fan of a sleep, as this will block everything. even if do some kind of context cancellation and want to stop execution. we are stopped by a pretty long Sleep. Hence i think we should really try to use a different construct than a sleep.

The rest of the pull request looks good to me, and is ready to merge.

Disclaimer: For me the sleep is a reason to not approving this, but not a big enough reason to block this pull request from getting merged, if the rest of the approvers are fine with this.

Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
}

// WithRetryBackoffMs sets the initial backoff duration (in milliseconds) for retrying failed connections
func WithRetryBackoffMs(retryBackoffMs int) ProviderOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A parameter that represents a duration should be of type time.Duration. The name of the parameter should be changed to remove any reference to a specific unit like milliseconds.

}
}

cfg.RetryGracePeriod = getIntFromEnvVarOrDefault(flagdGracePeriodVariableName, defaultGracePeriod, cfg.log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use time.ParseDuration to parse durations.

@toddbaert
Copy link
Member

toddbaert commented Dec 10, 2025

I am still not the biggest fan of a sleep, as this will block everything. even if do some kind of context cancellation and want to stop execution. we are stopped by a pretty long Sleep. Hence i think we should really try to use a different construct than a sleep.

The rest of the pull request looks good to me, and is ready to merge.

Disclaimer: For me the sleep is a reason to not approving this, but not a big enough reason to block this pull request from getting merged, if the rest of the approvers are fine with this.

Ya, personally I'm fine with it. We already have the same pattern in Java, so I would go with it for now. This sleep is only during what's already an unusual error scenario (connection is healthy, but stream keeps returning errors immediately), within a retry loop, so I think it's acceptable.... but looks like @alexandraoberaigner already used a timer.

Copy link
Member

@toddbaert toddbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a recommended comment for why we are doing our own weird simple backoff for unusual situations. I also agree with @sahidvelji 's duration comment.

Otherwise LGTM.

Co-authored-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: alexandraoberaigner <82218944+alexandraoberaigner@users.noreply.github.com>
@alexandraoberaigner
Copy link
Contributor Author

@toddbaert @sahidvelji I looked into this a bit & would like to change to time.Duration in a separate PR since it seems like its a separate thing to consistently use it - except for the environment variable names, those we cannot change so easily since its also implemented like this in other languages, which might not have such a representation. I can create an issue if you agree.

@alexandraoberaigner
Copy link
Contributor Author

I created #816

@toddbaert toddbaert merged commit 0791fa1 into open-feature:main Jan 8, 2026
6 of 7 checks passed
@toddbaert toddbaert deleted the fix/retry-backoff branch January 9, 2026 13:54
Leoff00 pushed a commit to Leoff00/go-sdk-contrib that referenced this pull request Jan 17, 2026
…pen-feature#756) (open-feature#806)

Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: alexandraoberaigner <82218944+alexandraoberaigner@users.noreply.github.com>
Co-authored-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: leoff00 <leof0180@gmail.com>
Leoff00 pushed a commit to Leoff00/go-sdk-contrib that referenced this pull request Jan 17, 2026
…pen-feature#756) (open-feature#806)

Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com>
Signed-off-by: alexandraoberaigner <82218944+alexandraoberaigner@users.noreply.github.com>
Co-authored-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: leoff00 <leof0180@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants