fix(flagd): do not retry for certain status codes (#756) #783

alexandraoberaigner · 2025-10-24T11:45:10Z

This PR

This pull request enhances the gRPC synchronization logic in the flagd provider by introducing a mechanism to identify and handle non-retryable gRPC status codes. This ensures that certain errors (like authentication failures) are not retried unnecessarily, improving error handling and resource usage.

Related Issues

improves #744 to fix #756

Notes

Error handling improvements:

Added a set of non-retryable gRPC status codes (PermissionDenied, Unauthenticated) and logic to prevent retries when these errors occur during sync cycles.
Introduced a helper function initNonRetryableStatusCodesSet to parse and initialize the set of non-retryable codes at startup.
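The set-plus-initializer idea in these notes can be sketched as follows. This is a minimal, stdlib-only sketch and not the PR's code: `buildNonRetryableSet` and `nonRetryableCodesJSON` are hypothetical names, and it uses `encoding/json` instead of `github.com/goccy/go-json`. It assumes the codes are listed by their canonical gRPC names, which are the strings grpc-go's `codes.Code.String()` returns (for example `"PermissionDenied"`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical input: non-retryable codes as a JSON array of canonical gRPC
// code names, matching codes.Code.String() output in grpc-go.
const nonRetryableCodesJSON = `["PermissionDenied", "Unauthenticated"]`

// buildNonRetryableSet (hypothetical) parses the JSON array once at startup
// into a set, so each sync cycle only needs a cheap map lookup.
func buildNonRetryableSet(raw string) (map[string]struct{}, error) {
	var names []string
	if err := json.Unmarshal([]byte(raw), &names); err != nil {
		return nil, fmt.Errorf("parsing non-retryable codes: %w", err)
	}
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set, nil
}

func main() {
	set, err := buildNonRetryableSet(nonRetryableCodesJSON)
	if err != nil {
		panic(err)
	}
	_, stop := set["PermissionDenied"] // should stop retrying
	_, retry := set["Unavailable"]     // not in the set, so still retryable
	fmt.Println(stop, !retry)          // prints "true true"
}
```

Parsing once and storing a set keeps the per-error check to a map lookup; the real provider would key the set by `codes.Code` rather than by name.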

How to test

Run a flagd instance with an Envoy filter that returns the PERMISSION_DENIED status code.
Run a Go test app with the flagd provider in in-process mode and request a feature flag.
Verify that, instead of retrying, the error is printed only once.

gemini-code-assist · 2025-10-24T11:45:24Z

Summary of Changes

This pull request significantly refines the flagd provider's gRPC synchronization by implementing a more intelligent error handling strategy. It enables the system to recognize and immediately stop retrying for specific gRPC status codes, such as authentication failures, thereby preventing resource waste and improving the overall stability and responsiveness of the synchronization process.

Highlights

Non-Retryable gRPC Status Codes: Introduced a mechanism to identify and handle specific non-retryable gRPC status codes, such as 'PermissionDenied' and 'Unauthenticated', preventing unnecessary retries and improving error handling efficiency.
Optimized Error Handling: The flagd provider's gRPC synchronization logic now gracefully handles certain errors by immediately ceasing retries, which conserves resources and enhances the stability and responsiveness of the synchronization process.
Dependency Updates: Added new imports for github.com/goccy/go-json to facilitate JSON parsing and google.golang.org/grpc/status for robust gRPC error handling.

gemini-code-assist

Code Review

This pull request introduces a mechanism to prevent retries for certain gRPC status codes in the flagd provider, enhancing error handling and resource utilization. The changes include adding a set of non-retryable status codes, a helper function to initialize this set, and logic to check for these codes during sync cycles. The code changes look good and address the issue of unnecessary retries for non-recoverable errors.

providers/flagd/pkg/service/in_process/grpc_sync.go

toddbaert

@guidobrei and I fixed a very similar thing in Java recently: open-feature/java-sdk-contrib#1590 (there's some other stuff as well, but it's not related). The key things are:

this change to follow gRPC's recommendations and reduce the number of retryable codes (we only had this many because we were using the retry backoff to slow down the loop)
this change to prevent tight loops when the retry policy doesn't take effect

I think it would be better to implement the same here and transition to a FATAL state (and even add this to the flagd spec).

Signed-off-by: Alexandra Oberaigner <[email protected]>

alexandraoberaigner · 2025-10-30T12:11:28Z

I implemented your suggestions @toddbaert. Please let me know if you spot anything else 🙏
Note: I also opened a Gherkin test PR, open-feature/flagd-testbed#302, to prevent this issue in all flagd providers. We should probably merge the Gherkin PR beforehand so the fix is covered by automated e2e tests.

toddbaert · 2025-10-30T13:22:50Z

providers/flagd/pkg/service/in_process/grpc_sync.go

 			  "retryPolicy": {
 				"MaxAttempts": 3,
 				"InitialBackoff": "1s",
 				"MaxBackoff": "5s",


In the java impl, we set this to the maxBackoff param: https://github.com/open-feature/java-sdk-contrib/pull/1590/files#diff-bbef645a236a67bc95a5f8aa30fa5a528c6b2d45b4f4137b4f4b1074af197f26R57

See: https://flagd.dev/providers/rust/?h=backoff#configuration-options

That way, there's consistency between the gRPC RPC-level retries and our stream cycle.

It may be easier to put this in a small util function in another file along with the nonRetryableCodes var if you do.

toddbaert · 2025-10-30T13:23:55Z

providers/flagd/pkg/service/in_process/grpc_sync.go

+			}
+
+			// Backoff before retrying
+			time.Sleep(time.Duration(g.RetryGracePeriod))


Can you use the FLAGD_RETRY_BACKOFF_MAX_MS param here instead, like we did in Java? I think it's a more sensible setting to use here.

alexandraoberaigner requested review from a team as code owners October 24, 2025 11:45

github-actions bot assigned bacherfl, Kavindu-Dodan and toddbaert Oct 24, 2025

github-actions bot requested review from Kavindu-Dodan, bacherfl and toddbaert October 24, 2025 11:45

gemini-code-assist bot reviewed Oct 24, 2025

providers/flagd/pkg/service/in_process/grpc_sync.go (outdated, resolved)

providers/flagd/pkg/service/in_process/grpc_sync.go (resolved)

providers/flagd/pkg/service/in_process/grpc_sync.go (outdated, resolved)

alexandraoberaigner mentioned this pull request Oct 24, 2025

Infinite retry to establish connection to FlagSyncService in Flagd golang provider #756

Open

toddbaert requested changes Oct 24, 2025

fix(flagd): do not retry for certain status codes (open-feature#756)

10e72b9

Signed-off-by: Alexandra Oberaigner <[email protected]>

alexandraoberaigner force-pushed the fix/inifinite-loop-error branch from 778197d to 10e72b9 October 28, 2025 12:02

alexandraoberaigner mentioned this pull request Oct 30, 2025

chore: add test to ensure fatal state on forbidden error (#756) open-feature/flagd-testbed#302

Merged

fix(flagd): Add forbidden provider support and improve inprocess sync non-retry error handling (open-feature#756)

71745fc

Signed-off-by: Alexandra Oberaigner <[email protected]>

alexandraoberaigner force-pushed the fix/inifinite-loop-error branch from 4457536 to 71745fc October 30, 2025 12:40

toddbaert reviewed Oct 30, 2025
