Skip to content

Conversation

@alexandraoberaigner
Copy link
Contributor

@alexandraoberaigner alexandraoberaigner commented Oct 24, 2025

This PR

This pull request enhances the gRPC synchronization logic in the flagd provider by introducing a mechanism to identify and handle non-retryable gRPC status codes. This ensures that certain errors (like authentication failures) are not retried unnecessarily, improving error handling and resource usage.

Related Issues

improves #744 to fix #756

Notes

Error handling improvements:

  • Added a set of non-retryable gRPC status codes (PermissionDenied, Unauthenticated) and logic to prevent retries when these errors occur during sync cycles. [1] [2]
  • Introduced a helper function initNonRetryableStatusCodesSet to parse and initialize the set of non-retryable codes at startup. [1] [2]

How to test

  • flagd instance with envoy filter to return PERMISSION_DENIED status code
  • run go test app with flagd provider in inprocess mode, request feature flag
  • Verify: instead of retrying, error is printed only once:
image

@alexandraoberaigner alexandraoberaigner requested review from a team as code owners October 24, 2025 11:45
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @alexandraoberaigner, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the flagd provider's gRPC synchronization by implementing a more intelligent error handling strategy. It enables the system to recognize and immediately stop retrying for specific gRPC status codes, such as authentication failures, thereby preventing resource waste and improving the overall stability and responsiveness of the synchronization process.

Highlights

  • Non-Retryable gRPC Status Codes: Introduced a mechanism to identify and handle specific non-retryable gRPC status codes, such as 'PermissionDenied' and 'Unauthenticated', preventing unnecessary retries and improving error handling efficiency.
  • Optimized Error Handling: The flagd provider's gRPC synchronization logic now gracefully handles certain errors by immediately ceasing retries, which conserves resources and enhances the stability and responsiveness of the synchronization process.
  • Dependency Updates: Added new imports for github.com/goccy/go-json to facilitate JSON parsing and google.golang.org/grpc/status for robust gRPC error handling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to prevent retries for certain gRPC status codes in the flagd provider, enhancing error handling and resource utilization. The changes include adding a set of non-retryable status codes, a helper function to initialize this set, and logic to check for these codes during sync cycles. The code changes look good and address the issue of unnecessary retries for non-recoverable errors.

Copy link
Member

@toddbaert toddbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guidobrei and I fixed a very similar thing in Java recently: open-feature/java-sdk-contrib#1590 (there's some other stuff as well but it's not related). The key things are:

  • this change to follow gRPC's recommendations and reduce the amount of retryable codes (we only had this many because were were using the retry backoff to slow down the loop
  • this change to prevent tight loops when the retry policy doesn't take effect

I think it would be better do implement the same here and transition to a FATAL state (and even add this to the flagd spec).

@alexandraoberaigner
Copy link
Contributor Author

alexandraoberaigner commented Oct 30, 2025

@guidobrei and I fixed a very similar thing in Java recently: open-feature/java-sdk-contrib#1590 (there's some other stuff as well but it's not related). The key things are:

  • this change to follow gRPC's recommendations and reduce the amount of retryable codes (we only had this many because were were using the retry backoff to slow down the loop
  • this change to prevent tight loops when the retry policy doesn't take effect

I think it would be better do implement the same here and transition to a FATAL state (and even add this to the flagd spec).

I implemented your suggestions @toddbaert. Please let me know if you spot anything else 🙏
Note: I also opened a gherkin test PR here: open-feature/flagd-testbed#302 to prevent this issue in all flagd providers; we should probably merge the gherkin PR before hand to test the fix by automatic e2e tests

… non-retry error handling (open-feature#756)

Signed-off-by: Alexandra Oberaigner <[email protected]>
"retryPolicy": {
"MaxAttempts": 3,
"InitialBackoff": "1s",
"MaxBackoff": "5s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the java impl, we set this to the maxBackoff param: https://github.com/open-feature/java-sdk-contrib/pull/1590/files#diff-bbef645a236a67bc95a5f8aa30fa5a528c6b2d45b4f4137b4f4b1074af197f26R57

See: https://flagd.dev/providers/rust/?h=backoff#configuration-options

That way, there's consistency between the gRPC RPC-level retries and our stream cycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be easier to put this in a small util function in another file along with the nonRetryableCodes var if you do.

}

// Backoff before retrying
time.Sleep(time.Duration(g.RetryGracePeriod))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the FLAGD_RETRY_BACKOFF_MAX_MS param here instead, like we did in Java? I think it's a more sensible setting to use here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants