
Conversation

@alexandraoberaigner
Contributor

This PR

This pull request introduces configurable retry and fatal error handling for the in-process gRPC sync provider in the flagd project. The main changes include adding new configuration options for retry backoff timing and fatal status codes, refactoring environment variable parsing, and updating service initialization and error handling logic.

Related Issues

Changes

Retry and fatal error configuration (most important):

  • Added new configuration options (RetryBackoffMs, RetryBackoffMaxMs, FatalStatusCodes) to ProviderConfiguration, with environment variable support and helper functions for parsing integer values from environment variables.
  • Updated provider and service initialization to pass the new retry and fatal error configuration fields, enabling customization of retry timing and fatal error handling for sync streams.

Refactoring and code quality:

  • Refactored environment variable parsing for integer values using a new helper function, simplifying and unifying logic for multiple configuration fields.
  • Moved gRPC retry policy construction and fatal status code normalization to a new file grpc_config.go, making the code more modular and testable.

Error handling improvements:

  • Enhanced sync error handling to detect fatal gRPC status codes and transition the provider to a fatal state, preventing endless retries on unrecoverable errors.
  • Updated test coverage for retry policy construction, fatal status code normalization, and camel-case conversion logic in grpc_config_test.go.

@alexandraoberaigner alexandraoberaigner force-pushed the fix/inifinite-loop-error branch 4 times, most recently from a2312a3 to a25f7be Compare November 14, 2025 07:39
@alexandraoberaigner alexandraoberaigner marked this pull request as ready for review November 14, 2025 07:45
@alexandraoberaigner alexandraoberaigner requested review from a team as code owners November 14, 2025 07:45
@aepfli
Member

aepfli commented Nov 14, 2025

As we are adding new config options, we should wait for open-feature/flagd-testbed#311 to be merged to ensure property names are consistent across all providers, based on the docs.

}
return reflect.ValueOf(longVal).Convert(fieldType)
case "StringList":
arrayVal := strings.Split(value, ",")
Member

do we also need to trim here?

Contributor Author

@alexandraoberaigner alexandraoberaigner Nov 20, 2025

yes, good catch! will update this

func (g *Sync) initNonRetryableStatusCodesSet() {
nonRetryableCodes = make(map[string]struct{})
for _, code := range g.FatalStatusCodes {
normalized := toCamelCase(code)
Member

Why are the codes camelCase? In the retry policy up in L35 they are upper case.

Contributor Author

@alexandraoberaigner alexandraoberaigner Nov 26, 2025

Because gRPC doesn't export a function to get the upper-case representation it uses internally. However, I changed the implementation to use codes.Code for lookup instead of the string, which is cleaner and removes the need for the case conversion.

func (g *Sync) Sync(ctx context.Context, dataSync chan<- sync.DataSync) error {
g.Logger.Info("starting continuous flag synchronization")

time.Sleep(500 * time.Millisecond)
Member

Is this a debug leftover?

}

// Backoff before retrying
time.Sleep(time.Duration(g.RetryBackOffMs) * time.Millisecond)
Member

Suggested change:
- time.Sleep(time.Duration(g.RetryBackOffMs) * time.Millisecond)
+ time.Sleep(time.Duration(g.RetryBackOffMaxMs) * time.Millisecond)

Member

This is a fix of a different problem. Can be moved to a separate PR. Not part of the non-retryable status codes.

Contributor Author

@alexandraoberaigner alexandraoberaigner Nov 26, 2025

is this part of the retry discussion? or are you suggesting to open a new one to discuss this?

Member

As we are always mixing features and functionalities, I would also love to see this as two separate PRs. It allows focusing on one implementation with its tests, and makes the changes clearer.

Contributor Author

@alexandraoberaigner alexandraoberaigner Nov 26, 2025

I'm reluctant to put this change in a separate PR, since Todd's idea was to unify the implementations (Java and Go) as feedback on the initial PR -> see comment here.
Based on this I applied @guidobrei's suggestion. I noticed just now that it was actually a mistake I made due to different variable naming in Java.

Member

I don't feel very strongly about splitting this PR up, but @aepfli and @guidobrei seem to disagree. I think this change might also be causing the e2e test failure now, though I'm not 100% sure about that; an option might need updating, so splitting this up might actually make things easier for you.

CustomSyncProviderUri: provider.providerConfiguration.CustomSyncProviderUri,
GrpcDialOptionsOverride: provider.providerConfiguration.GrpcDialOptionsOverride,
RetryGracePeriod: provider.providerConfiguration.RetryGracePeriod,
RetryBackOffMs: provider.providerConfiguration.RetryBackoffMs,
Member

Separate issue

Comment on lines -251 to -255
// Mark as ready on first successful stream
g.initializer.Do(func() {
g.ready = true
g.Logger.Info("sync service is now ready")
})
Contributor Author

Note: This moved to lines 275-279 to set ready = true only after the first stream cycle was successful, not during the cycle.

Member

Yes I think this was actually a small bug. 👍

@alexandraoberaigner

This comment was marked as resolved.

@aepfli
Copy link
Member

aepfli commented Nov 26, 2025

[Q] does someone have an idea what's wrong with the DCO?

Commit sha: 2eeebc3, Author: Alexandra Oberaigner, Committer: alexandraoberaigner; Expected "Alexandra Oberaigner [email protected]", but got "Alexandra Oberaigner [email protected]".

I am not sure, but worst case, the how-to-fix section in https://github.com/open-feature/go-sdk-contrib/pull/799/checks?check_run_id=56445883813 can be helpful and should fix this ;)

Member

@aepfli aepfli left a comment

I think this pull request merges two features, fatalErrorCodes and backoff. To keep the changes distinct, I suggest separating them into two different pull requests (this does not mean they will not be released within one changeset), as they can also be delivered separately.

Furthermore, we should rethink our sleeps, as I think this is not good practice and there are alternatives; I also created an improvement for the Java provider for this.

}

// Backoff before retrying
time.Sleep(time.Duration(g.RetryBackOffMaxMs) * time.Millisecond)
Member

I am not a big fan of our blocking sleeps, as they clearly have some disadvantages. Should we maybe use a timer for this kind of logic? Something like:

select {
case <-time.After(time.Duration(g.RetryBackOffMaxMs) * time.Millisecond):
    // ... code here ...
case <-ctx.Done():
    return // Allows cancellation
}

Member

There are disadvantages to sleep, but I think right now this is better than nothing, because nothing means a tight loop in some cases, and that is a serious bug in some situations.

@alexandraoberaigner
Contributor Author

I think this pull request merges two features, fatalErrorCodes and backoff. To keep the changes distinct, I suggest separating them into two different pull requests (this does not mean they will not be released within one changeset), as they can also be delivered separately.

Pls consider my comment above :)

Furthermore, we should rethink our sleeps, as I think this is not good practice and there are alternatives; I also created an improvement for the Java provider for this.

We can do an improvement issue for golang too -> this is just a bug fix / consistency PR

Member

@toddbaert toddbaert left a comment

This implementation looks good to me, nice work overall. One thing I think we need in addition is that we should use the same FATAL codes in RPC mode - so I think you will have to add something similar to what you have done in pkg/service/rpc/service.go... do you agree? Since both modes have streams, I think it makes sense for both streams to use this rule.

With respect to @aepfli's and @guidobrei's comments about separating things... I can go either way, but you will need one more approval besides mine, and I think it might make it easier for you to debug the e2e CI failure.

Member

@aepfli aepfli left a comment

Thank you, looks good to me. One little nit, but nothing blocking this PR from getting merged.

Signed-off-by: Alexandra Oberaigner <[email protected]>
@alexandraoberaigner
Contributor Author

[Q] does someone have an idea why the gherkin tests for SYNC_PORT still fail even though I excluded them with ~@sync-port

alexandraoberaigner and others added 2 commits November 28, 2025 13:07
@toddbaert toddbaert force-pushed the fix/inifinite-loop-error branch from 6e45d12 to 7d6fff6 Compare November 28, 2025 17:07
@toddbaert toddbaert self-requested a review November 28, 2025 17:19
@toddbaert
Member

[Q] does someone have an idea why the gherkin tests for SYNC_PORT still fail even though I excluded them with ~@sync-port

@alexandraoberaigner I pushed this.


Development

Successfully merging this pull request may close these issues.

Infinite retry to establish connection to FlagSyncService in Flagd golang provider

6 participants