Skip to content

feat(frontend): support autoforwarding through headers#7681

Merged
shijiesheng merged 5 commits intocadence-workflow:masterfrom
shijiesheng:feat-query-header
Feb 25, 2026
Merged

feat(frontend): support autoforwarding through headers#7681
shijiesheng merged 5 commits intocadence-workflow:masterfrom
shijiesheng:feat-query-header

Conversation

@shijiesheng
Copy link
Member

@shijiesheng shijiesheng commented Feb 5, 2026

What changed?

add client feature flag support on server to enable autoforwarding

ref #7707

For ordering,
Payload > Header > default Eventual consistency level

  • IDL update to support serialization in proto
  • modified cluster redirection wrapper to honor consistency level from feature flags

Why?

Allows client workers to poll and process tasks from all regions. This is particularly useful when a test infra started a test container that's trying to process tasks during the test.

Before: you'll need to ensure the test container is started in region that's connected to domain's active cluster
Now: all test containers can be auto forwarded to active cluster if they provide the client-feature-flag header with auto forwarding enabled

How did you test it?

Unit Test
(WIP) Testing xdc clusters with cadence-samples to do xdc processing

Potential risks
The header is passed by clients. If they pass it wrong, inactive workers will start processing workflows unexpectedly.

Release notes
added new field in structured client-feature-flag header to allow users to set autoforwarding

Documentation Changes
N/A


Reviewer Validation

PR Description Quality (check these before reviewing code):

  • "What changed" provides a clear 1-2 line summary
    • Project Issue is linked
  • "Why" explains the full motivation with sufficient context
  • Testing is documented:
    • Unit test commands are included (with exact go test invocation)
    • Integration test setup/commands included (if integration tests were run)
    • Canary testing details included (if canary was mentioned)
  • Potential risks section is thoughtfully filled out (or legitimately N/A)
  • Release notes included if this completes a user-facing feature
  • Documentation needs are addressed (or noted if uncertain)

Signed-off-by: Shijie Sheng <liouvetren@gmail.com>
Signed-off-by: Shijie Sheng <liouvetren@gmail.com>
Signed-off-by: Shijie Sheng <liouvetren@gmail.com>
Signed-off-by: Shijie Sheng <liouvetren@gmail.com>
Signed-off-by: Shijie Sheng <liouvetren@gmail.com>
@gitar-bot
Copy link

gitar-bot bot commented Feb 25, 2026

🔍 CI failure analysis for ea8934b: Integration test suite timed out or failed due to infrastructure issues after running for 10+ minutes. No specific test failure message. Unrelated to PR which only modifies frontend cluster redirection wrappers.

Issue

Golang Integration Test with SQLite Failure (Job 64906385897)

  • Test suite: github.com/uber/cadence/host
  • Duration: 621.996s (~10 minutes)
  • Exit code: Test failure without specific test case failure message

Root Cause

This appears to be an infrastructure/timeout issue unrelated to the PR changes.

The integration test suite ran for over 10 minutes and failed with a generic FAIL message, but without any specific test case failure:

FAIL
FAIL	github.com/uber/cadence/host	621.996s
FAIL
make: *** [Makefile:711: cover_integration_profile] Error 1

Analysis

No Test Failure Details:

  • The logs show normal test execution and cleanup
  • No --- FAIL: TestName messages indicating which test failed
  • No panic or assertion failures
  • Tests appear to have timed out or been terminated externally

Infrastructure Indicators:

  • Total duration: 621 seconds (close to common CI timeout limits of 10-11 minutes)
  • Log shows normal shutdown sequences for all services (matching, history, frontend)
  • Final error: "Final attempt failed. Child_process exited with error code 2"
  • Coverage report shows: 25.3% coverage

Details

Why This Is Unrelated to PR:

  1. No Service Logic Changes: This PR modifies only frontend service wrappers:

    • service/frontend/templates/clusterredirection.tmpl
    • service/frontend/wrappers/clusterredirection/api_generated.go
    • service/frontend/wrappers/clusterredirection/utils.go
    • service/frontend/wrappers/clusterredirection/utils_test.go
  2. Isolated Scope: Changes are limited to cluster redirection logic for reading feature flags

  3. No Integration Test Changes: No test files or test infrastructure modified

  4. Timeout Pattern: The failure pattern (long runtime, generic failure, no specific error) is characteristic of:

    • CI infrastructure timeouts
    • Resource exhaustion
    • Flaky infrastructure issues
    • Not code logic failures

Previous Analysis (Still on Record)

Linting failures in commit 3d68872 were fixed in commit ea8934b ("fmt").

Code Review 👍 Approved with suggestions 3 resolved / 4 findings

Clean implementation of header-driven autoforwarding with proper nil-safety. Only the missing license header in utils.go remains unresolved.

💡 Quality: Missing license header in utils.go

📄 service/frontend/wrappers/clusterredirection/utils.go

The new utils.go file is missing the standard Apache 2.0 license header that other files in the codebase have. The corresponding test file utils_test.go includes the proper license header (lines 1-19), but utils.go jumps straight to the package declaration.

Impact: This is a minor consistency issue that should be addressed to maintain codebase standards.

Suggested fix: Add the Apache 2.0 license header to the top of utils.go:

// Copyright (c) 2017 Uber Technologies, Inc.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

package clusterredirection
✅ 3 resolved
Bug: Tests reference non-existent constant and don't test actual implementation

📄 service/frontend/wrappers/clusterredirection/utils_test.go:29 📄 service/frontend/wrappers/clusterredirection/utils_test.go:46 📄 service/frontend/wrappers/clusterredirection/utils.go:11
The test file utils_test.go references common.QueryConsistencyLevelHeaderName which does not exist as a constant in common/headers.go or anywhere else in the codebase. This means the tests won't compile.

Additionally, even if the constant existed, the tests fundamentally don't test the actual implementation. The tests set context values via context.WithValue(ctx, common.QueryConsistencyLevelHeaderName, ...), but the actual getRequestedConsistencyLevelFromContext() implementation reads from YARPC headers via yarpc.CallFromContext(ctx)client.GetFeatureFlagsFromHeader(call)featureFlags.AutoforwardingEnabled. The context value mechanism used by the tests is completely unrelated to the YARPC header extraction used by the implementation.

The tests need to be rewritten to:

  1. Use a proper YARPC context with headers set (similar to how other tests in the codebase work)
  2. Set the cadence-client-feature-flags header with JSON-encoded FeatureFlags{AutoforwardingEnabled: true}
  3. Assert that the function correctly returns STRONG when the flag is set
Bug: Unused imports fmt and middleware will cause build failure

📄 common/rpc/middleware.go:26 📄 common/rpc/middleware.go:31
The diff adds two new imports to common/rpc/middleware.go:

  • "fmt" (line 26)
  • "go.uber.org/yarpc/api/middleware" (line 31)

Neither of these imports is used anywhere in the file. In Go, unused imports are compilation errors. This suggests that a middleware implementation was removed or not included in this PR (possibly QueryConsistencyLevelInboundMiddleware that was referenced in the PR description). These imports need to be removed, or the corresponding code that uses them needs to be included.

Edge Case: No nil check on yarpc.CallFromContext result risks panic

📄 service/frontend/wrappers/clusterredirection/utils.go:12
getRequestedConsistencyLevelFromContext() calls yarpc.CallFromContext(ctx) and immediately passes the result to client.GetFeatureFlagsFromHeader(call), which dereferences call without a nil check (call.Header(...) on line 125 of versionChecker.go).

yarpc.CallFromContext returns nil when the context doesn't contain a YARPC call (e.g., internal calls, tests, or any non-YARPC context). The existing codebase explicitly checks for nil in similar usages:

  • middleware.go:137: if yarpc.CallFromContext(ctx) == nil
  • middleware.go:180: if inboundCall := yarpc.CallFromContext(ctx); inboundCall != nil

While in the normal production YARPC handler path the call will be non-nil, defensive programming is warranted here since this function is called from 33+ generated API methods and could be called in test or internal contexts.

Rules ⚠️ 1/2 requirements met

Repository Rules

PR Description Quality Standards: Add exact test command 'go test -v ./service/frontend/wrappers/clusterredirection -run TestGetRequestedConsistencyLevelFromContext' to How to test section
GitHub Issue Linking Requirement: PR description contains valid issue link to https://github.com//issues/7707

1 rule not applicable. Show all rules by commenting gitar display:verbose.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@shijiesheng shijiesheng merged commit 63d68a2 into cadence-workflow:master Feb 25, 2026
58 of 59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants