Skip to content

Executor client drain observer#7710

Open
gazi-yestemirova wants to merge 6 commits intocadence-workflow:masterfrom
gazi-yestemirova:executor_client_drain_observer
Open

Executor client drain observer#7710
gazi-yestemirova wants to merge 6 commits intocadence-workflow:masterfrom
gazi-yestemirova:executor_client_drain_observer

Conversation

@gazi-yestemirova
Copy link
Contributor

What changed?

Why?

How did you test it?

Potential risks

Release notes

Documentation Changes


Reviewer Validation

PR Description Quality (check these before reviewing code):

  • "What changed" provides a clear 1-2 line summary
    • Project Issue is linked
  • "Why" explains the full motivation with sufficient context
  • Testing is documented:
    • Unit test commands are included (with exact go test invocation)
    • Integration test setup/commands included (if integration tests were run)
    • Canary testing details included (if canary was mentioned)
  • Potential risks section is thoughtfully filled out (or legitimately N/A)
  • Release notes included if this completes a user-facing feature
  • Documentation needs are addressed (or noted if uncertain)

Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
Signed-off-by: Gaziza Yestemirova <gaziza@uber.com>
@gitar-bot
Copy link

gitar-bot bot commented Feb 16, 2026

🔍 CI failure analysis for c809aef: Five CI failures: (1) PR title missing conventional commit prefix, (2) Struct field tag alignment needs formatting fix, (3-5) Three flaky/infrastructure tests unrelated to PR changes (rate limiter, async workflow consumer, integration test timeout).

Issue 1: PR Title Format

The CI check "Validate PR title follows conventional commit format" failed because the PR title does not follow the conventional commit format.

Root Cause

The PR title Executor client drain observer is missing a required prefix. The validation requires: fix, feat, docs, or style.

Details

Based on the PR content (adding drain signal handling to executor clients), this is a new feature. Suggested title:

feat: Drain executors if infra drain signal is received

Reference: https://www.conventionalcommits.org/


Issue 2: Formatting - Struct Field Alignment

The golangci-lint job failed because code formatting changes were not committed.

Root Cause

In service/sharddistributor/client/executorclient/client.go:77-78, struct field tags are not properly aligned. The formatter removed extra spaces:

Current (needs fix):

Metadata      ExecutorMetadata                  `optional:"true"`
DrainObserver clientcommon.DrainSignalObserver   `optional:"true"`

Expected (after formatting):

Metadata      ExecutorMetadata                 `optional:"true"`
DrainObserver clientcommon.DrainSignalObserver `optional:"true"`

Issue 3: Flaky Test - Rate Limiter (Unrelated)

The codecov job failed due to a test failure in common/clock/ratelimiter_test.go with "context deadline exceeded" error. This is unrelated to PR changes - the PR only modifies service/sharddistributor/ files.


Issue 4: Flaky Test - Async Workflow Consumer (Unrelated)

The Golang unit test job failed in service/worker/asyncworkflow/async_workflow_consumer_manager_test.go with consumer count mismatch. This is unrelated to PR changes - the PR only modifies shard distributor files.


Issue 5: Integration Test Timeout (Unrelated)

The Golang integration test with sqlite failed after 11m50s with no specific test failure logged:

FAIL	github.com/uber/cadence/host	665.413s

Root Cause

This is unrelated to PR changes. The PR only modifies service/sharddistributor/ files. The logs show normal shutdown sequences without explicit test failures, suggesting a timeout or infrastructure issue rather than a code defect.

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Clean implementation of drain/undrain signal handling for executor clients and namespace manager. The namespace manager has comprehensive tests for the new drain logic, but the executor client's drain/undrain path (heartbeatloop drain case and waitForUndrain) still lacks unit test coverage.

💡 Quality: No unit tests for executor client drain/undrain path

📄 service/sharddistributor/client/executorclient/clientimpl.go:224 📄 service/sharddistributor/client/executorclient/clientimpl.go:253

The executor client's drain signal handling in clientimpl.go (the case <-drainCh: branch in heartbeatloop and the waitForUndrain function) has no corresponding unit tests. The existing test file clientimpl_test.go has tests for stop signal and context cancellation draining heartbeats, but nothing exercises:

  1. Drain signal → stopShardProcessors → sendDrainingHeartbeat → waitForUndrain flow
  2. Undrain → resume heartbeat flow
  3. Stop during waitForUndrain
  4. Nil drainObserver (backward compatibility)

The namespace manager has comprehensive drain tests (TestDrainSignal_TriggersResign, TestDrainThenUndrain_ResumesElection, etc.), so it would be good to add equivalent coverage for the executor client. This is especially important since the executor drain path has additional complexity (timer reset, shard processor lifecycle).

Rules ❌ No requirements met

Repository Rules

PR Description Quality Standards: Fill out all PR description sections: What changed? (1-2 line summary), Why? (full context and motivation), How did you test it? (exact test commands), Potential risks (compatibility/performance concerns), Release notes (if user-facing), and Documentation Changes (if docs affected)

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant