Skip to content

feat: Add Reason Variable to Failovers#7451

Merged
zawadzkidiana merged 3 commits intocadence-workflow:masterfrom
zawadzkidiana:reasonUpdate
Feb 26, 2026
Merged

feat: Add Reason Variable to Failovers#7451
zawadzkidiana merged 3 commits intocadence-workflow:masterfrom
zawadzkidiana:reasonUpdate

Conversation

@zawadzkidiana
Copy link
Contributor

@zawadzkidiana zawadzkidiana commented Nov 18, 2025

What changed?

  • Added --reason flag (short form -r) to the cadence domain failover command
  • Added Reason field to FailoverDomainRequest and UpdateDomainRequest types
  • Updated the server-side handler to store the reason in FailoverEvent
  • Made sure the reason flows through from CLI to database in the failover history
  • Updated the ToUpdateDomainRequest mapper to pass the reason through

Why?
We needed better visibility into why failovers were happening. Right now when you look at failover history you can see when and where failovers happened, but not why. This makes it hard to debug issues or understand patterns. With this change, operators can provide context like "planned maintenance", "emergency DR", "testing", etc. and it gets stored with the failover event. Makes it way easier to go back and understand what was going on during production incidents.

How did you test it?

  • Verified the CLI flag shows up in help output: ./cadence domain failover --help
  • Ran make bins - everything compiles
  • Ran make lint - passes
  • Ran go test ./tools/cli/... - all CLI tests pass
  • Ran go test ./common/domain/... - all domain handler tests pass
  • Manually checked that the flag accepts strings

The flow from CLI to database is tested through the existing test suite. The reason goes: CLI flag → FailoverDomainRequest → UpdateDomainRequest → FailoverEvent → gets stored in domain data.

Potential risks
Pretty low risk overall since it's backward compatible. The reason field is optional in all the types so old clients can still talk to new servers and vice versa. If you don't provide a reason it just stays empty.

Main thing to watch would be if someone passes in a really long reason string - it goes into domain data which is stored as a map, so there might be size limits there. But that's more of an edge case than a real risk.

Also if someone is relying on the exact format of the FailoverEvent JSON in domain data, this adds a new field. But since it's optional and has the omitempty tag, it shouldn't break anything.

Release notes

Scope is type system, CLI, and domain handler, but the thrift and proto type mappers for FailoverDomainRequest are not updated.

Added optional --reason flag to domain failover command. This lets you specify why a failover is happening and the reason gets stored in the failover history. No migration or schema changes needed.

Documentation Changes
hould probably update the CLI docs to mention the new --reason flag and show some examples of how to use it. The flag shows up in --help but would be good to have it in the actual documentation too.

Copy link
Member

@timl3136 timl3136 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, maybe we can make the pr title more descriptive.

@zawadzkidiana zawadzkidiana changed the title feat: Reason update feat: Add Reason Variable to Failovers Nov 18, 2025
@zawadzkidiana
Copy link
Contributor Author

Re-running 1 failed Golang integratiuon test with sqlite (pull_request); all other tests pass.

@zawadzkidiana
Copy link
Contributor Author

Github outage is likely cause of sqlite integration error; will re-run test again after problem is resolved.

@zawadzkidiana zawadzkidiana self-assigned this Nov 18, 2025
- Add --reason flag to 'cadence domain failover' command with default value 'default maintenance'
- Add --reason flag to 'cadence admin cluster failover start' command
- Extend FailoverParams and FailoverActivityParams to propagate reason through workflow
- Update FailoverActivity to use FailoverDomain API when reason is provided for proper tracking
- Add Reason field to FailoverDomainRequest type definition
- Maintain backward compatibility by falling back to UpdateDomain when no reason provided

This allows operators to provide context for failover operations which will be
stored in the failover history for better operational transparency and debugging.

Signed-off-by: Diana Zawadzki <dzawa@live.de>
- Add FailoverReason field to UpdateDomainRequest type
- Update ToUpdateDomainRequest mapper to include reason field
- Add Reason field to FailoverEvent struct for history storage
- Update NewFailoverEvent to accept and store reason parameter
- Pass reason through handleFailoverRequest to failover history
- Update admin failover tests to expect reason in workflow input

This completes the end-to-end flow for tracking failover reasons from CLI
through to database storage in the failover history.

Signed-off-by: Diana Zawadzki <dzawa@live.de>
- Remove reason field from admin cluster failover workflow (as per reviewer suggestion)
- Remove Reason from FailoverParams and FailoverActivityParams
- Revert FailoverActivity to use UpdateDomain (original implementation)
- Remove --reason flag from admin cluster failover start command
- Change --reason default value from "default maintenance" to empty string
- Update tests to remove reason from workflow input expectations

Only the domain failover command now has the --reason flag, allowing users
to optionally provide context without enforcing a default.

Signed-off-by: Diana Zawadzki <dzawa@live.de>
@gitar-bot
Copy link

gitar-bot bot commented Feb 26, 2026

Code Review ⚠️ Changes requested 0 resolved / 1 findings

The Reason field is correctly added to the type system, CLI, and domain handler, but the thrift and proto type mappers for FailoverDomainRequest are not updated — the reason will be silently dropped when requests go through RPC, which is the standard production path.

⚠️ Bug: Reason field missing from thrift and proto type mappers

📄 common/types/shared.go:1790

The new Reason field added to FailoverDomainRequest in common/types/shared.go is not mapped in the thrift or proto type mapper layers:

  1. Thrift mapper (common/types/mapper/thrift/shared.go lines 1618-1639): Both FromFailoverDomainRequest and ToFailoverDomainRequest only map DomainName, DomainActiveClusterName, and ActiveClusters — the new Reason field is silently dropped.

  2. Proto mapper (common/types/mapper/proto/api.go lines 4526-4546): Same issue — Reason is not included in either direction.

Impact: When a client sends a failover request over RPC (the normal production path), the reason will be lost. The CLI calls frontendClient.FailoverDomain() which goes through thrift serialization. The thrift wrapper at service/frontend/wrappers/thrift/api_generated.go calls thrift.ToFailoverDomainRequest() to convert the incoming request, which drops the Reason field. The feature will only work when the CLI uses the direct domainHandler path (when frontendClient is nil), which is not the standard deployment mode.

Both mapper files need to be updated to include Reason in the FailoverDomainRequest field mapping in both directions (To/From).

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@zawadzkidiana zawadzkidiana enabled auto-merge (squash) February 26, 2026 19:48
@zawadzkidiana zawadzkidiana merged commit 7e31050 into cadence-workflow:master Feb 26, 2026
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants