Commit 18b584f
fix: use next_event_id column as source of truth when reading workflow execution from Cassandra (#7738)
<!-- 1-2 line summary of WHAT changed technically:
- Always link the relevant projects GitHub issue, unless it is a minor
bugfix
- Good: "Modified FailoverDomain mapper to allow null ActiveClusterName
#320"
- Bad: "added nil check" -->
**What changed?**
Read the denormalized next_event_id column (which duplicates the
next_event_id stored in the execution field) and set its value on
InternalWorkflowExecutionInfo when reading concrete execution data from
Cassandra.
<!-- Your goal is to provide all the required context for a future
maintainer
to understand the reasons for making this change (see
https://cbea.ms/git-commit/#why-not-how).
How did this work previously (and what was wrong with it)? What has
changed, and why did you solve it
this way?
- Good: "Active-active domains have independent cluster attributes per
region. Previously,
modifying cluster attributes required specifying the default
ActiveClusterName which
updates the global domain default. This prevents operators from updating
regional
configurations without affecting the primary cluster designation. This
change allows
attribute updates to be independent of active cluster selection."
- Bad: "Improves domain handling" -->
**Why?**
Cassandra stores values from the same row but different columns in
different places on disk, rather than as a single, contiguous row block,
so the denormalized columns can get out of sync with the execution blob
in the execution field. The denormalized next_event_id column is used as
the condition in conditional writes when updating the execution record
for concrete workflow executions.
By reading it and setting it on InternalWorkflowExecutionInfo, we can
leverage checksum verification to detect differences between the
denormalized next_event_id column and the next_event_id in the execution
blob field (used to calculate the checksum) and identify corrupt
workflows more quickly and more precisely.
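The detection idea described above can be illustrated with a minimal, self-contained sketch. It is not the Cadence implementation: the `executionInfo` type, the CRC32-based `checksumOf`, and the `readExecution` and `verify` helpers are hypothetical names chosen for illustration, and the real checksum covers different fields. It only shows that once the column value replaces the blob value on read, a checksum computed from the blob at write time no longer matches if the two have drifted apart.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// executionInfo is a hypothetical, trimmed-down stand-in for
// InternalWorkflowExecutionInfo.
type executionInfo struct {
	NextEventID int64
	State       int32
}

// checksumOf computes a CRC32 over the fields of info. The real checksum
// algorithm and payload are different; this is an assumption for the sketch.
func checksumOf(info executionInfo) uint32 {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint64(buf[0:8], uint64(info.NextEventID))
	binary.BigEndian.PutUint32(buf[8:12], uint32(info.State))
	return crc32.ChecksumIEEE(buf)
}

// readExecution mimics the read path: start from the values decoded from the
// execution blob, then overwrite NextEventID with the denormalized column.
func readExecution(blob executionInfo, nextEventIDColumn int64) executionInfo {
	info := blob
	info.NextEventID = nextEventIDColumn
	return info
}

// verify returns an error when the checksum stored at write time does not
// match the checksum recomputed from the data that was read.
func verify(info executionInfo, stored uint32) error {
	if got := checksumOf(info); got != stored {
		return fmt.Errorf("checksum mismatch: stored %d, computed %d", stored, got)
	}
	return nil
}

func main() {
	blob := executionInfo{NextEventID: 10, State: 1}
	stored := checksumOf(blob) // checksum as computed from the blob at write time

	// Column and blob agree: verification passes.
	fmt.Println("in sync:", verify(readExecution(blob, 10), stored))

	// Column has drifted from the blob: verification reports the mismatch.
	fmt.Println("out of sync:", verify(readExecution(blob, 12), stored))
}
```

Without the override in `readExecution`, the second case would verify cleanly, because the checksum and the blob value would both come from the same source.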
<!-- Include specific test commands and setup. Please include the exact
commands such that
another maintainer or contributor can reproduce the test steps taken.
- e.g. Unit test commands with exact invocation
`go test -v ./common/types/mapper/proto -run TestFailoverDomainRequest`
- For integration tests include setup steps and test commands
Example: "Started local server with `./cadence start`, then ran `make
test_e2e`"
- For local simulation testing include setup steps for the server and
how you ran the tests
- Good: Full commands that reviewers can copy-paste to verify
- Bad: "Tested locally" or "Added tests" -->
**How did you test it?**
go test ./common/persistence/nosql/nosqlplugin/cassandra -run Test_parseWorkflowExecutionInfo
<!-- If there are risks that the release engineer should know about
document them here.
For example:
- Has an API/IDL been modified? Is it backwards/forwards compatible? If
not, what are the repercussions?
- Has a schema change been introduced? Is it possible to roll back?
- Has a feature flag been re-used for a new purpose?
- Is there a potential performance concern? Is the change modifying core
task processing logic?
- If truly N/A, you can mark it as such -->
**Potential risks**
We are changing how we read data from Cassandra; if we are doing it
incorrectly, workflows could become corrupt or stuck.
We are also changing the argument to the parsing function: it now
receives the whole result instead of only the "execution" field. If
this is wrong, it could cause parsing issues and affect workflows.
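The signature change mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the actual parsing function: the name `parseExecution`, the `executionInfo` type, the `workflow_id` key, and the representation of the row as a `map[string]interface{}` are all hypothetical. It shows a parser that receives the whole row, reads the nested "execution" field, and then lets the top-level next_event_id column take precedence.

```go
package main

import (
	"errors"
	"fmt"
)

// executionInfo is a hypothetical, trimmed-down stand-in for
// InternalWorkflowExecutionInfo.
type executionInfo struct {
	WorkflowID  string
	NextEventID int64
}

// parseExecution takes the whole row result rather than only the nested
// "execution" value, so it can also see the denormalized column.
// The map layout and key names are assumptions made for this sketch.
func parseExecution(row map[string]interface{}) (*executionInfo, error) {
	exec, ok := row["execution"].(map[string]interface{})
	if !ok {
		return nil, errors.New("row has no execution field")
	}

	info := &executionInfo{}
	if v, ok := exec["workflow_id"].(string); ok {
		info.WorkflowID = v
	}
	if v, ok := exec["next_event_id"].(int64); ok {
		info.NextEventID = v
	}

	// The denormalized column is treated as the source of truth when present.
	if v, ok := row["next_event_id"].(int64); ok {
		info.NextEventID = v
	}
	return info, nil
}

func main() {
	row := map[string]interface{}{
		"execution": map[string]interface{}{
			"workflow_id":   "wf-1",
			"next_event_id": int64(10),
		},
		"next_event_id": int64(12),
	}

	info, err := parseExecution(row)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("workflow=%s next_event_id=%d\n", info.WorkflowID, info.NextEventID)
}
```

The risk noted in this section corresponds to the two type assertions on the row: if a caller still passes only the nested value, the "execution" lookup fails and the parser returns an error instead of silently producing partial data.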
<!-- If this PR completes a user facing feature or changes functionality
add release notes here.
Your release notes should allow a user and the release engineer to
understand the changes with little context.
Always ensure that the description contains a link to the relevant
GitHub issue. -->
**Release notes**
Improve workflow corruption detection in Cassandra by reading the
denormalized next_event_id column and validating it during checksum
verification.
<!-- Consider whether this change requires documentation updates in the
Cadence-Docs repo
- If yes: mention what needs updating (or link to docs PR in
cadence-docs repo)
- If in doubt, add a note about potential doc needs
- Only mark N/A if you're certain no docs are affected -->
**Documentation Changes**
---
## Reviewer Validation
**PR Description Quality** (check these before reviewing code):
- [ ] **"What changed"** provides a clear 1-2 line summary
- [ ] Project Issue is linked
- [ ] **"Why"** explains the full motivation with sufficient context
- [ ] **Testing is documented:**
  - [ ] Unit test commands are included (with exact `go test` invocation)
  - [ ] Integration test setup/commands included (if integration tests were run)
  - [ ] Canary testing details included (if canary was mentioned)
- [ ] **Potential risks** section is thoughtfully filled out (or
legitimately N/A)
- [ ] **Release notes** included if this completes a user-facing feature
- [ ] **Documentation** needs are addressed (or noted if uncertain)
---------
Signed-off-by: fimanishi <fimanishi@gmail.com>
1 parent 87ecb5a, commit 18b584f
5 files changed (+225, -112) in common/persistence/nosql/nosqlplugin/cassandra
Per-file diffs shown (file names and diff content not preserved):
- 13 additions & 3 deletions
- 2 additions & 1 deletion
- 13 additions & 5 deletions