
Conversation

@3vilhamster (Contributor) commented Nov 8, 2024

What changed?
I've copied over the workaround for racy logs in tests from the backend.
The client has the same issue: it cannot establish a clear stopping point for its internal processes when they are first asked to stop. This is not a problem for real workflow usage, but it can lead to test failures caused by racy log messages emitted after the test ends.
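
For context, a minimal, hypothetical reproduction of that failure mode in a plain Go test (not code from this PR; the test name is made up):

```go
package testlogger_test

import (
	"testing"
	"time"
)

// Hypothetical reproduction: a background goroutine that logs after the
// test has returned makes the Go test framework panic with
// "Log in goroutine after ... has completed", independent of the code
// under test.
func TestRacyLogAfterCompletion(t *testing.T) {
	go func() {
		time.Sleep(10 * time.Millisecond)
		t.Log("late log from an internal goroutine") // the test has usually finished by now
	}()
	// The test body returns immediately; the goroutine above outlives it.
}
```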

Sidenote:
I changed the approach from an atomic to an RWMutex, since this situation needs the following semantics (see the sketch after this list):

  1. Concurrent log writes are allowed, so an RLock does not block them.
  2. On Cleanup we must wait for all in-flight writes to finish and only then acquire the write lock.
  3. All writes after that point should go to the fallback logger.
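
A minimal sketch of that scheme; the type, field, and method names are illustrative assumptions, not the actual API of internal/common/testlogger/testlogger.go:

```go
package testlogger

import (
	"log"
	"sync"
	"testing"
)

// sketchLogger is an illustrative stand-in for the real test logger.
type sketchLogger struct {
	mu        sync.RWMutex
	completed bool
	t         *testing.T  // safe to use only while the test is running
	fallback  *log.Logger // destination for writes after the test ends
}

// Write may be called concurrently; RLock lets writers run in parallel
// as long as Cleanup has not taken the write lock (semantic 1).
func (l *sketchLogger) Write(msg string) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	if l.completed {
		l.fallback.Println(msg) // late write, redirected (semantic 3)
		return
	}
	l.t.Log(msg)
}

// registerCleanup flips the flag under the write lock, which cannot be
// acquired until all in-flight Writes release their read locks (semantic 2).
func (l *sketchLogger) registerCleanup(t *testing.T) {
	t.Cleanup(func() {
		l.mu.Lock()
		defer l.mu.Unlock()
		l.completed = true
	})
}
```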

Why?
Improving test stability

How did you test it?
Ran unit tests a few hundred times.
This is a bit tricky, since TestLoggerShouldNotFailIfLoggedLate requires reruns, so I ran `for i in {1..100}; do go test . --count 1; done` both with and without the race detector.

Potential risks
Some tests could still be flaky, but they were already failing before this fix.


codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.91%. Comparing base (7a3beaa) to head (e209ff6).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/common/testlogger/testlogger.go | 94.56% | 4 Missing and 1 partial ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| internal/common/convert.go | 100.00% <100.00%> (ø) |
| internal/common/testlogger/testlogger.go | 94.56% <94.56%> (ø) |

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5ec4386...e209ff6.

@Groxx (Member) left a comment


LGTM, though I don't see the reason for the added mutex.
If something seemed flaky without it, I can pretty much guarantee the mutex didn't fix it, so there's something else that needs fixing instead.

@Groxx (Member) commented Nov 8, 2024

From some more reading and thinking and IRL chat: I'm pretty sure the mutex doesn't change anything, but it's not risky or anything. So yeah, still approved, merge any time.

Now that there's a counter-example showing this logger isn't reliable, I think it's pretty clear the atomics alone don't ensure that the fallback logger is used after the test completes. The current code allows: checking the atomic and seeing "not completed" -> pausing -> the test completes -> resuming and logging to the *testing.T of the finished test.

But I can make a PR for that if you want. The mutex here shouldn't make anything worse, and a mutex will be needed to fix the "atomics are not enough" issue anyway.
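
To illustrate that window concretely, a hypothetical atomic-only variant (made-up names, not the current implementation):

```go
package testlogger

import (
	"log"
	"sync/atomic"
	"testing"
)

// atomicSketch shows the gap: the completed check and the use of t are
// two separate steps, so the test can finish in between.
type atomicSketch struct {
	completed atomic.Bool
	t         *testing.T
	fallback  *log.Logger
}

func (l *atomicSketch) Write(msg string) {
	if l.completed.Load() {
		// Test already finished: redirect to the fallback logger.
		l.fallback.Println(msg)
		return
	}
	// Racy window: the Load above returned false, but this goroutine can
	// be preempted right here while the test finishes and Cleanup sets
	// completed to true...
	l.t.Log(msg) // ...so this still logs to an already-finished *testing.T
}
```

A check-then-log like this is not one atomic step, which is why some ordering between writers and the test's Cleanup (such as a lock) is needed to guarantee the fallback is used after completion.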

@3vilhamster merged commit b51e891 into cadence-workflow:master on Nov 11, 2024
13 checks passed
@3vilhamster deleted the introduce-testlogger branch on November 11, 2024 at 14:09
mrombout pushed a commit to softsense/cadence-client that referenced this pull request on Feb 26, 2025:

[common] Introduce testlogger as a workaround for a poor lifecycle (cadence-workflow#1398)