-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[Prism] Fix data race while accesing processing time queue #36672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @shunping, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical data race within the Prism runner's processing-time timer mechanism. The issue arose from concurrent read and write operations on the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
cc'ed @stankiewicz |
| state := wv[key] | ||
| endOfWindowReached := window.MaxTimestamp() < ss.input | ||
| em.refreshCond.L.Lock() | ||
| emNow := em.ProcessingTimeNow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProcessingTimeNow() is exported... Are we calling it outside of the engine package? If not, please unexport it.
Otherwise we'll need to move the lock holding into the method itself to handle the external caller, and have an internalU unexported version that requires manual locking.
Essentially, we can't have an exported method that requires holding an internal lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem as I mentioned below (#36672 (comment)) is about the order of obtaining two locks (refreshCond.L and ss.mu) is different in the watermark evaluation go routine and the go routine for running PersistBundle.
|
Looks like my fix also caused deadlock:
|
1f2991f to
8b2cac4
Compare
42fdd0c to
49005b2
Compare
To resolve the deadlock mentioned here, we will have to make sure |
|
Assigning reviewers: R: @lostluck for label go. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
lostluck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! The assumption was that the processing time handling would only happen in Stateful stages (with timers), but now it's dealing with triggers. Unfortunate, but at least this was a minor issue.

Data race has been seen when we run some processing-time timer tests.
go test -v -race -count=200 -timeout 100s -run ^TestTimers_ProcessingTime_Infinity$ github.com/apache/beam/sdks/v2/go/test/integration/primitivesMost of the access to
em.processTimeEventsare guarded byem.refreshCond.Llock, except for one location ininjectTriggerBundlesIfReady. This PR fixes the missing spot.Related to #36662
Internal Bug ID: 449405414