-
Notifications
You must be signed in to change notification settings - Fork 621
groupwithin new implementation #3162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
866212f
groupwithin new implementation
Angel-O 2e23187
removing flaky flag, adding variance annotation
Angel-O dd2f11b
fixing variance annotation
Angel-O b1423d5
revert flaky
Angel-O 33c59a8
covariant state
Angel-O 7207890
minor
Angel-O cb7a88f
Update core/shared/src/main/scala/fs2/Stream.scala
Angel-O 7b4c793
timeout handling improvements
Angel-O b6dcdda
tuple fix
Angel-O 7b1cfb6
ability to backpressure via queue, and propagate on cancellation or e…
Angel-O 6962042
interrupt behaviour change, test improvements
Angel-O e417d17
removing old implementation
Angel-O d676edd
minor improvements and scalafmt
Angel-O 63d0cd7
scalafmt
Angel-O 335fe17
minor
Angel-O 2658db1
tuple fix
Angel-O ffc3f18
preventing early stream termination on timeout
Angel-O 9a90a11
another tuple fix, aargh
Angel-O File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less chance of deadlock is still not as good as no chance of deadlock 😁
I think the current implementation of
groupWithinuses a hack, where it always returns the semaphore, and then attempts to re-acquire it. But there's no way to fix this properly without implementing theIor-basedraceas described in that issue.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was struggling a bit to understand the problem (i.e where the deadlock would occur) until I clicked through a few links from the issue linked above and stumbled across your PR comment here
If I understood your concern correctly the issue is that the semaphore might lose the race, even if it acquired the permits successfully. So in that case we would need to restore them, in order to prevent a deadlock when the
onTimeoutlogic tries to acquire a permit again (that simply won't be there if the semaphore acquired them without releasing them)if that is the case I believe this is not a problem in this implementation because of this logic:
basically if the buffer contains any element by the time the timeout expires we won't attempt to acquire the permit in a blocking fashion in the for-comprehension.
So by definition the deadlock cannot occur or in other words we should be able to rule out the deadlock because if the buffer is empty then we will know for a fact that the competing semaphore (the one in the race) hasn't acquired any permit (i.e. it does not need to release them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking more closely at this! Yes, you've definitely understood the issue.
After looking at your code and thinking about it, I think you are right that it won't deadlock. However, I still suspect that the state can get corrupted. Something like this:
racetimes-out, although the semaphoreacquireNalso succeeds.fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1456 to 1458 in d676edd
emitChunkviaonTimeoutandawaitAndEmitNext.fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1452 to 1453 in d676edd
fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1444 to 1447 in d676edd
fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1420 to 1421 in d676edd
buffer, which concurrently gets filled with new stuff and releases permits.fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1441 to 1442 in d676edd
tryAcquireNinawaitAndEmitNextsucceeds.fs2/core/shared/src/main/scala/fs2/Stream.scala
Lines 1448 to 1450 in d676edd
2*chunkSizepermits to emitchunkSizeelements.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for breaking this down 🙌🏾
Yes I can see the problem now if the semaphore loses the race, but manages to acquire the permits. We would have a corrupted state and could potentially deadlock/slow down unnecessarily on the next race.
To be honest I didn't know this was a possibility (losing the race but securing the permits), but I think that it could be fixed as follows:
the
edgeCaseval effectively describes the problem you have brought up: if, by the time the timeout expires there are no permits available but the buffer is full then we will know for sure that the semaphore has lost the race but succeeded in acquiring the permits. The fact that the queue isboundedmeans nothing can interfere at this point and change the values of the available supply or the buffer size (correct me if I'm wrong).And it's probably an improvement compared to the hack in the current implementation that resets the count regardless just to reacquire the permits shortly after.
Out of curiosity I've run the benchmarks with the above modification adding a
printlnto see how often this would occur. It did not happen once!! So maybe this is very rare! And perhaps the race being left-biased is also a mitigating factorWhat do you think ? shall I handle the scenario this way ? Or should I have a go and use the
bothOutcome/racePaircombinators mentioned in the issue above?Also what command should I run to rundone 🤞🏾scalafmton all sub-projects ? CI keeps failing due to formatting 😓.I've run
sbt scalafmtbut that didn't work 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@armanbilge I have a possible explanation as to why this wasn't happening: there's a bug in the timeout logic that causes the permits count to be inaccurate
Since the above is not atomic
Then even if the buffer contains one element, the permit count can be zero. This means that the number of permits acquired (
awaited) can be off by one when we acquire them at the end of the for comprehension. As a result the next race will be skewed towards the semaphore.That and the fact that none of the tests have been able to reproduce the scenario.
I've fixed this in the other PR (will push the change sometime today) and I can see the edge case occurring, or at least I've got a test case where this occurs regularly.
The good thing is that the fix suggested seems to be working and can be even simplified to
So far tests look good, will update the other PR, later on today. Closing this for now, since the other one seems to more accurate.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Angel-O to reproduce this, try using
executeEmbedand setting up a scenario where the semaphore gets permits at the exact same time that the timeout expires.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh missed this at first, nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!! Using
TestControl.executeEmbedis a great idea for this! Will give it a go 👍🏾