Fix deadlock when accessing @Shared in a Dependency#153
Fix deadlock when accessing @Shared in a Dependency#153stephencelis merged 5 commits intopointfreeco:mainfrom
@Shared in a Dependency#153Conversation
3fcc383 to
011ad74
Compare
| } | ||
| } | ||
|
|
||
| final class _ManagedReference<Key: SharedReaderKey>: Reference, Observable { |
There was a problem hiding this comment.
@pyrtsa Thanks for the PR and for looking into this! It's been a little while since we worked on this, but I believe our worry at the time was that two shared references could compete and initialize the same persistent reference at the same time, and that they would hold unique references while only a single one raced to live in the global persistent references store.
When looking into this did you conclude that this is not possible due to the existing locking?
I also wonder if it's possible to cook up a test case that deadlocks on main and is fixed in this branch?
There was a problem hiding this comment.
I wrote it so that while two or more threads can indeed start setting up the shared reference, only one of them will succeed (while holding the lock another time), and all others will simply drop their half-initialised reference and grab the one already there.
There was a problem hiding this comment.
I need to think a bit on the failing/fixing unit test case, maybe it's possible with some artificial delays. 🤔
There was a problem hiding this comment.
There's a unit test now.
I tried to make the CI run the commit with just the failing test on top of main, but GHA got stuck somehow. (Edit: Maybe the GHA macOS runner breaks down with this test; see #157?)
Anyway, you can check out d188bfb and run tests manually to see that it indeed deadlocks.
0e9f40f to
be056e0
Compare
...by not holding locks while accessing `Dependency` logic with locks of its own. Also simplify `PersistentReferences` implementation by maintaining weak references to `_PersistentReference<Key>` objects directly, without custom reference counting. Fixes pointfreeco#149.
be056e0 to
d188bfb
Compare
| ) rethrows -> _PersistentReference<Key> { | ||
| guard let reference = lock.withLock({ (storage[key.id] as? Weak<Key>)?.reference }) else { | ||
| let value = try value() | ||
| return withExtendedLifetime( |
There was a problem hiding this comment.
@pyrtsa Can you explain why you added this? The tests seem to pass without it.
There was a problem hiding this comment.
Hmm, I think I was being overly pedantic there, and it can be simplified down to:
let reference = _PersistentReference(
key: key,
value: value,
skipInitialLoad: skipInitialLoad
)
return lock.withLock {
if let reference = (storage[key.id] as? Weak<Key>)?.reference {
return reference
} else {
storage[key.id] = Weak(reference: reference)
reference.onDeinit = { [self] in
removeReference(forKey: key)
}
return reference
}
}I originally added withExtendedLifetime in order to guarantee that the captured local _PersistentReference instance outlives the lock no matter what, because before this PR _PersistentReference used to access @Dependency(PersistentReferences.self) in its deinit, potentially causing the same deadlock (in even more subtle circumstances).
I wasn't sure if there was a possibility some other code path could access another @Dependency somewhere else. Reading code closer now, _PersistentReference still has quite a few members which has or could indirectly have a non-trivial deinit (_$perceptionRegistrar, key, subject, _loadError, _saveError, subscription at least), but neither seems currently likely to use @Dependency in their deinit.
But even if that was the case, it seems the current Swift compiler retains the captured-but-unused local variable reference longer than the capturing closure passed to lock.withLock { ... } even without explicitly extending its lifetime. So I guess we can indeed simplify it.
There was a problem hiding this comment.
Having said that, I think it would be good to next check whether swift-dependencies could be updated not to hold the CachedValues.lock while running user-defined code. That would also remove many concerns and potential future regressions causing deadlocks.
There was a problem hiding this comment.
Pushed out a fix, and also moved the happy path (just return reference) to the top with an if/else, as the guard/else seemed to make the logic harder to follow with such a long else branch.
|
Thanks again for investigating this and getting a fix together! |
|
@pyrtsa Sadly it does look like the deadlock test is flakey: https://github.com/pointfreeco/swift-sharing/actions/runs/14910678350/job/41883976560 I'm unable to make it fail locally, so it may be resource constrained environments like GitHub Actions, but I was wondering if you had any other ideas? |
Let's just make the delays a bit longer. They're in the deciseconds currently, maybe the deadlock detector could wait 1–2 secs instead? |
|
@pyrtsa Sounds good to me! Have time for a quick PR? |
|
Can look into it in a couple of hours I think. |
|
@stephencelis Could you take a look if #159 would work, maybe? If not, we might need to tweak the deadlock test to run in a suite or even test target of its own, completely isolated from any other parallel tests limiting its concurrency. |
) * Add failing test for persistence locking order w.r.t. Dependencies * Avoid deadlock when accessing `@Shared` in a `@Dependency` ...by not holding locks while accessing `Dependency` logic with locks of its own. Also simplify `PersistentReferences` implementation by maintaining weak references to `_PersistentReference<Key>` objects directly, without custom reference counting. Fixes pointfreeco#149. * Simplify ContinuationBox implementation * Update Reference.swift * Remove needless withExtendedLifetime and change guard to if/else --------- Co-authored-by: Stephen Celis <stephen.celis@gmail.com>
As described in #149, there can be a deadlock when accessing
@Sharedin aDependency. The issue still exists in the latest 2.5.0 release.The deadlock occurs when:
CachedValues.lockinDependencyValues.subcript.getterwhile initialising aliveValueinvolving a local@Sharedvariable which needs to lockPersistentReferences.lock,PersistentReferences.lockduring the call to_PersistentReference<Key>.initwhich ultimately attempts to lockCachedValues.lock.SharedContinuations.swift, which doesn't cause this deadlock but does not need to be held that long).This PR changes
swift-sharingto avoid holding locks of its own while accessing@Dependency, in order to keep the locking order the same:swift-sharinglocks them last (and releases first).At the same time, some of the bookkeeping and reference counting logic got simplified.