Skip to content

Fix deadlock when accessing @Shared in a Dependency#153

Merged
stephencelis merged 5 commits intopointfreeco:mainfrom
pyrtsa:shared-dependency-deadlock
May 8, 2025
Merged

Fix deadlock when accessing @Shared in a Dependency#153
stephencelis merged 5 commits intopointfreeco:mainfrom
pyrtsa:shared-dependency-deadlock

Conversation

@pyrtsa
Copy link
Contributor

@pyrtsa pyrtsa commented May 5, 2025

As described in #149, there can be a deadlock when accessing @Shared in a Dependency. The issue still exists in the latest 2.5.0 release.

The deadlock occurs when:

  • one thread (Thread 1) is holding CachedValues.lock in DependencyValues.subcript.getter while initialising a liveValue involving a local @Shared variable which needs to lock PersistentReferences.lock ,
  • while another thread (Thread 7) is doing it the other way, holding PersistentReferences.lock during the call to _PersistentReference<Key>.init which ultimately attempts to lock CachedValues.lock.
  • (there's also some more locking involved in SharedContinuations.swift, which doesn't cause this deadlock but does not need to be held that long).
Thread 1 Thread 7
Thread 1 Thread 7

This PR changes swift-sharing to avoid holding locks of its own while accessing @Dependency, in order to keep the locking order the same: swift-sharing locks them last (and releases first).

At the same time, some of the bookkeeping and reference counting logic got simplified.

}
}

final class _ManagedReference<Key: SharedReaderKey>: Reference, Observable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pyrtsa Thanks for the PR and for looking into this! It's been a little while since we worked on this, but I believe our worry at the time was that two shared references could compete and initialize the same persistent reference at the same time, and that they would hold unique references while only a single one raced to live in the global persistent references store.

When looking into this did you conclude that this is not possible due to the existing locking?

I also wonder if it's possible to cook up a test case that deadlocks on main and is fixed in this branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it so that while two or more threads can indeed start setting up the shared reference, only one of them will succeed (while holding the lock another time), and all others will simply drop their half-initialised reference and grab the one already there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think a bit on the failing/fixing unit test case, maybe it's possible with some artificial delays. 🤔

Copy link
Contributor Author

@pyrtsa pyrtsa May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a unit test now.

I tried to make the CI run the commit with just the failing test on top of main, but GHA got stuck somehow. (Edit: Maybe the GHA macOS runner breaks down with this test; see #157?)

Anyway, you can check out d188bfb and run tests manually to see that it indeed deadlocks.

@pyrtsa pyrtsa force-pushed the shared-dependency-deadlock branch 2 times, most recently from 0e9f40f to be056e0 Compare May 6, 2025 14:02
pyrtsa added 3 commits May 6, 2025 17:04
...by not holding locks while accessing `Dependency` logic with locks of its own.

Also simplify `PersistentReferences` implementation by maintaining weak references to
`_PersistentReference<Key>` objects directly, without custom reference counting.

Fixes pointfreeco#149.
Copy link
Member

@stephencelis stephencelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

) rethrows -> _PersistentReference<Key> {
guard let reference = lock.withLock({ (storage[key.id] as? Weak<Key>)?.reference }) else {
let value = try value()
return withExtendedLifetime(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pyrtsa Can you explain why you added this? The tests seem to pass without it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I was being overly pedantic there, and it can be simplified down to:

      let reference = _PersistentReference(
        key: key,
        value: value,
        skipInitialLoad: skipInitialLoad
      )
      return lock.withLock {
        if let reference = (storage[key.id] as? Weak<Key>)?.reference {
          return reference
        } else {
          storage[key.id] = Weak(reference: reference)
          reference.onDeinit = { [self] in
            removeReference(forKey: key)
          }
          return reference
        }
      }

I originally added withExtendedLifetime in order to guarantee that the captured local _PersistentReference instance outlives the lock no matter what, because before this PR _PersistentReference used to access @Dependency(PersistentReferences.self) in its deinit, potentially causing the same deadlock (in even more subtle circumstances).

I wasn't sure if there was a possibility some other code path could access another @Dependency somewhere else. Reading code closer now, _PersistentReference still has quite a few members which has or could indirectly have a non-trivial deinit (_$perceptionRegistrar, key, subject, _loadError, _saveError, subscription at least), but neither seems currently likely to use @Dependency in their deinit.

But even if that was the case, it seems the current Swift compiler retains the captured-but-unused local variable reference longer than the capturing closure passed to lock.withLock { ... } even without explicitly extending its lifetime. So I guess we can indeed simplify it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that, I think it would be good to next check whether swift-dependencies could be updated not to hold the CachedValues.lock while running user-defined code. That would also remove many concerns and potential future regressions causing deadlocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed out a fix, and also moved the happy path (just return reference) to the top with an if/else, as the guard/else seemed to make the logic harder to follow with such a long else branch.

@stephencelis
Copy link
Member

Thanks again for investigating this and getting a fix together!

@stephencelis stephencelis merged commit 8f8fef0 into pointfreeco:main May 8, 2025
6 checks passed
@stephencelis
Copy link
Member

@pyrtsa Sadly it does look like the deadlock test is flakey: https://github.com/pointfreeco/swift-sharing/actions/runs/14910678350/job/41883976560

I'm unable to make it fail locally, so it may be resource constrained environments like GitHub Actions, but I was wondering if you had any other ideas?

@pyrtsa
Copy link
Contributor Author

pyrtsa commented May 8, 2025

@pyrtsa Sadly it does look like the deadlock test is flakey: https://github.com/pointfreeco/swift-sharing/actions/runs/14910678350/job/41883976560

I'm unable to make it fail locally, so it may be resource constrained environments like GitHub Actions, but I was wondering if you had any other ideas?

Let's just make the delays a bit longer. They're in the deciseconds currently, maybe the deadlock detector could wait 1–2 secs instead?

@stephencelis
Copy link
Member

@pyrtsa Sounds good to me! Have time for a quick PR?

@pyrtsa
Copy link
Contributor Author

pyrtsa commented May 8, 2025

Can look into it in a couple of hours I think.

@pyrtsa
Copy link
Contributor Author

pyrtsa commented May 8, 2025

@stephencelis Could you take a look if #159 would work, maybe?

If not, we might need to tweak the deadlock test to run in a suite or even test target of its own, completely isolated from any other parallel tests limiting its concurrency.

mhayes853 pushed a commit to mhayes853/swift-sharing that referenced this pull request Aug 12, 2025
)

* Add failing test for persistence locking order w.r.t. Dependencies

* Avoid deadlock when accessing `@Shared` in a `@Dependency`

...by not holding locks while accessing `Dependency` logic with locks of its own.

Also simplify `PersistentReferences` implementation by maintaining weak references to
`_PersistentReference<Key>` objects directly, without custom reference counting.

Fixes pointfreeco#149.

* Simplify ContinuationBox implementation

* Update Reference.swift

* Remove needless withExtendedLifetime and change guard to if/else

---------

Co-authored-by: Stephen Celis <stephen.celis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants