-
Notifications
You must be signed in to change notification settings - Fork 29
Add non-Arc-clone'ing variant of access() #1003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| &'a self, | ||
| tree_ref: &'a TreeBackref<T>, | ||
| ) -> Result<MutexGuard<'a, Tree<T>>, TreeBackref<T>> { | ||
| fn try_lock_tree<'node, 'guard>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this works as <'a>, but I had a hard time reasoning about how long the guards are held until I'd given them distinct names.
|
This partially unwinds #678 (and in fact the comment on As I was writing this, I didn't see a way this could introduce novel locking issues except by |
`access()` implies (typically) an `Arc::clone` of the guarded resource. When that resource is access()'d frequently, especially across many cores, the contention for incrementing and decrementing the shared refcount can become the primary bottleneck for Propolis. This is particularly visible in `nvme`, where queues live in guest memory, are mediated by `MemCtx`, queues are managed across dozens of threads (hundreds in some cases!) threads, and millions of items are read and written to queues per second. On large processors an increment can take hundreds of nanoseconds, or thousands of cycles! ---- This patch is a less comprehensive improvement to the problem than I'd hoped for, but it *is* a substantial improvement enough to include for now. The first try at this change set out to add an alternative to accessors::Guard, call it accessor::View, which would have a refresh() analogous to Guard::access(). View was expected to hold a reference on a Node, and View::refresh() was expected to go through the same "check res_leaf or else re-derive the leaf resource" steps as guard(). After getting through the lock_tree() changes to keep a guard for a node when refreshing the leaf, it became clear in most cases we'd have an accessor::View we would *not* have that view via mutable reference. accessor::View's cached reference would need to be behind a Mutex for interior mutability, at which point a "View" is not much different from any other "Node" in the accessor tree. So, instead, this implements a less aspirational change of adding an "access() but return the guard instead of clone the reference". This is strictly less work than before, but still implies going out to `mutex_lock` and such for something that, in the happy case, should be nothing more than a load. Even so, the mutexes in question are per-Node and generally not contended - this is hard to spot even under load. ---- In the limit, an accessor tree that supports disabling access to subtrees already is up against the issue that `access()`'d `Guard` don't synchronize with operations on the resource. A PCI bus can "deny" access to memory, but a device or its queues may still have a Guard permitting to access memory after the fact and do so. It seems theoretically possible to lean into accessed resources being an `Arc<T>` and have nodes hold an `AtomicUsize` of that pointer, updating it as appropriate while managing the tree. Then, nodes and the tree generally could have generation counters where on access the caller must only compare generations to check that the cached access is still valid. To deny access to a (sub)tree, relevant `Node` could be modified, the tree's generation bumped, and the access-denier can wait on the reference count to the old tree to drop to zero. Critically, comparing the generation numbers requires only loads, so cores don't fight over writes, and in the typical case the generation number won't change, so a view's reference can remain counted until the generation number change is finally witnessed. I *think* that the main difficulty with this approach is that it requires components that hold a long-lived reference on the resource to regularly "check in" that the reference is still valid, but it would be very easy to accidentally sleep while holding a reference. This is the same general problem as blocking a writer becuase you've slept holding a read lock.
72a2f97 to
66a6cec
Compare
access()implies (typically) anArc::cloneof the guarded resource. When that resource is access()'d frequently, especially across many cores, the contention for incrementing and decrementing the shared refcount can become the primary bottleneck for Propolis.This is particularly visible in
nvmewhere queues live in guest memory, are mediated byMemCtx, queues are managed across dozens of threads (hundreds in some cases!) threads, and millions of items are read and written to queues per second. On large processors an increment on refcount forArc<MemCtx>can take microseconds(!)This patch is a less comprehensive improvement to the problem than I'd hoped for, but it is a substantial improvement enough to include for now.
The first try at this change set out to add an alternative to accessors::Guard, call it accessor::View, which would have a refresh() analogous to Guard::access(). View was expected to hold a reference on a Node, and View::refresh() was
expected to go through the same "check res_leaf or else re-derive the leaf resource" steps as guard().
After getting through the lock_tree() changes to keep a guard for a node when refreshing the leaf, it became clear in most cases we'd have an accessor::View we would not have that view via mutable reference.
accessor::View's cached reference would need to be behind a Mutex for interior mutability, at which point a "View" is not much different from any other "Node" in the accessor tree.
So, instead, this implements a less aspirational change of adding an "access() but return the guard instead of clone the reference". This is strictly less work than before, but still implies going out to
mutex_lockand such for something that, in the happy case, should be nothing more than a load. Even so, the mutexes in question are per-Node and generally not contended - this is hard to spot even under load.In the limit, an accessor tree that supports disabling access to subtrees already is up against the issue that
access()'dGuarddon't synchronize with operations on the resource. A PCI bus can "deny" access to memory, but a device or its queues may still have a Guard permitting to access memory after the fact and do so.It seems theoretically possible to lean into accessed resources being an
Arc<T>and have nodes hold anAtomicUsizeof that pointer, updating it as appropriate while managing the tree. Then, nodes and the tree generally could have generation counters where on access the caller must only compare generations to check that the cached access is still valid. To deny access to a (sub)tree, relevantNodecould be modified, the tree's generation bumped, and the access-denier can wait on the reference count to the old tree to drop to zero.Critically, comparing the generation numbers requires only loads, so cores don't fight over writes, and in the typical case the generation number won't change, so a view's reference can remain counted until the generation number change is finally witnessed.
I think that the main difficulty with this approach is that it requires components that hold a long-lived reference on the resource to regularly "check in" that the reference is still valid, but it would be very easy to accidentally sleep while holding a reference. This is the same general problem as blocking a writer because you've slept holding a read lock.
I didn't get to fully flushing that out here only due to time constraints; I don't see a reason it won't work (yet?).