-
Notifications
You must be signed in to change notification settings - Fork 29
avoid excessive Arc<Mapping> contention #1012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d47fafd to
1d32c1c
Compare
| _mem: &'a MemCtx, | ||
| base: &'_ Arc<Mapping>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MemCtx was here for the lifetime constraint, but Mapping is the backing structure that a SubMapping's lifetime "should" be derived from. MemCtx is just a container for all the mappings of a guest. taking &Mapping directly means we're constraining callers of SubMapping::new_base a bit more than they theoretically could have been.
if we ever care about this, the caller can Arc::clone and call new_base freely, 'cause it's pretty close to free now.
1d32c1c to
4cd56a3
Compare
| let mut sro = ReadOp::from_buf(0, &mut scratch); | ||
|
|
||
| f(®.id, RWOp::Read(&mut sro)); | ||
| drop(sro); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clippy complains about this now, which was surprising. I think what happened is, Backing used to have an implicit Drop because of Arc, no longer does, and the implicit Drop had made this ineligible for the lint before. this used to be necessary because the lifetime on ReadOp would have scratch held for mutable access when we get to write_bytes below. the lint is, then, not surprising in its relevance:
call to
std::mem::drpowitha value that doe not implementDrop. Dropping such a type only extends its contained lifetimes.
Conceptually it wasn't wrong for submappings to `Arc::clone` the pointed-to base mapping, but in practice all the accessors off of MemCtx return SubMapping with lifetimes constrained by that MemCtx. The sheer number of submappings being created and destroyed during I/O operations makes the refcount very hot under load. Lean into the de facto lifetime constraint on SubMapping and just take a borrow of the backing Mapping instead of Arc::clone (and implicit ref decrement in later drop).
4cd56a3 to
f9afddb
Compare
Conceptually it wasn't wrong for submappings to
Arc::clonethe pointed-to base mapping, but in practice all the accessors off of MemCtx return SubMapping with lifetimes constrained by that MemCtx.The sheer number of submappings being created and destroyed during I/O operations makes the refcount very hot under load. Lean into the de facto lifetime constraint on SubMapping and just take a borrow of the backing Mapping instead of Arc::clone (and implicit ref decrement in later drop).
this is actually one of the first bottlenecks that @jmpesp and I had noticed in November with noop-y backends but I'd incorrectly bundled this into an interrupt coalescing commit. then forgot about it between then and now, so I overlooked it Monday when getting in-flight stuff buttoned up. with the other changes we've gotten in, this a pretty substantial bottleneck: a sequential write
fiowithout this saw 38% of total CPU time in user code on the host. not inmemcpy, all inlock incq/lock decq.that extreme case come from us going through each PRP list entry, which cause a few
SubMapping::from_baseper. so the larger-but-fewer I/Os of large writes actually exercised this Arc worst of all benchmarks! with this change it's now 2% of total CPU time in user code and more than double the write bandwidth. seq-read, rand-read, rand-write are similar stories but less in magnitude.