Commit cc9b576

Eliminate priority inversions in the metadata completion runtime.
The current system is based on MetadataCompletionQueueEntry objects which are allocated and then enqueued on dependencies. Blocking is achieved using a condition variable associated with the lock on the appropriate metadata cache. Condition variables are inherently susceptible to priority inversions because the waiting threads have no dynamic knowledge of which thread will notify the condition. In the current system, threads that unblock dependencies synchronously advance their dependent metadata completions, which means the signaling thread is unreliable even if we could represent it in condition variables. As a result, the current system is wholly unsuited for eliminating these priority inversions.

An AtomicWaitQueue is an object containing a lock. The queue is eagerly allocated, and the lock is held, whenever a thread is doing work that other threads might wish to block on. In the metadata completion system, this means whenever we construct a metadata cache entry and the metadata isn't already allocated and transitively complete after said construction.

Blocking is done by safely acquiring a shared reference to the queue object (which, in the current implementation, requires briefly taking a lock that's global to the surrounding metadata cache) and then acquiring the contained lock. For typical lock implementations, this avoids priority inversions by temporarily propagating the priority of waiting threads to the locking threads.

Dependencies are unblocked by simply releasing the lock held in the queue. The unblocking thread doesn't know exactly what metadata are blocked on it and doesn't make any effort to directly advance their completion; instead, the blocked thread will wake up and then attempt to advance the dependent metadata completion itself, eliminating a source of priority overhang that affected the old system. Successive rounds of unblocking (e.g. when a metadata makes partial progress but isn't yet complete) can be achieved by creating a new queue and unlocking the old one.

We can still record dependencies and use them to dynamically diagnose metadata cycles.

The new system allocates more eagerly than the old one. Formerly, metadata completions which were never blocked never needed to allocate a MetadataCompletionQueueEntry; we were then unable to actually deallocate those entries once they were allocated. The new system will allocate a queue for most metadata completions, although, on the positive side, we can reliably deallocate these queues. Cache entries are also now slightly smaller because some of the excess storage for status has been folded into the queue.

The fast path of an actual read of the metadata remains a simple load-acquire. Slow paths may require a bit more locking. On Darwin, the metadata cache lock can now use os_unfair_lock instead of pthread_mutex_t (which is a massive improvement) because it does not need to support associated condition variables. The excess locking could be eliminated with some sort of generational scheme; sadly, such schemes are not portable, and I didn't want to take that on up-front.

rdar://76127798
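
To make the shape of the scheme concrete, here is a minimal C++ sketch of the worker and waiter sides, assuming std::shared_ptr and std::mutex in place of the runtime's own reference counting and locks; WaitQueue, CacheEntry, and all the method names are illustrative, not the runtime's actual API:

```cpp
#include <memory>
#include <mutex>

struct WaitQueue {
  // Held by the worker thread for as long as it is doing work that
  // other threads might need to block on.
  std::mutex WorkLock;
};

struct CacheEntry {
  // Stands in for the lock that is global to the surrounding
  // metadata cache; protects Published.
  std::mutex QueueLock;
  std::shared_ptr<WaitQueue> Published;

  // Worker: eagerly allocate the queue and acquire its lock *before*
  // publishing it, so any thread that finds the queue will block.
  std::shared_ptr<WaitQueue> beginWork() {
    auto queue = std::make_shared<WaitQueue>();
    queue->WorkLock.lock();
    std::lock_guard<std::mutex> guard(QueueLock);
    Published = queue;
    return queue;
  }

  // Worker: a round of partial progress.  Publish a fresh queue, then
  // unlock the old one; woken waiters re-check state and re-block on
  // the new queue if they still depend on further progress.
  std::shared_ptr<WaitQueue> advanceRound(std::shared_ptr<WaitQueue> old) {
    auto next = std::make_shared<WaitQueue>();
    next->WorkLock.lock();
    {
      std::lock_guard<std::mutex> guard(QueueLock);
      Published = next;
    }
    old->WorkLock.unlock();
    return next;
  }

  // Worker: unblock all remaining waiters by unpublishing the queue
  // and releasing its lock.
  void finishWork(std::shared_ptr<WaitQueue> queue) {
    {
      std::lock_guard<std::mutex> guard(QueueLock);
      Published.reset();
    }
    queue->WorkLock.unlock();
  }

  // Waiter: briefly take the cache-global lock to get a shared
  // reference to the queue, then block on the contained lock.  With a
  // priority-donating lock (e.g. os_unfair_lock on Darwin), this is
  // where the waiter's priority propagates to the worker.
  void waitForWork() {
    std::shared_ptr<WaitQueue> queue;
    {
      std::lock_guard<std::mutex> guard(QueueLock);
      queue = Published;
    }
    if (!queue) return;        // nothing published: no work to wait on
    queue->WorkLock.lock();    // blocks until the worker unlocks it
    queue->WorkLock.unlock();
    // The woken thread now tries to advance the dependent completion
    // itself, rather than relying on the signaling thread to do so.
  }
};
```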
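A possible usage of that sketch, showing two successive rounds of unblocking; the sequencing here (begin, advance, finish) is an assumption about how the pieces fit together, not the runtime's actual control flow:

```cpp
#include <cstdio>
#include <thread>

int main() {
  CacheEntry entry;

  // Worker: holds the first round's lock, makes partial progress,
  // rolls over to a second round, then finishes.
  std::thread worker([&] {
    auto round1 = entry.beginWork();
    // ... allocate the metadata, reach a partially-complete state ...
    auto round2 = entry.advanceRound(round1);  // wakes round-1 waiters
    // ... finish transitive completion ...
    entry.finishWork(round2);                  // wakes round-2 waiters
  });

  // Waiter: wakes at each round and would re-check the metadata's
  // completion state; a real runtime loops on that state rather than
  // calling waitForWork a fixed number of times.
  std::thread waiter([&] {
    entry.waitForWork();   // may block on round 1 or round 2
    entry.waitForWork();   // returns immediately once work is finished
    std::puts("dependency unblocked");
  });

  worker.join();
  waiter.join();
}
```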
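And a sketch of the load-acquire fast path described above, with Metadata, CompleteMetadata, and tryFastPath as hypothetical stand-ins:

```cpp
#include <atomic>

struct Metadata;   // stand-in for the runtime's metadata type

// Completed metadata is published with a store-release; readers that
// find it pay only this load-acquire and never touch a lock.
std::atomic<const Metadata *> CompleteMetadata{nullptr};

const Metadata *tryFastPath() {
  // Null (or, in the real runtime, an insufficiently-complete state)
  // sends the caller to the slow path, which takes the cache lock and
  // consults the wait queue as sketched above.
  return CompleteMetadata.load(std::memory_order_acquire);
}
```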
1 parent 2bd9e6e commit cc9b576

3 files changed (+831, -907 lines)

