This directory contains implementations of various synchronization primitives for concurrent programming.
Motivated by the Linux kernel's MCS spinlock implementation.
File: mcs_spinlock.hpp
The MCS (Mellor-Crummey and Scott) spinlock is a scalable queue-based spinlock that provides FIFO ordering and excellent cache performance under high contention.
- FIFO Fairness — Threads acquire the lock strictly in the order they arrived, ensuring fairness and preventing starvation.
- Cache-Friendly Spinning — Each thread spins on a private, local flag in its own queue node, drastically reducing cache coherence traffic and eliminating false sharing.
- No Thundering Herd — Upon unlock, only the next waiting thread is notified, avoiding unnecessary wakeups and contention storms.
- Excellent Scalability — Scales efficiently under high contention, with performance that remains stable even as thread count increases.
- RAII-Based Guard — Lock acquisition and release are managed safely through a scope-based RAII wrapper, preventing accidental unlock omissions and simplifying usage.
- Deterministic Behavior — The lock queue ensures predictable acquisition order and timing characteristics, ideal for low-level concurrent systems.
MCS uses a linked queue of waiting threads:
- Enqueue Phase: Thread atomically adds itself to the tail of the queue
- Spin Phase: Thread spins on its own local `is_owner` flag
- Dequeue Phase: Lock holder passes ownership to the next thread in the queue
This approach ensures each thread only spins on its own cache line, avoiding the cache coherence traffic that plagues naive spinlocks.
```cpp
#include "thread/sync/mcs_spinlock.hpp"

thread::sync::QueueSpinLock spinlock;
int shared_data = 0;

{
  thread::sync::QueueSpinLock::Guard guard{spinlock};
  ++shared_data;
}
```

Good for:
- Short critical sections
- High thread contention scenarios
- When fairness (FIFO ordering) is important
Avoid when:
- Critical sections are long (use a mutex instead)
- Contention is very low (simpler spinlocks may be faster)
See examples/sync/mcs_example.cpp for a complete working example with benchmarking.
- Original Paper: "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors" by John M. Mellor-Crummey and Michael L. Scott (1991)
File: ticket_lock.hpp
Ticket Lock is a simple and fair spinlock that uses a ticket-based algorithm. Threads take a ticket and wait until their number is called, ensuring strict FIFO ordering.
- FIFO Fairness — Threads acquire the lock in the exact order they requested it, preventing starvation
- Simple Implementation — Uses only two atomic counters, making it easy to understand and verify
- Bounded Space — O(1) space per lock (no per-thread nodes needed)
- BasicLockable Concept — Implements the standard `lock()`, `try_lock()`, and `unlock()` interface
- Predictable Behavior — Deterministic lock acquisition order
The lock maintains two atomic counters:
- `next_free_ticket_` — Counter that dispenses tickets to arriving threads
- `owner_ticket_` — Current ticket number that can acquire the lock

Algorithm:
- Thread atomically increments `next_free_ticket_` and receives a ticket number
- Thread spins until `owner_ticket_` equals its ticket number
- On unlock, `owner_ticket_` is incremented, allowing the next thread to proceed
```cpp
#include "thread/sync/ticket_lock.hpp"

thread::sync::TicketLock lock;
int shared_data = 0;

// With std::lock_guard
{
  std::lock_guard<thread::sync::TicketLock> guard(lock);
  ++shared_data;
}

// Manual lock/unlock
lock.lock();
++shared_data;
lock.unlock();

// Try lock
if (lock.try_lock()) {
  ++shared_data;
  lock.unlock();
}
```

- Low Contention: Good performance, similar to simple spinlocks
- High Contention: Better than naive spinlocks but worse than MCS (all threads spin on same cache line)
- Fairness: Strict FIFO ordering prevents starvation
- Memory: O(1) per lock (2 atomic counters)
Good for:
- When strict FIFO fairness is required
- Simple use cases where code clarity is important
- Moderate contention scenarios
- When per-thread storage overhead is a concern
Avoid when:
- Very high contention (MCS scales better)
- Lock is held for long periods (use mutex instead)
- Priority inversion is a concern
See tests/sync/ticket_lock_test.cpp for comprehensive test suite.
File: ttas_spinlock.hpp
TTAS (Test-and-Test-and-Set) is an improved spinlock that reduces cache coherence traffic by performing a read-only check before attempting an atomic compare-exchange. This optimization significantly improves performance under contention compared to naive Test-and-Set (TAS) spinlocks.
- Reduced Cache Traffic — Read-only spinning reduces cache coherence messages
- Fast Uncontended Path — Single compare-exchange when lock is available
- Simple and Efficient — Minimal overhead with good performance characteristics
- BasicLockable Concept — Standard lock interface compatible with `std::lock_guard`
- No Fairness Guarantees — Lock acquisition order is non-deterministic
The algorithm uses two-phase acquisition:
- Test Phase: Read the lock state with `memory_order_relaxed` (no cache line invalidation)
- Test-and-Set Phase: When the lock appears free, attempt an atomic compare-exchange
This approach keeps the cache line in shared state during spinning, only generating invalidation traffic when the lock becomes available.
```cpp
#include "thread/sync/ttas_spinlock.hpp"

thread::sync::TASSpinLock lock;
int shared_data = 0;

// With std::lock_guard
{
  std::lock_guard<thread::sync::TASSpinLock> guard(lock);
  ++shared_data;
}

// Manual lock/unlock
lock.lock();
++shared_data;
lock.unlock();

// Try lock
if (lock.try_lock()) {
  ++shared_data;
  lock.unlock();
}
```

- Low Contention: Excellent performance, similar to native atomic operations
- Medium Contention: Good performance due to reduced cache traffic
- High Contention: Degrades under extreme contention (thundering herd on unlock)
- Fairness: No fairness guarantees, potential for starvation
- Memory: O(1) per lock (single atomic bool)
Good for:
- Low to medium contention scenarios
- Very short critical sections
- When lock overhead must be minimal
- When fairness is not required
Avoid when:
- Fairness is important (use Ticket or MCS lock)
- High contention expected (use MCS lock)
- Risk of starvation is unacceptable
- Critical sections are long (use mutex)
See tests/sync/ttas_spinlock_test.cpp for comprehensive test suite.
File: mutex.hpp
The Three-State Mutex is an efficient mutex implementation that combines fast userspace operations with kernel-level blocking. It uses a three-state design to optimize for the common case where locks are uncontended, avoiding expensive system calls on both lock and unlock operations.
- Fast Path (No Syscalls) — Lock and unlock operations complete in userspace without kernel involvement when uncontended
- Three-State Design — Tracks whether the lock has waiting threads to minimize futex operations
- Futex-Based Blocking — Uses Linux futex for efficient kernel-level blocking instead of busy-waiting
The mutex uses three distinct states to optimize performance:
- `Unlocked` (0) — Mutex is free and available for acquisition
- `LockedNoWaiters` (1) — Mutex is held, but no other threads are waiting
- `LockedHasWaiters` (2) — Mutex is held, and at least one thread is blocked waiting
This state tracking allows the unlock operation to know whether it needs to wake waiting threads.
Slow Path (Contention):
- Thread transitions state from `LockedNoWaiters` → `LockedHasWaiters`
- Thread calls `futex_wait()` to block in the kernel until the lock is released
- On unlock, holder sets state to `Unlocked` and calls `futex_wake()` to wake waiters
- Woken threads compete to acquire the now-free lock
- Uncontended: Extremely fast, single atomic CAS operation (comparable to spinlock)
- Low Contention: Good performance, threads block briefly then acquire lock
- High Contention: Excellent performance, threads sleep rather than spinning
- Long Critical Sections: Ideal use case, blocked threads don't waste CPU
- Fairness: No strict fairness, kernel scheduler determines wakeup order
| Feature | Three-State Mutex | MCS | Ticket Lock | TTAS |
|---|---|---|---|---|
| Fairness | ❌ None (kernel order) | ✅ FIFO | ✅ FIFO | ❌ None |
| Scalability | ✅ Excellent | ✅ Excellent | ⚠️ Moderate | ⚠️ Degrades under load |
| Memory/Lock | O(1) | O(1) + O(1)/thread | O(1) | O(1) |
| Complexity | Medium | High | Low | Low |
| Blocking | ✅ Kernel sleep | ❌ Busy-wait | ❌ Busy-wait | ❌ Busy-wait |
| CPU Efficiency | ✅ No waste | ❌ Spins | ❌ Spins | ❌ Spins |
| Cache Traffic | High | Minimal | High | Medium |
Planned synchronization primitives:
- CLH Spinlock - Similar to MCS but using an implicit queue
- Reader-Writer Locks - Allow multiple readers or single writer
- Seqlock - Optimistic read lock for data structures
- Condition Variable - Wait/notify coordination with mutex
- Semaphore - Counting synchronization primitive