A low-latency C++ orderbook implementation achieving low latency through lock-free data structures and custom memory management.
This project implements a financial orderbook that matches buy and sell orders with extreme low latency. The design evolved from a simple thread-safe implementation to a high-performance lock-free architecture suitable for HFT (High-Frequency Trading) systems.
The initial implementation used straightforward C++ containers and mutex-based thread safety:
Characteristics:
std::mapfor price levels (O(log n) lookup)std::unordered_mapfor order lookupstd::shared_ptr<Order>for order managementstd::list<OrderPointer>for orders at each price levelstd::mutexfor thread safety- Separate background thread for order pruning
Limitations:
- Lock contention under load
- Cache misses from pointer chasing
- Dynamic allocations during trading
- Poor scalability with multiple threads
Completely redesigned for multi-threaded environments:
Key Improvements:
- Removed all
std::mutexinstances - Atomic operations for thread coordination
- Single-writer principle for critical paths
// Before: std::map<Price, OrderPointers>
// After: std::array<PriceLevel, NUM_LEVELS>
static constexpr Price MIN_PRICE = 0.0;
static constexpr Price MAX_PRICE = 500.0;
static constexpr Price TICK_SIZE = 0.01;
static constexpr size_t NUM_LEVELS = 50000;O(1) direct access vs O(log n) tree traversal
Cache-local memory layout
Predictable performance
- Custom Memory Pool
struct Block {
LockFreeOrder orders[1024]; // Pre-allocated
Block* next;
};Zero allocations during trading
Contiguous memory layout
20x faster order creation
- Lock-Free Order Class
class alignas(64) LockFreeOrder {
const OrderId orderId_;
std::atomic<Quantity> remaining_;
std::atomic<LockFreeOrder*> next_;
};- Ring Buffer for Order Submission
template<typename T, size_t Capacity>
class LockFreeRingBuffer {
std::array<T, Capacity> buffer_;
std::atomic<size_t> head_;
std::atomic<size_t> tail_;
// Lock-free producer/consumer
};Decouples order submission from processing
No blocking between threads
Batch processing capability