hyprTrader is a high-frequency trading (HFT) simulation engine built in C++20. It implements a Limit Order Book (LOB) with Price-Time Priority matching logic, optimized for sub-microsecond latency on commodity hardware.
The system bypasses standard library overheads by utilizing custom memory arenas, lock-free concurrency patterns, and cache-friendly data structures to minimize Tick-to-Trade latency.
- Custom Memory Arena: Implemented a slab allocator (
MemoryPool) to eliminatemalloc/freeoverhead on the hot path, ensuring O(1) allocation/deallocation and preventing memory fragmentation. - Lock-Free Concurrency: Utilizes a Single-Producer Single-Consumer (SPSC) ring buffer using
std::atomicwith explicit memory ordering (acquire/release) for wait-free thread communication. - Cache Locality: Order Book data structures are designed with contiguous memory layouts (flat arrays over node-based trees) to minimize CPU cache misses.
- Zero-Copy Architecture: Pointers/Indices are passed through the pipeline rather than copying full Order objects.
- Determinism: Engine logic is separated from IO to allow for deterministic replay of market data (PCAP) for backtesting.
The system separates the Network/Feed Handler (Producer) from the Matching Engine (Consumer) to ensure the core logic never blocks on I/O.
graph LR
%% Nodes
A[Market Data Feed] -->|TCP/UDP| B(Gateway / Feed Handler)
B -->|Order Object| C{Lock-Free Ring Buffer}
C -->|Pop| D[Matching Engine Core]
D -->|Match| E[(Limit Order Book)]
E -->|Execution Report| F[Trade Log / Gateway]
%% Styling - Professional "Systems" Palette
classDef hardware fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#333;
classDef logic fill:#d4e1f5,stroke:#333,stroke-width:2px,color:#333;
classDef storage fill:#fff9c4,stroke:#d4a017,stroke-width:2px,color:#333;
classDef critical fill:#ffccbc,stroke:#bf360c,stroke-width:2px,color:#333;
%% Apply Styles
class A,F hardware;
class B,D logic;
class E storage;
%% Specific Style for the Buffer (The "Hot Path")
style C fill:#b2dfdb,stroke:#00695c,stroke-width:2px,color:#004d40
Instead of using std::map (Red-Black Tree) which scatters nodes in heap memory, hyprTrader uses pre-allocated vectors and intrusive linked lists.
classDiagram
direction TB
%% Styling
classDef mainClass fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000;
classDef subClass fill:#ffffff,stroke:#333,stroke-width:1px,color:#000;
class MemoryPool {
+vector~Order~ store
+vector~size_t~ free_indices
+allocate() Order*
+deallocate(Order*)
}
class Order {
+uint32_t id
+uint32_t price
+uint32_t next_order_index
+uint32_t prev_order_index
}
%% Apply Styles
class MemoryPool mainClass
class Order subClass
MemoryPool "1" *-- "100,000" Order : Pre-Allocated Contiguous Block
Hardware: Fedora Linux (WSL2), GCC 15.x, Intel/AMD Host CPU
Comparison between standard new/delete and hyprTrader::MemoryPool.
| Allocator Type | Operations | Avg Latency (ns) | 99th % Latency (ns) |
|---|---|---|---|
| std::allocator (Heap) | 1,000,000 | ~120 ns | ~450 ns |
| Custom MemoryPool | 1,000,000 | ~4 ns | ~6 ns |
Time taken from receiving an order to generating a fill (match).
| Scenario | Throughput (msgs/sec) | Avg Latency (µs) |
|---|---|---|
| Add Order (No Match) | 2.5 Million | 0.35 µs |
| Match Execution | 1.8 Million | 0.85 µs |
> Note: Benchmarks run using Google Benchmark framework with CPU isolation.
- Linux Environment (Fedora/Ubuntu)
- C++20 Compiler (GCC 11+ or Clang 14+)
- CMake 3.15+
# Clone the repository
git clone git@github.com:0xSaksham/hyprTrader.git
cd hyprTrader
# Create build directory
mkdir build && cd build
# Configure and Build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)# Run Unit Tests (GTest)
./tests/hyprTrader_tests
# Run Micro-benchmarks
./benchmarks/hyprTrader_benchUsed std::span for safe buffer views and concepts for constraining template types in the Memory Pool, enforcing type safety at compile time without runtime penalties.
A Multi-Producer queue would require CAS (Compare-And-Swap) loops or heavy locking. Since we have a dedicated Gateway thread, an SPSC queue allows us to use simple atomic loads/stores, which is the fastest possible thread sync mechanism.
Standard std::list performs a specialized allocation for every node. By embedding next/prev indices directly into the Order struct, we keep the data cache-hot and avoid pointer chasing across the heap.
- Core Order Book Data Structures
- Custom Arena Allocator
- ITCH 5.0 Market Data Parser
- TCP Order Entry Gateway
- Docker containerization for deployment
Saksham Gupta Aspiring Low-Latency Systems Engineer GitHub | LinkedIn | Blog