My attempt at a complete high-frequency trading (HFT) pipeline, from synthetic tick generation to order execution and trade publishing. It’s designed to demonstrate how networking, clock synchronization, and hardware limits affect end-to-end latency in distributed systems.
Built using C++, Go, and Python, all services communicate via ZeroMQ using PUB/SUB and PUSH/PULL patterns. The stack is fully containerized with Docker Compose and can scale under K8s. No specialized hardware was used in this demo (e.g., FPGAs, RDMA NICs); the idea was to explore what I could achieve with commodity hardware and software optimizations alone.
| Step | Service (From) | Service (To) | Protocol | Purpose |
|---|---|---|---|---|
| 1️⃣ | Synthetic Feeder | Market Data Ingestor | WebSocket (port 8765) | Stream synthetic market data |
| 2️⃣ | Market Data Ingestor | Trading Algorithm | ZeroMQ PUB/SUB (port 5555) | Broadcast tick data |
| 3️⃣ | Trading Algorithm | Matching Engine | ZeroMQ PUSH/PULL (port 5557) | Send generated orders based on algorithm |
| 4️⃣ | Matching Engine | Order Submission / Latency Harness | ZeroMQ PUB/SUB (port 5558) | Publish executed trades |
| 5️⃣ | Order Submission | Mock Exchange | HTTP POST (port 5678) | Send trade confirmations |
| 6️⃣ | Latency Harness | (Self-contained) | Local timing | Measure E2E pipeline latency |
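The step-3 hand-off (Trading Algorithm → Matching Engine) is a plain ZeroMQ PUSH/PULL pair. A minimal sketch with pyzmq, collapsed into a single process over the `inproc` transport so it runs standalone; in the actual pipeline the two sides live in separate containers and connect over `tcp://` on port 5557, and the order fields shown here are illustrative assumptions:

```python
import zmq

# Single-process sketch of the algorithm -> matching-engine hand-off.
# The real services use tcp://:5557 across containers; inproc keeps the
# demo self-contained within one ZeroMQ context.
ctx = zmq.Context.instance()

pull = ctx.socket(zmq.PULL)          # matching-engine side
pull.bind("inproc://orders")

push = ctx.socket(zmq.PUSH)          # trading-algorithm side
push.connect("inproc://orders")

# Hypothetical order payload; field names are an assumption.
order = {"side": "BUY", "symbol": "SYN", "price": 100.42, "qty": 10}
push.send_json(order)                # PUSH load-balances across connected PULLers

received = pull.recv_json()
print(received["side"], received["price"])
```

PUSH/PULL (rather than PUB/SUB) fits this hop because orders must be delivered exactly once to one consumer, and a late-joining puller does not silently drop messages the way a late subscriber would.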
- Feeder generates synthetic tick data (e.g., price = 100.4202132) every 1ms.
- Ingestor reads the WebSocket and publishes the tick to all subscribers.
- Algorithm consumes ticks, generates BUY/SELL orders, and sends them to the matching engine.
- Matching Engine pairs bids/asks, timestamps the match, and publishes trade results.
- Order Submission forwards matched trades to a mock exchange via HTTP; the exchange responds with a confirmation (just a `200 OK` in this case, for simplicity).
- Latency Harness subscribes to the published trade stream (port 5558) and measures end-to-end latency across the pipeline.
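The matching step can be sketched as a tiny single-symbol matcher. The book layout below is an illustrative assumption; the pipeline only specifies that the engine pairs bids/asks, timestamps the match, and publishes the result:

```python
import time
from collections import deque

bids: deque = deque()   # resting BUY orders as (price, qty)
asks: deque = deque()   # resting SELL orders as (price, qty)

def submit(side: str, price: float, qty: int):
    """Return a trade dict if the order crosses the book, else rest it."""
    book, opposite = (bids, asks) if side == "BUY" else (asks, bids)
    if opposite:
        best_price, best_qty = opposite[0]
        crosses = price >= best_price if side == "BUY" else price <= best_price
        if crosses:
            fill = min(qty, best_qty)
            if best_qty - fill == 0:
                opposite.popleft()                      # resting order fully filled
            else:
                opposite[0] = (best_price, best_qty - fill)
            # Timestamp the match, as the real engine does before publishing.
            return {"price": best_price, "qty": fill, "ts_ns": time.time_ns()}
    book.append((price, qty))                           # no cross: rest on the book
    return None

submit("SELL", 100.40, 10)           # rests on the ask side
trade = submit("BUY", 100.41, 10)    # crosses the resting ask
print(trade["price"], trade["qty"])
```

Trades execute at the resting order's price, which is the usual price-time-priority convention.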
With this setup I measured an end-to-end latency of ~10 ms over a run of ~100 trades (on my machine, an ASUS VivoBook with a Ryzen 7 4700U and 16 GB of RAM), from synthetic tick generation through order execution and trade publishing. The figure comes from a simple Python latency harness that reads the timestamps attached to each order and computes the total time a message takes to traverse the system.
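The measurement itself is just timestamp arithmetic. A sketch of the harness calculation with made-up timestamps (the real script's field names and sample values are not shown in this write-up, so treat these as assumptions):

```python
import statistics

def e2e_latency_ms(ingress_ns: int, egress_ns: int) -> float:
    """Latency of one message through the pipeline, in milliseconds."""
    return (egress_ns - ingress_ns) / 1e6

# Hypothetical (ingress, egress) pairs, as captured with time.time_ns()
# at tick generation and again at trade publication.
samples = [
    (1_000_000_000, 1_010_500_000),   # 10.5 ms
    (2_000_000_000, 2_009_200_000),   #  9.2 ms
    (3_000_000_000, 3_010_300_000),   # 10.3 ms
]
latencies = [e2e_latency_ms(a, b) for a, b in samples]
print(f"mean: {statistics.mean(latencies):.2f} ms")
```

One caveat worth noting: this only works because all services here share one host clock. Across machines, the same subtraction would require synchronized clocks (PTP in real HFT setups).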
From my observations and research, the main bottlenecks in the system were:
- Network latency: Even on a local network, the time taken for messages to travel between services contributed significantly. Using ZeroMQ helped reduce some of this overhead due to its efficient messaging patterns.
- Containerization overhead: In real HFT environments, the network interface card (NIC) bypasses the kernel entirely using technologies like DPDK, RDMA, or Onload, allowing network packets to be DMA'd directly into user space. The NIC can provide hardware timestamping with nanosecond accuracy, and the trading engine typically runs pinned to a CPU core, reading packets directly, enabling sub-microsecond tick-to-trade latency. Docker, or any virtualization layer, breaks this model: it adds network abstraction layers (bridges, veth pairs, etc.), preventing direct kernel bypass, and cannot expose NIC clock synchronization inside containers. This explains why my containerized setup achieved latencies on the order of milliseconds rather than the sub-microsecond latencies seen in production HFT systems.
- Processing overhead: Serialization, deserialization, and order book management all contributed minor additional delays, though these seemed to be dwarfed by network and containerization effects.
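As a rough illustration of the serialization point, a fixed binary layout is far more compact than JSON for a single tick. The field set here is a guess at what a tick message carries, not the pipeline's actual schema:

```python
import json
import struct

# Hypothetical tick message; field names are assumptions for illustration.
tick = {"symbol": "SYN", "price": 100.4202132, "ts_ns": 1_700_000_000_000_000_000}

# Text encoding, roughly what the demo pipeline does today.
as_json = json.dumps(tick).encode()

# Fixed binary layout: 8-byte symbol, 8-byte double, 8-byte unsigned timestamp.
# This approximates what a schema-based format (Protobuf, FlatBuffers) buys you.
as_struct = struct.pack("<8sdQ", tick["symbol"].encode(), tick["price"], tick["ts_ns"])

print(len(as_json), len(as_struct))   # the binary form is a fixed 24 bytes
```

Beyond size, the binary form also skips float-to-text parsing on the receive path, which is where most of the JSON CPU cost sits.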
Improvements I'm considering exploring in future iterations:
- Investigate more efficient serialization formats (e.g., Protobuf, FlatBuffers) to reduce message size and parsing overhead.
- This implementation assumes a single stock symbol, as it's common to maintain one order book per symbol. Expanding to multiple symbols would require more complex order book management and could introduce additional latency, but would make the system more realistic, so it's worth exploring.
- I also want to explore more advanced trading algorithms, such as statistical arbitrage or machine learning-based strategies, to see how they perform in this low-latency environment; right now the algorithm is as trivial as it gets (BUY if the current price is below the last price, SELL if it is above).
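For reference, that trivial strategy reduces to a few lines. This is a restatement of the rule described above, not the repo's exact code:

```python
def signal(last_price, price):
    """BUY on a down-tick, SELL on an up-tick, else do nothing."""
    if last_price is None or price == last_price:
        return None          # first tick, or price unchanged
    return "BUY" if price < last_price else "SELL"

# Walk a short hypothetical price path through the rule.
prices = [100.40, 100.38, 100.41, 100.41, 100.39]
last = None
signals = []
for p in prices:
    signals.append(signal(last, p))
    last = p
print(signals)   # [None, 'BUY', 'SELL', None, 'BUY']
```

This mean-reversion-flavored rule trades on every price change, which is exactly why it stresses the pipeline's latency path well even though it has no real alpha.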