A production-grade Limit Order Book (LOB) prediction and backtesting framework built with PyTorch.
⚠️ Note: This project is under active development. New models, features, and optimizations are being added regularly. Contributions and feedback are welcome!
DeepLOB+ predicts short-term price movements from Limit Order Book data using deep learning. It extends the original DeepLOB architecture with multi-horizon forecasting, comprehensive benchmarking, and a full backtesting simulation engine.
```
LOB Snapshot: [Ask₁..Ask₁₀ | Bid₁..Bid₁₀] → 40 features
                      ↓
              [DeepLOB CNN+LSTM]
                      ↓
Prediction: {↓ Down, → Stable, ↑ Up}
                      ↓
Backtester: Execute trades with latency + VWAP
```
```
┌───────────────────────────────────────────────────────────────────┐
│                            DeepLOB+                               │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│  │   LOBSTER    │───▶│  Normalizer  │───▶│   History    │         │
│  │   Streamer   │    │  (z-score)   │    │   Buffer     │         │
│  └──────────────┘    └──────────────┘    └──────┬───────┘         │
│                                                 │                 │
│  ┌──────────────────────────────────────────────┘                 │
│  ▼                                                                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│  │   DeepLOB    │───▶│  Scheduler   │───▶│   Latency    │         │
│  │   Model      │    │  (signals)   │    │   Queue      │         │
│  └──────────────┘    └──────────────┘    └──────┬───────┘         │
│                                                 │                 │
│  ┌──────────────────────────────────────────────┘                 │
│  ▼                                                                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│  │     VWAP     │───▶│    Trader    │───▶│    Logger    │         │
│  │   Executor   │    │    (P&L)     │    │  (results)   │         │
│  └──────────────┘    └──────────────┘    └──────────────┘         │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```
| Component | Description |
|---|---|
| DeepLOB | CNN + LSTM architecture for LOB sequences |
| Multi-Horizon | Predictions at k ∈ {10, 20, 50, 100} events |
| Benchmarks | LogReg, XGBoost, LSTM baselines for comparison |
| Backtester | Event-driven simulator with latency + VWAP execution |
| TorchScript | Export models for C++ inference |
| LOBSTER | Native support for LOBSTER data format |
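The TorchScript row refers to exporting trained models for the C++ inference engine. A minimal sketch of what such an export looks like with `torch.jit.trace`; the stand-in `nn.Linear` model and the output filename here are placeholders, not the repo's actual export path:

```python
import torch

# Placeholder model standing in for a trained DeepLOB checkpoint.
model = torch.nn.Linear(40, 3).eval()
example = torch.randn(1, 40)  # example input for tracing

# Trace records the forward pass into a TorchScript graph.
traced = torch.jit.trace(model, example)
traced.save("model_h10.pt")   # loadable from C++ via torch::jit::load
```

The repo's `sim_v2/model_converter.py` handles the real conversion; the point is only that a traced module becomes a self-contained `.pt` artifact the C++ engine can load without Python.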
```bash
# Train DeepLOB for horizon k=10 on TSLA
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 \
    --train-dates 2025-04-01 2025-04-02 \
    --val-dates 2025-04-03 \
    --test-dates 2025-04-04 \
    --epochs 5 --stride 50 --batch-size 2048
```

```bash
# Compare DeepLOB against baselines
python benchmarks/cli.py \
    --data_dir data/processed \
    --symbol TSLA \
    --horizon 10 20 50 100 \
    --device cuda \
    --verbose
```

```bash
# Simulate trading with trained model
python sim_v2/cli.py \
    --ticker AMZN \
    --date 2025-03-14 \
    --model TorchScriptModels/deeplob_AMZN_h10.pt \
    --threshold 0.7 \
    --latency 5
```

The model processes 100-timestep windows of 40 LOB features (10 levels × 4 features):
```
Input: (batch, 1, 100, 40)
        ↓
┌────────────────────────────────────────┐
│ Conv Block 1: 32 filters, stride=(1,2) │ → Spatial compression
│ Conv Block 2: 32 filters, stride=(1,2) │ → Feature extraction
│ Conv Block 3: 32 filters               │ → Final spatial
└────────────────────────────────────────┘
        ↓
┌────────────────────────────────────────┐
│ Inception Module (parallel paths)      │
│  • 1×1 → 3×1 convolutions              │
│  • 1×1 → 5×1 convolutions              │
│  • MaxPool → 1×1 convolution           │
└────────────────────────────────────────┘
        ↓
┌────────────────────────────────────────┐
│ LSTM: 64 hidden units                  │ → Temporal dynamics
└────────────────────────────────────────┘
        ↓
Output: (batch, 3) → {Down, Stable, Up}
```
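A minimal PyTorch sketch of this stack. Layer counts and sizes follow the diagram above (32-filter conv blocks, 64-unit LSTM, 3-class head); activation choices, padding, and the inception branch widths are assumptions, not the repo's exact implementation in `src/models/deeplob_light.py`:

```python
import torch
import torch.nn as nn

class DeepLOBSketch(nn.Module):
    """Illustrative CNN + Inception + LSTM stack (not the repo's code)."""

    def __init__(self):
        super().__init__()
        # Conv blocks: 40 features → 20 → 10 → 1 along the feature axis.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(32, 32, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(32, 32, kernel_size=(1, 10)), nn.LeakyReLU(),
        )
        # Inception module: parallel 3×1, 5×1, and pooled 1×1 paths.
        self.branch3 = nn.Sequential(
            nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU(),
            nn.Conv2d(64, 64, (3, 1), padding=(1, 0)), nn.LeakyReLU())
        self.branch5 = nn.Sequential(
            nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU(),
            nn.Conv2d(64, 64, (5, 1), padding=(2, 0)), nn.LeakyReLU())
        self.branchp = nn.Sequential(
            nn.MaxPool2d((3, 1), stride=1, padding=(1, 0)),
            nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU())
        self.lstm = nn.LSTM(input_size=192, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, 3)  # {Down, Stable, Up}

    def forward(self, x):                      # x: (batch, 1, 100, 40)
        x = self.conv(x)                       # (batch, 32, 100, 1)
        x = torch.cat(
            [self.branch3(x), self.branch5(x), self.branchp(x)], dim=1)
        x = x.squeeze(3).permute(0, 2, 1)      # (batch, 100, 192)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])          # (batch, 3)

logits = DeepLOBSketch()(torch.randn(2, 1, 100, 40))
```

The conv stack compresses the 40-wide feature axis to 1, the inception branches widen the channel dimension to 192, and the LSTM reads the resulting 100-step sequence.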
All models are evaluated on identical train/val/test splits with consistent preprocessing:
| Model | Accuracy | F1 (Macro) | MCC |
|---|---|---|---|
| DeepLOB | 68.2% | 0.67 | 0.52 |
| XGBoost | 61.4% | 0.59 | 0.42 |
| LSTM (2-layer) | 58.7% | 0.55 | 0.38 |
| LogReg + PCA | 54.2% | 0.51 | 0.31 |
```
DeepLobPlus/
├── src/
│   ├── models/               # DeepLOB variants
│   │   ├── deeplob_light.py  # Main model
│   │   ├── TLOB.py           # Transformer LOB
│   │   └── registry.py       # Model registry
│   ├── data_processing/      # LOBSTER preprocessing
│   └── pipeline/             # Training pipeline
├── multi_horizon/
│   ├── fast_cli.py           # 🚀 Optimized training
│   ├── evaluator.py          # Multi-horizon eval
│   └── dataset.py            # Horizon-specific labels
├── benchmarks/
│   ├── cli.py                # Benchmark runner
│   ├── models.py             # Baseline implementations
│   └── feature_engineering.py
├── sim_v2/
│   ├── backtester.py         # Backtest engine
│   ├── emulator.py           # LOB emulation
│   └── model_converter.py    # TorchScript export
├── pysim/
│   ├── simulator.py          # Main simulator
│   ├── core/                 # Streamer, executor, etc.
│   └── schedulers/           # Signal generators
└── simulator/
    ├── src/                  # C++ inference engine
    └── include/              # Headers
```
Price movement labels are generated with adaptive γ thresholds:

```
             m₋(t)         m₊(t)
─────────────┼─────────────┼─────────────
   ↓ Down    │  → Stable   │    ↑ Up
             │             │
m(t+k) < m(t) - γσ    m(t+k) > m(t) + γσ
```
Where:
- `m(t)` = mid-price at time `t`
- `k` = prediction horizon (events ahead)
- `γ` = threshold multiplier (auto-fitted)
- `σ` = price volatility estimate
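The labeling rule above can be sketched as follows. The repo's auto-fitted γ and its volatility estimator live in `multi_horizon/dataset.py`; here σ is approximated by a rolling standard deviation of mid-price changes, which is an assumption for illustration:

```python
import numpy as np

def label_moves(mid, k, gamma, window=100):
    """Three-class labels from the mid-price move k events ahead (sketch).

    0 = Down, 1 = Stable, 2 = Up, matching the thresholds above.
    sigma is a rolling std of mid-price changes; the repo's auto-fitted
    gamma and volatility estimate may differ.
    """
    mid = np.asarray(mid, dtype=float)
    n = len(mid) - k                      # last k events have no label
    changes = np.diff(mid, prepend=mid[0])
    labels = np.ones(n, dtype=int)        # default: Stable
    for t in range(n):
        lo = max(0, t - window)
        sigma = changes[lo:t + 1].std() or 1e-12
        if mid[t + k] > mid[t] + gamma * sigma:
            labels[t] = 2                 # Up
        elif mid[t + k] < mid[t] - gamma * sigma:
            labels[t] = 0                 # Down
    return labels
```

On a monotonically rising mid-price series this yields all-Up labels, since every k-step move exceeds the γσ band.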
For large datasets (2M+ samples):
| Stride | Samples | Training Time | Use Case |
|---|---|---|---|
| 1 | 2,400,000 | 10+ hours | ❌ Too slow |
| 50 | 48,000 | 20-30 min | ✅ Production |
| 100 | 24,000 | 5-10 min | ✅ Fast testing |
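The sample counts in the table follow from simple window arithmetic. A sketch, assuming each sample needs a 100-event history window plus the horizon's future events for its label (the repo's exact accounting may differ slightly):

```python
def n_samples(n_events, window=100, horizon=10, stride=1):
    """Approximate number of training windows for a given stride.

    Each sample consumes `window` past events plus `horizon` future
    events for its label; consecutive samples start `stride` apart.
    """
    usable = n_events - window - horizon
    return max(0, usable // stride + 1)

# Roughly reproduces the table for a ~2.4M-event day:
n_samples(2_400_110, stride=1)    # ~2,400,000
n_samples(2_400_110, stride=50)   # ~48,000
```

Stride trades label coverage for wall-clock time: stride 50 keeps one window per 50 events, cutting samples (and epochs) by ~50× while still spanning the whole session.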
```bash
# 🚀 Fast testing (5 minutes)
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 \
    --epochs 2 --stride 100 --batch-size 2048

# 🏃 Full evaluation (30 minutes)
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 20 50 100 \
    --epochs 5 --stride 50 --batch-size 2048 --lr 1e-3
```

The simulator executes trades with realistic market microstructure:
```
┌─────────────────────────────────────────────────────────┐
│                     Execution Flow                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Signal → Latency Queue → VWAP Walk → Portfolio Update  │
│    │           │              │             │           │
│    │       (N steps)    (consume LOB)  (track P&L)      │
│    │                                                    │
│  Conf > θ?                                              │
│  Pos < max?                                             │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
Features:
- Configurable execution latency (in LOB updates)
- VWAP execution walking through book levels
- Position limits and cash tracking
- Trade logging with timestamps
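The "VWAP walk" step consumes liquidity level by level. A minimal sketch of a buy-side fill, assuming the book is given as `(price, size)` pairs from the best ask outward; the repo's executor also handles partial fills and position limits:

```python
def vwap_buy(ask_levels, qty):
    """Walk the ask side of the book to fill `qty` shares (sketch).

    ask_levels: list of (price, size) pairs from best ask outward.
    Returns (filled_qty, average_fill_price).
    """
    remaining, cost = qty, 0.0
    for price, size in ask_levels:
        take = min(remaining, size)   # consume up to this level's depth
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    filled = qty - remaining          # may be < qty on a thin book
    return filled, (cost / filled if filled else 0.0)

book = [(100.00, 50), (100.02, 50), (100.05, 100)]
filled, avg = vwap_buy(book, 120)
# fills 50 @ 100.00, 50 @ 100.02, 20 @ 100.05 → avg ≈ 100.0167
```

Walking the book this way makes slippage a function of order size and book depth rather than a fixed cost assumption.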
The `FeatureEngineer` extracts 22 interpretable features:
| Category | Features |
|---|---|
| Current State | Spread, mid-price, level 1 sizes, imbalance |
| Statistical | Mean, std, min, max, median over window |
| Trend | Returns at 1, 5, 10, 20 step horizons |
| Volatility | Rolling vol at 5, 10, 20 step windows |
| Order Imbalance | Total and per-level imbalance ratios |
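The "Current State" row can be illustrated from a single 40-feature snapshot. This sketch assumes the usual LOBSTER column order of `(ask_price, ask_size, bid_price, bid_size)` repeated per level; adjust the reshape if your preprocessing orders columns differently:

```python
import numpy as np

def current_state_features(snapshot):
    """Current-state slice of the feature set above (sketch)."""
    s = np.asarray(snapshot, dtype=float).reshape(10, 4)
    ask_p, ask_v, bid_p, bid_v = s[:, 0], s[:, 1], s[:, 2], s[:, 3]
    spread = ask_p[0] - bid_p[0]
    mid = (ask_p[0] + bid_p[0]) / 2
    # Order imbalance in [-1, 1]: +1 = all bid volume, -1 = all ask volume.
    imb = (bid_v.sum() - ask_v.sum()) / (bid_v.sum() + ask_v.sum())
    return {"spread": spread, "mid": mid,
            "ask1_size": ask_v[0], "bid1_size": bid_v[0], "imbalance": imb}
```

The statistical, trend, and volatility rows then aggregate these per-snapshot values over a rolling window of past snapshots.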
- DeepLOB CNN+LSTM implementation
- Multi-horizon training & evaluation
- Benchmark framework (LogReg, XGBoost, LSTM)
- Event-driven backtester with latency
- TorchScript model export
MIT License - see LICENSE for details.
Angelo - GitHub