# DeepLOB+

A production-grade Limit Order Book (LOB) prediction and backtesting framework built with PyTorch.

> ⚠️ **Note:** This project is under active development. New models, features, and optimizations are being added regularly. Contributions and feedback are welcome!

Python 3.10+ · PyTorch · MIT License · CUDA

## What is DeepLOB+?

DeepLOB+ predicts short-term price movements from Limit Order Book data using deep learning. It extends the original DeepLOB architecture with multi-horizon forecasting, comprehensive benchmarking, and a full backtesting simulation engine.

```
LOB Snapshot:  [Ask₁..Ask₁₀ | Bid₁..Bid₁₀] → 40 features
                            ↓
                      [DeepLOB CNN+LSTM]
                            ↓
Prediction:          {↓ Down, → Stable, ↑ Up}
                            ↓
Backtester:     Execute trades with latency + VWAP
```

## Architecture

```
┌───────────────────────────────────────────────────────────────────┐
│                         DeepLOB+                                  │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│   │  LOBSTER     │───▶│  Normalizer  │───▶│   History    │       │
│   │  Streamer    │    │  (z-score)   │    │   Buffer     │       │
│   └──────────────┘    └──────────────┘    └──────┬───────┘       │
│                                                  │                │
│        ┌─────────────────────────────────────────┘                │
│        ▼                                                          │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│   │   DeepLOB    │───▶│  Scheduler   │───▶│   Latency    │       │
│   │   Model      │    │  (signals)   │    │   Queue      │       │
│   └──────────────┘    └──────────────┘    └──────┬───────┘       │
│                                                  │                │
│        ┌─────────────────────────────────────────┘                │
│        ▼                                                          │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│   │    VWAP      │───▶│   Trader     │───▶│   Logger     │       │
│   │   Executor   │    │  (P&L)       │    │  (results)   │       │
│   └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```

## Features

| Component     | Description                                           |
|---------------|-------------------------------------------------------|
| DeepLOB       | CNN + LSTM architecture for LOB sequences             |
| Multi-Horizon | Predictions at k ∈ {10, 20, 50, 100} events           |
| Benchmarks    | LogReg, XGBoost, LSTM baselines for comparison        |
| Backtester    | Event-driven simulator with latency + VWAP execution  |
| TorchScript   | Export models for C++ inference                       |
| LOBSTER       | Native support for the LOBSTER data format            |

## Quick Start

### 1. Train a Model

```bash
# Train DeepLOB for horizon k=10 on TSLA
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 \
    --train-dates 2025-04-01 2025-04-02 \
    --val-dates 2025-04-03 \
    --test-dates 2025-04-04 \
    --epochs 5 --stride 50 --batch-size 2048
```

### 2. Run Benchmarks

```bash
# Compare DeepLOB against baselines
python benchmarks/cli.py \
    --data_dir data/processed \
    --symbol TSLA \
    --horizon 10 20 50 100 \
    --device cuda \
    --verbose
```

### 3. Backtest a Strategy

```bash
# Simulate trading with a trained model
python sim_v2/cli.py \
    --ticker AMZN \
    --date 2025-03-14 \
    --model TorchScriptModels/deeplob_AMZN_h10.pt \
    --threshold 0.7 \
    --latency 5
```

## DeepLOB Model

The model processes 100-timestep windows of 40 LOB features (10 levels × 4 features):

```
Input: (batch, 1, 100, 40)
         ↓
┌────────────────────────────────────────┐
│  Conv Block 1: 32 filters, stride=(1,2)│  → Spatial compression
│  Conv Block 2: 32 filters, stride=(1,2)│  → Feature extraction
│  Conv Block 3: 32 filters              │  → Final spatial
└────────────────────────────────────────┘
         ↓
┌────────────────────────────────────────┐
│  Inception Module (parallel paths)     │
│  • 1×1 → 3×1 convolutions              │
│  • 1×1 → 5×1 convolutions              │
│  • MaxPool → 1×1 convolution           │
└────────────────────────────────────────┘
         ↓
┌────────────────────────────────────────┐
│  LSTM: 64 hidden units                 │  → Temporal dynamics
└────────────────────────────────────────┘
         ↓
Output: (batch, 3)  →  {Down, Stable, Up}
```
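The block diagram above can be sketched in PyTorch. This is a minimal illustration assuming the dimensions shown (32 conv filters, 64-unit inception branches, 64 LSTM units); the repo's `deeplob_light.py` may differ in details.

```python
import torch
import torch.nn as nn

class DeepLOBSketch(nn.Module):
    """Minimal DeepLOB-style network: conv blocks -> inception -> LSTM."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Conv blocks compress the 40 LOB columns: 40 -> 20 -> 10 -> 1
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(32, 32, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
            nn.Conv2d(32, 32, kernel_size=(1, 10)), nn.LeakyReLU(),
        )
        # Inception branches convolve along the time axis only
        self.branch3 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=1), nn.LeakyReLU(),
            nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)), nn.LeakyReLU(),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=1), nn.LeakyReLU(),
            nn.Conv2d(64, 64, kernel_size=(5, 1), padding=(2, 0)), nn.LeakyReLU(),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=(3, 1), stride=1, padding=(1, 0)),
            nn.Conv2d(32, 64, kernel_size=1), nn.LeakyReLU(),
        )
        self.lstm = nn.LSTM(input_size=192, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 100, 40)
        x = self.conv(x)                                   # (B, 32, 100, 1)
        x = torch.cat([self.branch3(x), self.branch5(x),
                       self.branch_pool(x)], dim=1)        # (B, 192, 100, 1)
        x = x.squeeze(3).permute(0, 2, 1)                  # (B, 100, 192)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                       # (B, 3) class logits

logits = DeepLOBSketch()(torch.zeros(2, 1, 100, 40))       # shape (2, 3)
```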

## Benchmark Results

Models evaluated on identical train/val/test splits with consistent preprocessing:

| Model          | Accuracy | F1 (Macro) | MCC  |
|----------------|----------|------------|------|
| DeepLOB        | 68.2%    | 0.67       | 0.52 |
| XGBoost        | 61.4%    | 0.59       | 0.42 |
| LSTM (2-layer) | 58.7%    | 0.55       | 0.38 |
| LogReg + PCA   | 54.2%    | 0.51       | 0.31 |

## Project Structure

```
DeepLobPlus/
├── src/
│   ├── models/              # DeepLOB variants
│   │   ├── deeplob_light.py # Main model
│   │   ├── TLOB.py          # Transformer LOB
│   │   └── registry.py      # Model registry
│   ├── data_processing/     # LOBSTER preprocessing
│   └── pipeline/            # Training pipeline
├── multi_horizon/
│   ├── fast_cli.py          # 🚀 Optimized training
│   ├── evaluator.py         # Multi-horizon eval
│   └── dataset.py           # Horizon-specific labels
├── benchmarks/
│   ├── cli.py               # Benchmark runner
│   ├── models.py            # Baseline implementations
│   └── feature_engineering.py
├── sim_v2/
│   ├── backtester.py        # Backtest engine
│   ├── emulator.py          # LOB emulation
│   └── model_converter.py   # TorchScript export
├── pysim/
│   ├── simulator.py         # Main simulator
│   ├── core/                # Streamer, executor, etc.
│   └── schedulers/          # Signal generators
└── simulator/
    ├── src/                 # C++ inference engine
    └── include/             # Headers
```

## Multi-Horizon Labels

Price-movement labels are generated with adaptive gamma thresholds:

```
             m₋(t)         m₊(t)
─────────────┼─────────────┼─────────────
     ↓ Down  │   → Stable  │   ↑ Up
             │             │
m(t+k) < m(t) - γσ    m(t+k) > m(t) + γσ
```

Where:

  • m(t) = mid-price at time t
  • k = prediction horizon (events ahead)
  • γ = threshold multiplier (auto-fitted)
  • σ = price volatility estimate
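As a sketch, the rule above in NumPy (the function name and the {0: Down, 1: Stable, 2: Up} encoding are this example's choices, not necessarily the repo's):

```python
import numpy as np

def horizon_labels(mid: np.ndarray, k: int, gamma: float, sigma: float) -> np.ndarray:
    """Label each time t by where m(t+k) sits relative to m(t) ± γσ.

    Returns an int array of length len(mid) - k with 0=Down, 1=Stable, 2=Up.
    """
    future, now = mid[k:], mid[:-k]
    labels = np.ones(len(now), dtype=int)          # default: Stable
    labels[future > now + gamma * sigma] = 2       # Up
    labels[future < now - gamma * sigma] = 0       # Down
    return labels

mid = np.array([100.0, 100.0, 100.0, 110.0, 90.0, 100.0])
labels = horizon_labels(mid, k=3, gamma=1.0, sigma=5.0)
# → [2, 0, 1]: up at t=0 (110 > 105), down at t=1 (90 < 95), stable at t=2
```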

## Performance Optimization

For large datasets (2M+ samples):

| Stride | Samples   | Training Time | Use Case        |
|--------|-----------|---------------|-----------------|
| 1      | 2,400,000 | 10+ hours     | ❌ Too slow     |
| 50     | 48,000    | 20-30 min     | ✅ Production   |
| 100    | 24,000    | 5-10 min      | ✅ Fast testing |

```bash
# 🚀 Fast testing (5 minutes)
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 \
    --epochs 2 --stride 100 --batch-size 2048

# 🏃 Full evaluation (30 minutes)
python multi_horizon/fast_cli.py --ticker TSLA --horizons 10 20 50 100 \
    --epochs 5 --stride 50 --batch-size 2048 --lr 1e-3
```
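The stride trade-off is plain subsampling of window start positions: with 100-step windows over roughly 2.4M events, the sample counts in the table fall out directly (this arithmetic is illustrative; the repo's dataset code may count windows slightly differently at the boundaries).

```python
# Number of 100-step windows over ~2.4M events at a given stride
n_events, window = 2_400_000, 100

def n_windows(stride: int) -> int:
    """Count window start indices 0, stride, 2*stride, ... that fit in the data."""
    return (n_events - window) // stride + 1

for s in (1, 50, 100):
    print(f"stride={s:3d}: {n_windows(s):,} samples")
# stride=1 → ~2.4M, stride=50 → ~48k, stride=100 → 24k
```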

## Backtesting Engine

The simulator executes trades with realistic market microstructure:

```
┌─────────────────────────────────────────────────────────┐
│                    Execution Flow                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Signal → Latency Queue → VWAP Walk → Portfolio Update  │
│     │          │              │              │          │
│     │     (N steps)     (consume LOB)   (track P&L)     │
│     │                                                   │
│  Conf > θ?                                              │
│  Pos < max?                                             │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

Features:

  • Configurable execution latency (in LOB updates)
  • VWAP execution walking through book levels
  • Position limits and cash tracking
  • Trade logging with timestamps
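The "VWAP walk" step can be illustrated with a toy fill that consumes book levels in price order (the level format and function name are this example's, not the simulator's API):

```python
def vwap_fill(levels: list[tuple[float, float]], qty: float) -> float:
    """Walk the book best-price-first, consuming size until qty is filled.

    levels: [(price, size), ...] sorted best-first (ascending asks for a buy).
    Returns the volume-weighted average fill price; raises if depth runs out.
    """
    remaining, cost = qty, 0.0
    for price, size in levels:
        take = min(size, remaining)     # consume as much as this level offers
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / qty
    raise ValueError("insufficient book depth to fill order")

asks = [(100.0, 5.0), (101.0, 5.0), (102.0, 10.0)]
avg = vwap_fill(asks, 8.0)   # fills 5 @ 100 and 3 @ 101 → 100.375
```

A buy that exceeds level-1 size therefore pays a worse average price than the top-of-book quote, which is exactly the slippage the backtester is meant to capture.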

## Handcrafted Features (Baselines)

The `FeatureEngineer` extracts 22 interpretable features:

| Category        | Features                                      |
|-----------------|-----------------------------------------------|
| Current State   | Spread, mid-price, level-1 sizes, imbalance   |
| Statistical     | Mean, std, min, max, median over window       |
| Trend           | Returns at 1, 5, 10, 20 step horizons         |
| Volatility      | Rolling vol at 5, 10, 20 step windows         |
| Order Imbalance | Total and per-level imbalance ratios          |
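A few of the "Current State" features computed from a single level-1 snapshot, as a sketch (the repo's `FeatureEngineer` computes 22 features; the function and field names here are illustrative):

```python
def snapshot_features(bid_px: float, bid_sz: float,
                      ask_px: float, ask_sz: float) -> dict[str, float]:
    """Level-1 state features: spread, mid-price, and order imbalance."""
    return {
        "spread": ask_px - bid_px,
        "mid": (ask_px + bid_px) / 2.0,
        # Imbalance in [-1, 1]: +1 = all depth on the bid, -1 = all on the ask
        "imbalance": (bid_sz - ask_sz) / (bid_sz + ask_sz),
    }

feats = snapshot_features(bid_px=99.0, bid_sz=10.0, ask_px=101.0, ask_sz=30.0)
# → spread 2.0, mid 100.0, imbalance -0.5 (ask-heavy book)
```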

## Inspired By

  • DeepLOB - Original architecture
  • LOBSTER - High-frequency LOB data
  • HLOB - Heterogeneous LOB research

## Roadmap

  • DeepLOB CNN+LSTM implementation
  • Multi-horizon training & evaluation
  • Benchmark framework (LogReg, XGBoost, LSTM)
  • Event-driven backtester with latency
  • TorchScript model export

## License

MIT License — see the LICENSE file for details.

## Author

Angelo - GitHub
