This repository contains a Jupyter notebook (main.ipynb) implementing a pairs trading strategy. It demonstrates how to:
- Identify a pair of correlated stocks
- Calculate the spread between their price series
- Engineer features from the spread (e.g., rolling statistics)
- Train a machine learning model to predict the next-period spread
- Generate trading signals based on predicted spread deviations
- Execute buy/sell orders for both stocks via the Interactive Brokers (IBKR) API
- Project Overview
- Features
- Requirements
- Installation
- Usage
- Notebook Structure
- Data Source
- Methodology
- Machine Learning Model
- IBKR API Integration
- Results and Evaluation
- License
- Acknowledgements
Pairs trading is a market-neutral strategy that exploits mean-reversion in the spread between two historically correlated assets. This project shows how to:
- Select a candidate stock pair using correlation analysis.
- Compute the price spread (difference or ratio) and its statistical properties.
- Build and evaluate a machine learning model (e.g., Ordinary Least Squares, Time-Series Linear Regression) to predict the next-day spread.
- Generate entry and exit signals when the predicted spread deviates from its historical mean beyond a threshold.
- Automate trade execution by placing corresponding long/short orders on both legs via the IBKR API.
- Pair Selection: Correlation matrix and distance metrics for candidate pairs.
- Spread Calculation: Rolling mean, rolling standard deviation, and z-score computation.
- Feature Engineering: Lagged spread values, rolling window statistics, and technical indicators.
- Modeling: Train/test split, Ordinary Least Squares, and Time-Series Linear Regression for spread prediction.
- Signal Generation: Mean-reversion thresholds and take-profit levels.
- IBKR Trading: Live order placement with ib_insync, order management, and fill tracking.
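A minimal sketch of the pair-selection step, assuming a hypothetical prices DataFrame of adjusted closes with one column per ticker. The function name rank_pairs is illustrative, not taken from the notebook. It ranks every pair by the correlation of daily log returns and, as one possible distance metric, reports the sum of squared differences (ssd) between prices normalized to start at 1.0; a lower ssd means the two normalized price paths stayed closer together.

```python
import itertools

import numpy as np
import pandas as pd


def rank_pairs(prices: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Rank ticker pairs by return correlation (highest first).

    prices: adjusted closes, one column per ticker, indexed by date.
    """
    # Correlate daily log returns rather than price levels, since trending
    # price levels can look highly correlated even when unrelated.
    returns = np.log(prices).diff().dropna()
    corr = returns.corr()

    # Normalize each series to start at 1.0 for the distance metric.
    normalized = prices / prices.iloc[0]

    rows = []
    for a, b in itertools.combinations(prices.columns, 2):
        ssd = float(((normalized[a] - normalized[b]) ** 2).sum())
        rows.append({"ticker_a": a, "ticker_b": b,
                     "corr": float(corr.loc[a, b]), "ssd": ssd})

    ranked = pd.DataFrame(rows).sort_values("corr", ascending=False)
    return ranked.head(top_n).reset_index(drop=True)
```

Correlation alone does not imply that the spread mean-reverts, so a shortlisted pair still needs the stationarity check described under Methodology.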
- Python 3.8+
- Jupyter Notebook
Python Libraries:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- statsmodels
- ib_insync
Optional for enhancements:
- pmdarima (for ARIMA comparisons)
- rapidfuzz (for ticker validation in dashboard)
git clone https://github.com/marabsatt/project_blackwell.git
cd project_blackwell
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
- Open main.ipynb in Jupyter.
- Pair Selection: Run cells to compute correlations and choose a stock pair.
- Spread Engineering: Compute and visualize spread statistics.
- Model Training: Fit the ML model to predict next-day spread.
- Signal Logic: Define entry/exit thresholds based on predicted spread z-scores.
- Trade Execution: Ensure IBKR Gateway/TWS is running and execute the trading cells to place orders.
Modify parameters (e.g., lookback windows, thresholds, model hyperparameters) directly in the notebook.
- Imports & Configuration: Load libraries and API settings
- Data Ingestion: Fetch historical price data for candidate tickers
- Pair Selection: Correlation analysis and scatter plots
- Spread Calculation: Compute spread, rolling mean/std, and z-score
- Feature Engineering: Create lagged and rolling features from the spread
- Modeling:
  - Ordinary Least Squares
  - Train/test split by date
  - Time-Series Linear Regression
  - Model fitting and validation
- Prediction & Signals: Generate predicted spread and trading signals
- IBKR API Integration: Connect to IBKR, define contracts, place orders
- Results & Evaluation: Backtest P&L, plot performance, and diagnostics
- Historical price data via yfinance or IBKR API.
- Ensure trading calendar consistency by aligning dates and handling missing market days.
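One way to align two tickers on a shared trading calendar is sketched below. The approach is an assumption, not necessarily what the notebook does: outer-join on dates, forward-fill gaps of at most max_ffill consecutive days with the last close (which implies a zero return on each filled day), and drop any date where a leg is still missing. align_prices and max_ffill are hypothetical names.

```python
from typing import Dict

import pandas as pd


def align_prices(series: Dict[str, pd.Series], max_ffill: int = 1) -> pd.DataFrame:
    """Align price series (keyed by ticker) on a shared trading calendar."""
    # Outer join on dates so a day missing from either ticker shows up as NaN.
    frame = pd.concat(series, axis=1, join="outer").sort_index()
    # Forward-fill short gaps (at most max_ffill consecutive days) with the
    # last known close; longer gaps stay NaN.
    if max_ffill > 0:
        frame = frame.ffill(limit=max_ffill)
    # Drop dates where any leg is still missing, e.g. before a listing date.
    return frame.dropna()
```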
- Stationarity Check: Ensure spread series is mean-reverting (augmented Dickey-Fuller test).
- Rolling Statistics: Compute moving averages and standard deviations to detect deviations.
- Z-score Thresholds: Entry when |z-score| > 2.0; exit when z-score returns to zero or hits stop-loss.
- Algorithms: Linear Regression, Ordinary Least Squares.
- Features:
  - Lagged spread values (t-1, t-2, ...)
  - Rolling mean and std (window sizes 5, 10, 20)
- Metrics: R², RMSE, MAE on hold-out set.
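A rough sketch of the modeling pipeline with scikit-learn: the current spread plus lagged values and rolling mean/std (windows 5, 10, 20) as features, the next-period spread as target, a chronological train/test split, and R², RMSE, and MAE on the hold-out set. The lag set (1, 2, 3), the 20% test fraction, and the function names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def build_features(spread: pd.Series, lags=(1, 2, 3), windows=(5, 10, 20)) -> pd.DataFrame:
    """Features known at time t, plus the spread at t+1 as the target."""
    df = pd.DataFrame({"spread": spread})
    for k in lags:
        df[f"lag_{k}"] = spread.shift(k)
    for w in windows:
        df[f"roll_mean_{w}"] = spread.rolling(w).mean()
        df[f"roll_std_{w}"] = spread.rolling(w).std()
    df["target"] = spread.shift(-1)  # next-period spread
    return df.dropna()


def fit_and_evaluate(spread: pd.Series, test_frac: float = 0.2):
    data = build_features(spread)
    split = int(len(data) * (1 - test_frac))
    # Chronological split: no shuffling, so the test set is strictly later.
    train, test = data.iloc[:split], data.iloc[split:]
    features = [c for c in data.columns if c != "target"]

    model = LinearRegression().fit(train[features], train["target"])
    pred = model.predict(test[features])
    metrics = {
        "r2": r2_score(test["target"], pred),
        "rmse": float(np.sqrt(mean_squared_error(test["target"], pred))),
        "mae": mean_absolute_error(test["target"], pred),
    }
    return model, metrics
```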
- Connection: ib = IB(); ib.connect('127.0.0.1', 7497, clientId=1) (7497 is the default TWS paper-trading port)
- Contract Definition: Stock(ticker, 'SMART', 'USD') and ib.qualifyContracts()
- Order Placement: Market and limit orders via ib.placeOrder(contract, order)
- Event Handling: Subscribe to trade.filledEvent for execution feedback
Warning: Test in paper trading before deploying to a live account.
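The ib_insync calls listed above fit together roughly as shown here. This sketch assumes TWS or IB Gateway is running with API access enabled on 127.0.0.1:7497, uses market orders only, and leaves leg sizing (qty_y, qty_x) to the caller; leg_actions and place_pair_trade are hypothetical helpers. ib_insync is imported inside the function so the pure leg_actions helper can be used without it.

```python
from typing import List, Tuple


def leg_actions(position: int) -> Tuple[str, str]:
    """Map a spread position to (action for y, action for x)."""
    if position == 1:     # long spread: buy y, sell x
        return "BUY", "SELL"
    if position == -1:    # short spread: sell y, buy x
        return "SELL", "BUY"
    raise ValueError("position must be +1 or -1")


def place_pair_trade(position: int, ticker_y: str, ticker_x: str,
                     qty_y: float, qty_x: float, host: str = "127.0.0.1",
                     port: int = 7497, client_id: int = 1) -> List[str]:
    """Send one market order per leg and return their order statuses."""
    # Imported here so leg_actions stays usable without ib_insync installed.
    from ib_insync import IB, MarketOrder, Stock

    ib = IB()
    ib.connect(host, port, clientId=client_id)
    try:
        contract_y = Stock(ticker_y, "SMART", "USD")
        contract_x = Stock(ticker_x, "SMART", "USD")
        ib.qualifyContracts(contract_y, contract_x)  # fills in conId etc.

        action_y, action_x = leg_actions(position)
        trades = [
            ib.placeOrder(contract_y, MarketOrder(action_y, qty_y)),
            ib.placeOrder(contract_x, MarketOrder(action_x, qty_x)),
        ]
        for trade in trades:
            # filledEvent fires once the order is completely filled.
            trade.filledEvent += lambda t: print(
                f"Filled {t.contract.symbol}: {t.orderStatus.filled} "
                f"@ {t.orderStatus.avgFillPrice}"
            )
        ib.sleep(5)  # run the event loop so status updates arrive
        return [trade.orderStatus.status for trade in trades]
    finally:
        ib.disconnect()
```

Swapping MarketOrder for LimitOrder(action, totalQuantity, lmtPrice) gives the limit-order variant. One common (assumed) sizing choice is qty_x = beta * qty_y, using the hedge ratio from the spread regression.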
- Backtest Performance: P&L curve
- Signal Efficacy: Hit rate of profitable trades, average return per trade.
- Residual Analysis: Actual vs. predicted, residuals vs. predicted, the residual distribution, and a Q-Q plot of residuals to verify model adequacy.
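The backtest metrics listed above can be approximated with a few lines of pandas. Assumptions: one unit of the spread is traded, positions are decided at the close and held from the next bar (to avoid look-ahead), and transaction costs are ignored. P&L is therefore in spread units, so "average return per trade" is reported here as average P&L per trade; converting it to a return would need a capital assumption. backtest is a hypothetical name.

```python
import pandas as pd


def backtest(spread: pd.Series, position: pd.Series) -> dict:
    """Mark-to-market P&L of trading one unit of the spread.

    position[t] is the position decided at the close of bar t
    (+1 long spread, -1 short spread, 0 flat).
    """
    # Hold the previous bar's position over this bar's spread change.
    held = position.shift(1).fillna(0)
    daily_pnl = held * spread.diff().fillna(0)
    equity = daily_pnl.cumsum()

    # Each run of identical held positions is one trade; keep non-flat runs.
    trade_id = (held != held.shift()).cumsum()
    in_trade = held != 0
    trade_pnl = daily_pnl[in_trade].groupby(trade_id[in_trade]).sum()

    has_trades = len(trade_pnl) > 0
    return {
        "equity": equity,
        "total_pnl": float(equity.iloc[-1]),
        "n_trades": int(len(trade_pnl)),
        "hit_rate": float((trade_pnl > 0).mean()) if has_trades else float("nan"),
        "avg_pnl_per_trade": float(trade_pnl.mean()) if has_trades else float("nan"),
    }
```

With matplotlib installed, result["equity"].plot() draws the P&L curve.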
This project is licensed under the MIT License. See LICENSE for details.
- ib_insync for easy IBKR API access.
- Statsmodels documentation for time-series tools.
- Scikit-learn for machine learning utilities.
- Inspired by various pairs trading strategy tutorials and academic research.