
Commit 64a95e5

Initial release: ML Microstructure Signals
A comprehensive machine learning system for predicting short-term price movements from order book microstructure data.

Features:
- Multiple ML models (Logistic Regression, Random Forest, LightGBM, LSTM, Transformer)
- Advanced feature engineering (order flow imbalance (OFI), spread, depth, imbalance, VWAP, microprice)
- Realistic backtesting with transaction costs and slippage
- Interactive Streamlit dashboard
- Comprehensive performance metrics (Sharpe, Sortino, Calmar ratios)
- Hydra configuration management
- MLflow experiment tracking
- Complete test coverage

This project demonstrates the practical application of machine learning to high-frequency trading, with a focus on order book microstructure analysis and realistic backtesting methodology.

Author: Ismail Moudden
Email: ismail.moudden1@gmail.com
License: MIT
0 parents, commit 64a95e5
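The commit message lists order flow imbalance (OFI), spread, imbalance and microprice among the engineered features, but the feature code itself is not part of the files shown in this diff. The following is only a minimal sketch of the common textbook definitions over the best bid/ask level, assuming a hypothetical `TopOfBook` record; it is not the project's `FeaturePipeline`. The OFI function follows the per-update definition popularized by Cont, Kukanov and Stoikov, and "microprice" is taken here as the size-weighted mid (other definitions exist).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopOfBook:
    """Best bid/ask snapshot (hypothetical record, not the project's schema)."""
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float


def spread(s: TopOfBook) -> float:
    """Quoted spread in price units."""
    return s.ask_price - s.bid_price


def imbalance(s: TopOfBook) -> float:
    """Queue imbalance in [-1, 1]; positive when the bid queue is larger."""
    total = s.bid_size + s.ask_size
    return 0.0 if total == 0 else (s.bid_size - s.ask_size) / total


def microprice(s: TopOfBook) -> float:
    """Size-weighted mid: moves toward the ask when the bid queue is heavier."""
    total = s.bid_size + s.ask_size
    if total == 0:
        return (s.bid_price + s.ask_price) / 2
    return (s.ask_price * s.bid_size + s.bid_price * s.ask_size) / total


def ofi(prev: TopOfBook, curr: TopOfBook) -> float:
    """Best-level order flow imbalance between two consecutive snapshots.

    Positive values mean net buying pressure: the bid queue grew or the bid
    price rose, or the ask queue shrank or the ask price rose.
    """
    bid_flow = 0.0
    if curr.bid_price >= prev.bid_price:
        bid_flow += curr.bid_size
    if curr.bid_price <= prev.bid_price:
        bid_flow -= prev.bid_size
    ask_flow = 0.0
    if curr.ask_price <= prev.ask_price:
        ask_flow += curr.ask_size
    if curr.ask_price >= prev.ask_price:
        ask_flow -= prev.ask_size
    return bid_flow - ask_flow
```

For example, if the bid queue grows from 5 to 8 at an unchanged price while the ask side is untouched, `ofi` returns 3.0; a new, smaller-priced ask queue of 4 instead yields -4.0, reflecting selling pressure.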


52 files changed, +7145 -0 lines changed
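The backtester and metrics modules are likewise outside the files shown here. As a rough illustration of how transaction costs and slippage usually enter a backtest, and of how the Sharpe, Sortino and Calmar ratios named in the commit message are computed, here is a stdlib-only sketch. It assumes a single proportional cost rate charged on position changes (a stand-in for fees plus slippage), a zero risk-free rate, and downside deviation measured against a zero target; all function names are hypothetical and do not describe the project's API.

```python
import math
import statistics


def net_returns(positions, asset_returns, cost_rate=0.0001):
    """Per-period strategy returns net of a proportional trading cost.

    positions[t] is the exposure held over period t (e.g. -1, 0 or +1) and
    must be decided before asset_returns[t] is known. cost_rate is a single
    hypothetical number standing in for fees plus slippage, charged on the
    absolute change in position (turnover).
    """
    out = []
    prev = 0.0
    for pos, r in zip(positions, asset_returns):
        turnover = abs(pos - prev)
        out.append(pos * r - cost_rate * turnover)
        prev = pos
    return out


def sharpe(returns, periods_per_year):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def sortino(returns, periods_per_year):
    """Annualized Sortino ratio; downside deviation uses a zero target."""
    mean = statistics.mean(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    if downside == 0:
        return math.inf if mean > 0 else 0.0
    return mean / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve (a fraction)."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar(returns, periods_per_year):
    """Annualized compound return divided by maximum drawdown."""
    years = len(returns) / periods_per_year
    growth = math.prod(1.0 + r for r in returns)
    cagr = growth ** (1.0 / years) - 1.0
    mdd = max_drawdown(returns)
    if mdd == 0:
        return math.inf
    return cagr / mdd
```

Charging costs on turnover rather than per period means a strategy that flips from long to short pays twice the rate, which is one reason high-frequency signals that look profitable gross of costs can turn negative net of them.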

.github/workflows/ci.yml

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run ruff
        run: ruff check .

      - name: Run black
        run: black --check .

      - name: Run mypy
        run: mypy ml_microstructure/

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: pytest --cov=ml_microstructure --cov-report=xml --cov-report=term-missing

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-umbrella
          fail_ci_if_error: false

  integration-test:
    runs-on: ubuntu-latest
    needs: [lint, test]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run integration tests
        run: pytest tests/ -m integration

      - name: Test synthetic data generation
        run: |
          python -c "
          from ml_microstructure.data import SyntheticLOBGenerator
          generator = SyntheticLOBGenerator(duration_seconds=10)
          data = generator.generate_data()
          print(f'Generated {len(data)} synthetic snapshots')
          "

      - name: Test feature extraction
        run: |
          python -c "
          from ml_microstructure.data import SyntheticLOBGenerator, OrderBookProcessor
          from ml_microstructure.features import FeaturePipeline
          generator = SyntheticLOBGenerator(duration_seconds=10)
          snapshots = generator.generate_data()
          processor = OrderBookProcessor()
          df = processor.process_snapshots(snapshots)
          pipeline = FeaturePipeline()
          features = pipeline.extract_features(df)
          print(f'Extracted {len(features.columns)} features')
          "

      - name: Test model training
        run: |
          python -c "
          from ml_microstructure.data import SyntheticLOBGenerator, OrderBookProcessor
          from ml_microstructure.features import FeaturePipeline
          from ml_microstructure.models import ModelFactory, ModelConfig
          from ml_microstructure.utils.labeling import LabelGenerator
          import pandas as pd
          import numpy as np

          # Generate data
          generator = SyntheticLOBGenerator(duration_seconds=10)
          snapshots = generator.generate_data()
          processor = OrderBookProcessor()
          df = processor.process_snapshots(snapshots)

          # Extract features
          pipeline = FeaturePipeline()
          df_features = pipeline.extract_features(df)

          # Generate labels
          label_generator = LabelGenerator()
          labels = label_generator.generate_labels(df_features)
          df_labeled = df_features.copy()
          df_labeled['label'] = labels
          df_labeled = df_labeled.dropna()

          # Prepare features
          feature_cols = [col for col in df_labeled.columns if col not in ['timestamp', 'label']]
          X = df_labeled[feature_cols]
          y = df_labeled['label']

          # Train model
          config = ModelConfig(model_type='lightgbm')
          model = ModelFactory.create_model(config)
          model.fit(X, y)

          # Make predictions
          predictions = model.predict(X)
          probabilities = model.predict_proba(X)

          print(f'Trained model on {len(X)} samples')
          print(f'Predictions shape: {predictions.shape}')
          print(f'Probabilities shape: {probabilities.shape}')
          "

  build:
    runs-on: ubuntu-latest
    needs: [lint, test, integration-test]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine

      - name: Build package
        run: python -m build

      - name: Check package
        run: twine check dist/*

      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  security:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install safety bandit

      - name: Run safety check
        run: safety check

      - name: Run bandit security check
        run: bandit -r ml_microstructure/

  documentation:
    runs-on: ubuntu-latest
    needs: [lint, test]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Check documentation
        run: |
          python -c "
          import ml_microstructure
          print('Package imports successfully')
          print(f'Version: {ml_microstructure.__version__}')
          "

      - name: Test CLI commands
        run: |
          python -m ml_microstructure.pipeline.train --help || echo "CLI help not available"
          python -m ml_microstructure.pipeline.predict --help || echo "CLI help not available"
          python -m ml_microstructure.pipeline.evaluate --help || echo "CLI help not available"
          python -m ml_microstructure.backtest.run --help || echo "CLI help not available"
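The model-training smoke test above builds labels with `LabelGenerator` and then drops rows containing NaN, but how `LabelGenerator` defines a label is not visible in this diff. One common scheme, sketched here purely as an assumption, compares the mid-price `horizon` snapshots ahead with the current mid and emits -1/0/+1 around a threshold, leaving the final `horizon` rows unlabeled (NaN). The smoke test also fits and predicts on the same rows, which is enough to check that the pipeline runs but says nothing about out-of-sample accuracy; the hypothetical `chronological_split` helper shows a time-ordered split with a gap, so that no training label uses prices from the test window.

```python
import math


def label_mid_moves(mid_prices, horizon=10, threshold=1e-4):
    """Three-class direction labels from the forward mid-price return.

    label[t] is +1 if mid[t + horizon] / mid[t] - 1 > threshold, -1 if it
    is < -threshold and 0 otherwise. The last `horizon` snapshots have no
    future price, so they get NaN and fall out of a later dropna().
    """
    labels = []
    n = len(mid_prices)
    for t, mid in enumerate(mid_prices):
        if t + horizon >= n:
            labels.append(math.nan)
            continue
        forward_return = mid_prices[t + horizon] / mid - 1.0
        if forward_return > threshold:
            labels.append(1)
        elif forward_return < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def chronological_split(n_rows, train_frac=0.7, gap=10):
    """Time-ordered train/test index ranges with no shuffling.

    The last `gap` rows before the cut are dropped from training; with
    gap equal to the label horizon, no training label looks at prices
    from the test window.
    """
    cut = int(n_rows * train_frac)
    return range(0, max(cut - gap, 0)), range(cut, n_rows)
```

Setting `gap` equal to `horizon` is the design choice that matters here: a training row at index t has a label that depends on the price at t + horizon, which must fall before the first test row.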

.gitignore

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Project specific
mlruns/
demo_output/
*.log
.pytest_cache/
.coverage
htmlcov/

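# Data files
# Leading slash: ignore only the repository-root data/ directory; a bare
# data/ would also match a package directory such as ml_microstructure/data/.
/data/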
*.csv
*.parquet
*.h5
*.hdf5

# Sensitive files
.env
.env.local
.env.production
secrets.yaml
config_local.yaml
*.key
*.pem

# Temporary files
*.tmp
*.temp
temp/
tmp/

.pre-commit-config.yaml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: debug-statements

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black
        language_version: python3.11

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.5.1
    hooks:
      - id: mypy
        additional_dependencies: [types-all]
        args: [--strict]

  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: pytest
        language: system
        pass_filenames: false
        always_run: true
        args: [--maxfail=1, -q]

LICENSE

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
MIT License

Copyright (c) 2024 Ismail Moudden

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Binary file not shown (1.38 MB).
