Synthetic end-to-end workflow that simulates a cryptocurrency exchange, injects manipulation patterns, and runs baseline surveillance + ML scoring. Alerts land in SQLite, CSV snapshots, and an interactive notebook for monitoring/case management.
- Rapid prototyping: Spin up a fully synthetic venue with realistic noise plus deterministic abusive behaviors to validate detection logic without live data.
- Rule + ML contrast: See how classic surveillance heuristics complement unsupervised learning for outlier discovery.
- Investigative UX: Notebook doubles as a lightweight case hub with contextual plots, float-share tables, shared-IP clusters, and ML leaderboards.
- Generates synthetic accounts, trades, and order book events with embedded scenarios: wash trading, ping-pong bursts, pump & dump, spoofing, front-running, account-message bursts, layering ladders, and cross-venue price shocks.
- Adds bespoke account-deviation bursts so profiles can drift sharply from historical norms.
- Trades and orders are tagged with venues while a position snapshot quantifies float share per account, enabling venue-aware and concentration rules.
- Embedded network metadata (device fingerprints + IP subnets) powers behavioral clustering scenarios that mimic insider collaboration.
- Unsupervised ML (IsolationForest + KMeans distance on engineered behavioral vectors) ranks anomalous accounts and feeds those scores into the alert stream plus the monitoring notebook.
- Surveillance engine runs multiple rules: self-trades, bilateral ping-pong loops, extreme price moves vs rolling baseline, abnormal volume spikes, spoofing via order cancellations, account profile deviations, position concentration, message-rate spikes, cancel-to-fill surges, layering sequences, cross-market divergence, network collusion, ML behavioral anomalies, and front-running around large prints.
- Alerts persisted to `data/alerts.sqlite` plus CSV snapshot for offline review.
- Notebook (`notebooks/surveillance_dashboard.ipynb`) now plots price, aggressor volume, order-book imbalance, and includes an interactive case-management widget for alert triage.
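A minimal sketch of how IsolationForest and KMeans-distance scores might be blended into a single account ranking, assuming scikit-learn and NumPy are installed. The function name `score_accounts`, the three feature columns, the equal weighting, and the choice to fit on a reference sample before scoring current accounts are illustrative assumptions, not the contents of `ml_utils.py`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def score_accounts(reference, current, n_clusters=3, seed=0):
    """Return one score in [0, 1] per row of `current`; higher means more unusual."""
    # Scale with statistics from the reference sample only.
    scaler = StandardScaler().fit(reference)
    ref = scaler.transform(reference)
    cur = scaler.transform(current)

    # IsolationForest: score_samples is higher for inliers, so negate it.
    forest = IsolationForest(n_estimators=200, random_state=seed).fit(ref)
    iso_score = -forest.score_samples(cur)

    # KMeans: distance from each current row to its nearest reference centroid.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(ref)
    dist = kmeans.transform(cur).min(axis=1)

    def minmax(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # Assumed blend: equal weight on the two normalized components.
    return 0.5 * minmax(iso_score) + 0.5 * minmax(dist)


# Hypothetical features per account: hourly notional, cancel/fill ratio, float share.
rng = np.random.default_rng(42)
normal = rng.normal(loc=[10.0, 1.0, 0.2], scale=[2.0, 0.2, 0.05], size=(200, 3))
outlier = np.array([[500.0, 40.0, 0.95]])
current = np.vstack([normal[:50], outlier])

scores = score_accounts(normal, current)
top_index = int(np.argmax(scores))  # row index of the most anomalous account
```

Fitting the centroids on a reference sample is a deliberate choice in this sketch: if KMeans is fitted on the same rows it scores, a single extreme account can claim a centroid of its own and end up with a distance of zero.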
- Self-trade (wash trading) – flags trades where the buyer and seller account ids match, signaling intentional volume inflation or manipulation.
- Ping-pong loops – detects rapid back-and-forth trading between two accounts within a short window, often used to manipulate prints or paint the tape.
- Extreme price moves – monitors each asset’s rolling mean/std; generates alerts when a trade price deviates >4.5σ, capturing pump/dump bursts.
- Per-account volume spikes – compares each account’s 2-hour notional sum against its trailing 12-hour baseline to surface abnormal participation.
- Spoofing/layering – identifies large orders canceled within seconds of submission, indicative of intent to mislead order-book liquidity.
- Front-running – looks for accounts entering just before a large aggressive trade, then exiting quickly for ≥25 bps profit, suggesting use of non-public flow.
- Account profile deviation – leverages stored account baselines to detect retail/low-activity accounts whose rolling notional suddenly exceeds their norm by 12× within a 3-hour window.
- Position concentration – flags accounts whose net size controls ≥20% of an asset’s float (with ≥$1M exposure), highlighting cornering attempts.
- Message-rate spike – monitors per-account order submissions per minute and alerts when a trader floods the venue with >45 instructions in 60 seconds.
- Cancel-to-fill surge – tracks rolling 30-minute cancel/fill ratios, flagging accounts that cancel six or more orders for every fill while sending at least 20 messages.
- Multi-level layering – spots monotonic stacks of ≥4 canceled orders at escalating/descending price levels within 90 seconds, indicative of layered spoofing.
- Cross-market divergence – resamples prices by venue and highlights windows where venue spreads exceed 2%, suggesting dislocations or manipulative prints.
- Network collusion – clusters accounts by IP/device fingerprints and flags windows where ≥3 accounts push the same asset/direction within 60s, hinting at coordinated manipulation.
- ML behavioral anomalies – IsolationForest ingests aggregated trade/order/position features to spot outlier accounts that deviate from learned norms even when they avoid individual rule triggers.
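The extreme-price rule above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code in `surveillance_rules.py`: the function name, the count-based window of 50 trades, and the 20-trade warm-up are made up for the example; only the 4.5σ threshold comes from the rule description.

```python
from collections import deque
from statistics import mean, pstdev


def extreme_price_moves(prices, window=50, threshold=4.5, min_history=20):
    """Return indices of prices more than `threshold` sigmas from the trailing mean."""
    history = deque(maxlen=window)  # trailing prices, excluding the current trade
    flagged = []
    for i, price in enumerate(prices):
        if len(history) >= min_history:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat histories to avoid dividing by zero.
            if sigma > 0 and abs(price - mu) / sigma > threshold:
                flagged.append(i)
        history.append(price)
    return flagged


# Prices oscillate between 99 and 101, then one print at 130 and a return to 100.5.
prices = [100.0 + 0.5 * ((i % 5) - 2) for i in range(60)] + [130.0, 100.5]
flagged = extreme_price_moves(prices)
```

Each trade is compared against the history that precedes it, so the spike cannot dampen its own z-score. Once the spike enters the window it inflates the rolling standard deviation, which is why the following ordinary print is not flagged.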
- Data generation (`market_surveillance.data_generator`)
  - Builds accounts enriched with region, risk tier, device fingerprint, IP subnet, and behavioral clusters.
  - Synthesizes trades/orders per venue while injecting scripted manipulations (pump & dump, spoofing, front-running, cross-market divergence, collusion loops).
  - Aggregates latest positions (net/gross exposure, float share, dominant venue).
- Surveillance engine (`market_surveillance.engine`)
  - Runs rule deck (`surveillance_rules.py`) + ML anomaly scoring (`ml_utils.py`).
  - Persists accounts/trades/orders/positions/alerts/ml_scores to `data/` and SQLite via `persistence.py`.
  - CLI entrypoint (`python -m market_surveillance.main`) orchestrates the run and prints JSON summary.
- Analyst notebook (`notebooks/surveillance_dashboard.ipynb`)
  - Auto-loads or regenerates data, draws price/volume/venue spreads, surfaces float-share & shared-IP tables, visualizes ML anomalies, and provides a filterable alert table with status + note capture.
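As a rough sketch of the persistence step, the snippet below writes alerts to SQLite and exports a CSV snapshot using only the standard library. The table schema, column names, and function names are hypothetical and are not taken from `persistence.py`; an in-memory database and a string buffer stand in for `data/alerts.sqlite` and the CSV file.

```python
import csv
import io
import sqlite3

# Hypothetical alert schema, chosen for illustration only.
ALERT_COLUMNS = ("rule", "account_id", "asset", "severity", "detail")


def persist_alerts(conn, alerts):
    """Create the alerts table if needed and insert a batch of alert dicts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "rule TEXT NOT NULL, account_id TEXT, asset TEXT, "
        "severity REAL, detail TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO alerts (rule, account_id, asset, severity, detail) "
            "VALUES (:rule, :account_id, :asset, :severity, :detail)",
            alerts,
        )
    return conn.execute("SELECT COUNT(*) FROM alerts").fetchone()[0]


def snapshot_csv(conn, handle):
    """Write every stored alert to `handle` as CSV, header first."""
    rows = conn.execute(
        "SELECT rule, account_id, asset, severity, detail FROM alerts ORDER BY id"
    )
    writer = csv.writer(handle)
    writer.writerow(ALERT_COLUMNS)
    writer.writerows(rows)


conn = sqlite3.connect(":memory:")
stored = persist_alerts(conn, [
    {"rule": "self_trade", "account_id": "ACC-001", "asset": "BTC-USD",
     "severity": 0.9, "detail": "buyer and seller match"},
    {"rule": "message_rate", "account_id": "ACC-017", "asset": "ETH-USD",
     "severity": 0.6, "detail": "52 orders in 60s"},
])
buffer = io.StringIO()
snapshot_csv(conn, buffer)
csv_lines = buffer.getvalue().splitlines()
```

Swapping `":memory:"` for a file path and `io.StringIO()` for an open file would give on-disk artifacts; the real file names and layout are whatever the project's engine writes.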
- Create a uv-managed virtual environment and install dependencies (Python 3.10+).

      uv python install 3.10  # once per machine
      uv venv --python 3.10 .venv
      source .venv/bin/activate
      uv pip install -e .

- Run the surveillance pipeline (writes CSV artifacts and SQLite alerts under `data/`).

      python -m market_surveillance.main

- Launch Jupyter to explore the monitoring notebook.

      jupyter notebook notebooks/surveillance_dashboard.ipynb

.
├── data/                         # auto-generated artifacts (csv + sqlite)
│   ├── accounts.csv
│   ├── trades.csv
│   ├── orders.csv
│   ├── positions.csv
│   └── ml_scores.csv
├── notebooks/
│   └── surveillance_dashboard.ipynb
├── src/
│   └── market_surveillance/
│       ├── data_generator.py     # synthetic data + scenario injectors
│       ├── surveillance_rules.py # detection logic + Alert model
│       ├── persistence.py        # SQLite alert store helper
│       ├── engine.py             # orchestration + artifact writes
│       └── main.py               # CLI entrypoint
└── pyproject.toml
- Add new rule functions under `surveillance_rules.py` and include them in `run_all_rules`.
- Expand generators with additional behaviors to test coverage.
- Wire additional ML detectors by adding feature engineering helpers in `ml_utils.py`, calling them from the engine, and exposing scores in the notebook.
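To make the first extension point concrete, here is a hedged sketch of what a new rule and its registration could look like. The `Alert` dataclass, the `RULES` list, the position field names, and the signature of `run_all_rules` are stand-ins invented for this example; the real definitions live in `surveillance_rules.py` and may differ. The thresholds mirror the position-concentration rule (≥20% of float with ≥$1M exposure).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for the project's Alert model."""
    rule: str
    account_id: str
    asset: str
    detail: str


def position_concentration(positions: Iterable[dict],
                           min_share: float = 0.20,
                           min_exposure: float = 1_000_000.0) -> list[Alert]:
    """Flag positions holding at least `min_share` of float with enough exposure."""
    alerts = []
    for pos in positions:
        big_share = pos["float_share"] >= min_share
        big_exposure = abs(pos["net_exposure_usd"]) >= min_exposure
        if big_share and big_exposure:
            alerts.append(Alert(
                rule="position_concentration",
                account_id=pos["account_id"],
                asset=pos["asset"],
                detail=f"float share {pos['float_share']:.0%}",
            ))
    return alerts


# Assumed registry: a new rule is added by appending its function here.
RULES: list[Callable[[Iterable[dict]], list[Alert]]] = [position_concentration]


def run_all_rules(positions: list[dict]) -> list[Alert]:
    """Stand-in aggregator that runs every registered rule and merges the alerts."""
    alerts: list[Alert] = []
    for rule in RULES:
        alerts.extend(rule(positions))
    return alerts


positions = [
    {"account_id": "ACC-3", "asset": "SOL-USD", "float_share": 0.05, "net_exposure_usd": 4_000_000.0},
    {"account_id": "ACC-7", "asset": "SOL-USD", "float_share": 0.27, "net_exposure_usd": 2_500_000.0},
    {"account_id": "ACC-9", "asset": "DOGE-USD", "float_share": 0.40, "net_exposure_usd": 300_000.0},
]
alerts = run_all_rules(positions)
```

Only `ACC-7` satisfies both conditions here: `ACC-3` has too small a float share and `ACC-9` falls under the exposure floor.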