Crypto Market Surveillance Demo

Synthetic end-to-end workflow that simulates a cryptocurrency exchange, injects manipulation patterns, and runs baseline surveillance + ML scoring. Alerts land in SQLite, CSV snapshots, and an interactive notebook for monitoring/case management.

Why this project?

Rapid prototyping: Spin up a fully synthetic venue with realistic noise plus deterministic abusive behaviors to validate detection logic without live data.
Rule + ML contrast: See how classic surveillance heuristics complement unsupervised learning for outlier discovery.
Investigative UX: Notebook doubles as a lightweight case hub with contextual plots, float-share tables, shared-IP clusters, and ML leaderboards.

Features

Generates synthetic accounts, trades, and order book events with embedded scenarios: wash trading, ping-pong bursts, pump & dump, spoofing, front-running, account-message bursts, layering ladders, and cross-venue price shocks.
Adds bespoke account-deviation bursts so profiles can drift sharply from historical norms.
Trades and orders are tagged with venues while a position snapshot quantifies float share per account, enabling venue-aware and concentration rules.
Embedded network metadata (device fingerprints + IP subnets) powers behavioral clustering scenarios that mimic insider collaboration.
Unsupervised ML (IsolationForest + KMeans distance on engineered behavioral vectors) ranks anomalous accounts and feeds those scores into the alert stream plus the monitoring notebook.
Surveillance engine runs multiple rules: self-trades, bilateral ping-pong loops, extreme price moves vs rolling baseline, abnormal volume spikes, spoofing via order cancellations, account profile deviations, position concentration, message-rate spikes, cancel-to-fill surges, layering sequences, cross-market divergence, network collusion, ML behavioral anomalies, and front-running around large prints.
Alerts are persisted to data/alerts.sqlite, with a CSV snapshot for offline review.
Notebook (notebooks/surveillance_dashboard.ipynb) plots price, aggressor volume, and order-book imbalance, and includes an interactive case-management widget for alert triage.
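
To make the float-share snapshot more concrete, here is a minimal sketch of how net positions might be turned into a share of float and then filtered for concentration. The field names (`account_id`, `asset`, `signed_qty`), the `float_by_asset` and `last_price` mappings, and both function names are assumptions for illustration and are not taken from the project's code. The 20% and $1M cut-offs are the ones quoted for the position-concentration rule.

```python
from collections import defaultdict


def float_share_snapshot(trades, float_by_asset, last_price):
    """Net signed quantity per (account, asset), expressed as a share of float."""
    net = defaultdict(float)
    for t in trades:
        # signed_qty is positive for buys and negative for sells (assumed convention)
        net[(t["account_id"], t["asset"])] += t["signed_qty"]

    rows = []
    for (account, asset), qty in net.items():
        rows.append({
            "account_id": account,
            "asset": asset,
            "net_qty": qty,
            "float_share": abs(qty) / float_by_asset[asset],
            "exposure_usd": abs(qty) * last_price[asset],
        })
    return rows


def concentrated(rows, min_share=0.20, min_exposure=1_000_000):
    """Keep rows that meet both the float-share and the exposure threshold."""
    return [
        r for r in rows
        if r["float_share"] >= min_share and r["exposure_usd"] >= min_exposure
    ]


# Toy data: A1 ends up net long 25 of a 100-unit float, A2 holds only 2.
trades = [
    {"account_id": "A1", "asset": "BTC", "signed_qty": 30.0},
    {"account_id": "A1", "asset": "BTC", "signed_qty": -5.0},
    {"account_id": "A2", "asset": "BTC", "signed_qty": 2.0},
]
rows = float_share_snapshot(trades, {"BTC": 100.0}, {"BTC": 50_000.0})
flagged = concentrated(rows)
print([r["account_id"] for r in flagged])
```

Requiring both thresholds keeps tiny floats from producing alerts on positions that are large in percentage terms but small in dollar terms.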

Surveillance Rules

Self-trade (wash trading) – flags trades where the buyer and seller account IDs match, signaling intentional volume inflation or manipulation.
Ping-pong loops – detects rapid back-and-forth trading between two accounts within a short window, often used to manipulate prints or paint the tape.
Extreme price moves – monitors each asset’s rolling mean/std; generates alerts when a trade price deviates from the rolling mean by more than 4.5σ, capturing pump/dump bursts.
Per-account volume spikes – compares each account’s 2-hour notional sum against its trailing 12-hour baseline to surface abnormal participation.
Spoofing/layering – identifies large orders canceled within seconds of submission, indicative of intent to mislead order-book liquidity.
Front-running – looks for accounts entering just before a large aggressive trade, then exiting quickly for ≥25 bps profit, suggesting use of non-public flow.
Account profile deviation – leverages stored account baselines to detect retail/low-activity accounts whose rolling notional suddenly exceeds their norm by 12× within a 3-hour window.
Position concentration – flags accounts whose net size controls ≥20% of an asset’s float (with ≥$1M exposure), highlighting cornering attempts.
Message-rate spike – monitors per-account order submissions per minute and alerts when a trader floods the venue with >45 instructions in 60 seconds.
Cancel-to-fill surge – tracks rolling 30-minute cancel/fill ratios, flagging accounts that cancel six or more orders for every fill while sending at least 20 messages.
Multi-level layering – spots monotonic stacks of ≥4 canceled orders at escalating/descending price levels within 90 seconds, indicative of layered spoofing.
Cross-market divergence – resamples prices by venue and highlights windows where venue spreads exceed 2%, suggesting dislocations or manipulative prints.
Network collusion – clusters accounts by IP/device fingerprints and flags windows where ≥3 accounts push the same asset/direction within 60s, hinting at coordinated manipulation.
ML behavioral anomalies – IsolationForest ingests aggregated trade/order/position features to spot outlier accounts that deviate from learned norms even when they avoid individual rule triggers.
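
As one worked example of the rules above, the extreme-price-move check can be sketched as a trailing z-score. This is an illustration under stated assumptions, not the project's implementation: the function name, the 50-trade window, and the use of a population standard deviation are all hypothetical choices, and only the 4.5σ threshold comes from the rule description.

```python
from collections import deque
from statistics import mean, pstdev


def extreme_price_moves(prices, window=50, threshold=4.5):
    """Return indices whose price sits more than `threshold` sigma from the trailing mean."""
    history = deque(maxlen=window)  # trailing baseline, oldest price dropped first
    flagged = []
    for i, price in enumerate(prices):
        if len(history) == window:  # only score once the baseline is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(price - mu) / sigma > threshold:
                flagged.append(i)
        # The current price joins the baseline after scoring, flagged or not.
        history.append(price)
    return flagged


# 60 quiet prints oscillating between 100.0 and 100.4, one spike, then quiet again.
prices = [100.0 + 0.1 * (i % 5) for i in range(60)] + [103.0] + [100.0] * 5
hits = extreme_price_moves(prices)
print(hits)
```

Scoring each price against history that excludes it avoids the spike diluting its own z-score. A production rule would likely run this per asset and might also exclude flagged prints from the baseline.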

Architecture at a glance

Data generation (market_surveillance.data_generator)
- Builds accounts enriched with region, risk tier, device fingerprint, IP subnet, and behavioral clusters.
- Synthesizes trades and orders per venue while injecting scripted manipulations (pump & dump, spoofing, front-running, cross-market divergence, collusion loops).
- Aggregates latest positions (net/gross exposure, float share, dominant venue).
Surveillance engine (market_surveillance.engine)
- Runs rule deck (surveillance_rules.py) + ML anomaly scoring (ml_utils.py).
- Persists accounts/trades/orders/positions/alerts/ml_scores to data/ and SQLite via persistence.py.
- CLI entrypoint (python -m market_surveillance.main) orchestrates the run and prints JSON summary.
Analyst notebook (notebooks/surveillance_dashboard.ipynb)
- Auto-loads or regenerates data, draws price/volume/venue spreads, surfaces float-share & shared-IP tables, visualizes ML anomalies, and provides a filterable alert table with status + note capture.
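
The hand-off from the rule deck to the alert store can be pictured with a short sketch. Everything named here is an assumption: the `Alert` fields, the `alerts` table schema, and the `save_alerts` helper are hypothetical stand-ins, since the actual model in surveillance_rules.py and the helper in persistence.py are not shown in this README.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Alert:  # hypothetical shape, not the project's actual Alert model
    rule: str
    account_id: str
    asset: str
    ts: str  # ISO-8601 timestamp
    severity: float
    detail: str


def save_alerts(conn, alerts):
    """Create the (assumed) alerts table if needed and append the given alerts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "rule TEXT, account_id TEXT, asset TEXT, ts TEXT, severity REAL, detail TEXT)"
    )
    conn.executemany(
        "INSERT INTO alerts VALUES (?, ?, ?, ?, ?, ?)",
        [astuple(a) for a in alerts],
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained; the pipeline writes to data/.
conn = sqlite3.connect(":memory:")
save_alerts(conn, [
    Alert("self_trade", "A1", "BTC", "2024-01-01T00:00:00", 0.9, "buyer == seller"),
    Alert("ml_anomaly", "A7", "ETH", "2024-01-01T00:05:00", 0.7, "isolation score outlier"),
])
counts = dict(conn.execute("SELECT rule, COUNT(*) FROM alerts GROUP BY rule"))
print(counts)
```

Keeping rule alerts and ML alerts in one table with a `rule` column is one way to let the notebook filter both through the same alert view.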

Getting Started

Create a uv-managed virtual environment and install dependencies (requires Python 3.10+).

```
uv python install 3.10          # once per machine
uv venv --python 3.10 .venv
source .venv/bin/activate
uv pip install -e .
```

Run the surveillance pipeline (writes CSV artifacts and SQLite alerts under data/).
```
python -m market_surveillance.main
```

Launch Jupyter to explore the monitoring notebook.

```
jupyter notebook notebooks/surveillance_dashboard.ipynb
```
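
Outside the notebook, a quick way to confirm that a pipeline run produced alerts is to inspect the SQLite file directly. The sketch below lists whatever tables exist rather than assuming their names, because the schema is not documented here. `list_tables` is a hypothetical helper, not part of the package.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        # Names come from sqlite_master itself, so double-quoting them is sufficient.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


db_path = Path("data") / "alerts.sqlite"
if db_path.exists():  # sqlite3.connect would otherwise create an empty file
    for table, rows in list_tables(db_path).items():
        print(f"{table}: {rows} rows")
else:
    print(f"{db_path} not found; run the pipeline first")
```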

Project Layout

```
.
├── data/                        # auto-generated artifacts (csv + sqlite)
│   ├── accounts.csv
│   ├── trades.csv
│   ├── orders.csv
│   ├── positions.csv
│   └── ml_scores.csv
├── notebooks/
│   └── surveillance_dashboard.ipynb
├── src/
│   └── market_surveillance/
│       ├── data_generator.py     # synthetic data + scenario injectors
│       ├── surveillance_rules.py # detection logic + Alert model
│       ├── persistence.py        # SQLite alert store helper
│       ├── engine.py             # orchestration + artifact writes
│       └── main.py               # CLI entrypoint
└── pyproject.toml
```

Extending

Add new rule functions under surveillance_rules.py and include them in run_all_rules.
Expand generators with additional behaviors to test coverage.
Wire additional ML detectors by adding feature engineering helpers in ml_utils.py, calling them from the engine, and exposing scores in the notebook.
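
The first step above can be sketched as follows. The rule itself (`large_notional_trade`), the trade field names, the alert dictionary shape, and the `RULES` list are all hypothetical. `run_rules` is only a stand-in for the project's `run_all_rules`, whose real signature is not shown in this README.

```python
def large_notional_trade(trades, min_notional=5_000_000):
    """Hypothetical rule: flag any single trade whose notional meets `min_notional`."""
    alerts = []
    for t in trades:
        notional = t["price"] * t["qty"]
        if notional >= min_notional:
            alerts.append({
                "rule": "large_notional_trade",
                "account_id": t["buyer_id"],
                "detail": f"notional={notional:,.0f}",
            })
    return alerts


# Register rule functions in one place so the runner stays unchanged.
RULES = [large_notional_trade]


def run_rules(trades, rules=RULES):
    """Stand-in for run_all_rules: call each rule and concatenate its alerts."""
    alerts = []
    for rule in rules:
        alerts.extend(rule(trades))
    return alerts


trades = [
    {"buyer_id": "A1", "seller_id": "A2", "price": 50_000.0, "qty": 1.0},
    {"buyer_id": "A3", "seller_id": "A4", "price": 50_000.0, "qty": 120.0},
]
alerts = run_rules(trades)
print(alerts)
```

Giving every rule the same input and return shape means adding coverage is a matter of writing one function and appending it to the list.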
