volume-synchronized probability of informed trading (VPIN) analysis toolkit for cryptocurrency markets with real-time data processing and visualization.
this repository provides two complementary implementations of VPIN (Volume-synchronized Probability of INformed trading) analysis for BTCUSDT futures:
- heavy/ - full-featured VPIN computation with Binance API integration and comprehensive dashboard
- light/ - lightweight visualization tool for pre-computed VPIN data
VPIN estimates order flow toxicity, and the liquidity-induced volatility risk that comes with it, from volume imbalances between buyer-initiated and seller-initiated trades. high VPIN values are treated as an early warning of volatility spikes and market stress.
    vpin-opensource/
    ├── heavy/                    # Full VPIN computation pipeline
    │   ├── vpin.py               # Main analysis script with API integration
    │   └── README.md             # Detailed documentation
    ├── light/                    # Fast visualization from cached data
    │   ├── vpinLight.py          # Dashboard for pre-computed VPIN
    │   └── raw_vpin_buckets.csv  # Example cached VPIN data (1.6MB)
    └── README.md                 # This overview file
- prepare data directory:

      mkdir -p VPIN/csv/

- download historical data from Binance Data Vision and extract to VPIN/csv/
- run full analysis:

      cd heavy/
      pip install pandas requests dash plotly tqdm
      python vpin.py

- access dashboard at http://127.0.0.1:8050/
if you have cached VPIN data or want to test with provided sample:
- run light version:

      cd light/
      pip install pandas dash plotly tqdm
      python vpinLight.py

- dashboard loads at http://127.0.0.1:8050/ with sample data
| Feature | Heavy Version | Light Version |
|---|---|---|
| VPIN computation | ✅ Full pipeline | ❌ Requires cached data |
| Binance API integration | ✅ Real-time data | ❌ Historical only |
| Processing time | ~10-30 minutes | <1 minute |
| Data requirements | 2+ GB CSV files | Pre-computed buckets |
| Dashboard features | Full interactive | Basic visualization |
| Export capabilities | Multiple formats | CSV only |
| Memory usage | High (>32GB for full dataset) | Low (<2GB) |
VPIN implementation follows Easley-López de Prado-O'Hara methodology:
- volume buckets: aggregate trades into fixed-volume buckets (default: 1000 BTC)
- flow classification: classify trades as buyer-initiated (`isBuyerMaker=False`) or seller-initiated (`isBuyerMaker=True`)
- imbalance calculation: compute `|buy_volume - sell_volume|` per bucket
- VPIN metric: `imbalance / total_bucket_volume`
- smoothing: apply moving average over N buckets (default: 50)
high VPIN is read as a sign of increased informed trading activity and is used here as a leading indicator of volatility spikes.
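the steps above can be sketched in a few lines of pandas. this is a minimal illustration, not the code in `vpin.py`: the function name `compute_vpin` and the column names `qty` and `is_buyer_maker` are assumptions, and each trade is assigned whole to one bucket, so bucket volumes are only approximately equal to the bucket size.

```python
import numpy as np
import pandas as pd

V_BUCKET = 1000  # Volume bucket size (BTC), default from this README
N_WINDOW = 50    # VPIN moving average window, default from this README


def compute_vpin(trades: pd.DataFrame, v_bucket: float = V_BUCKET,
                 n_window: int = N_WINDOW) -> pd.DataFrame:
    """Per-bucket VPIN from chronologically ordered trades.

    Assumed (hypothetical) columns: qty (float), is_buyer_maker (bool).
    Simplification: a trade is never split across two buckets.
    """
    cum_qty = trades["qty"].cumsum()
    # Bucket 0 holds the first v_bucket of cumulative volume, bucket 1 the next, ...
    bucket = (np.ceil(cum_qty / v_bucket).astype(int) - 1).rename("bucket")
    # is_buyer_maker=False -> buyer-initiated, True -> seller-initiated
    buy = trades["qty"].where(~trades["is_buyer_maker"], 0.0)
    sell = trades["qty"].where(trades["is_buyer_maker"], 0.0)
    out = pd.DataFrame({"buy": buy, "sell": sell}).groupby(bucket).sum()
    out["vpin"] = (out["buy"] - out["sell"]).abs() / (out["buy"] + out["sell"])
    # The first n_window - 1 rows are NaN until the window is full
    out["vpin_smoothed"] = out["vpin"].rolling(n_window).mean()
    return out
```

with the defaults, the smoothed series only becomes defined after 50 completed buckets, i.e. after roughly 50,000 BTC of traded volume.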
- source: Binance Vision
- format: monthly aggTrades CSV files
- coverage: 2017-present
- size: ~400MB per month compressed, ~2GB uncompressed
- source: Binance Futures API
- endpoint: `/fapi/v1/aggTrades`
- authentication: API key with Futures permission
- rate limits: handled automatically
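for orientation, the aggTrades endpoint returns a JSON list of records with short keys (`p` price, `q` quantity, `T` trade time in milliseconds, `m` whether the buyer was the maker). the sketch below converts such a payload into a typed DataFrame. the function name `agg_trades_to_frame` and the output column names are assumptions chosen for illustration, and the script in `heavy/` may structure this differently.

```python
import pandas as pd


def agg_trades_to_frame(payload: list) -> pd.DataFrame:
    """Convert an aggTrades JSON payload (list of dicts) into typed columns.

    Output column names are hypothetical, not taken from vpin.py.
    """
    raw = pd.DataFrame(payload)
    return pd.DataFrame({
        # Trade time arrives as epoch milliseconds
        "timestamp": pd.to_datetime(raw["T"], unit="ms", utc=True),
        # Price and quantity arrive as strings
        "price": raw["p"].astype(float),
        "qty": raw["q"].astype(float),
        # True means the buyer was the maker, i.e. a seller-initiated trade
        "is_buyer_maker": raw["m"].astype(bool),
    })


# Made-up sample record in the shape of the API response
sample = [{"a": 1, "p": "35000.10", "q": "0.500", "f": 10, "l": 12,
           "T": 1700000000000, "m": True}]
frame = agg_trades_to_frame(sample)
```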
both implementations support configuration via script variables:
    V_BUCKET = 1000           # Volume bucket size (BTC)
    N_WINDOW = 50             # VPIN moving average window
    TIMEFRAME = '5T'          # OHLCV timeframe ('5T'=5min, '1H'=1hour)
    CSV_FOLDER = 'VPIN/csv/'  # Historical data location
    SYMBOL = 'BTCUSDT'        # Trading pair

- `raw_vpin_buckets.csv` - timestamped VPIN values for each volume bucket
- `vpin_ohlcv_export.csv` - OHLCV candlesticks with aligned VPIN values
- `vpin_analysis.log` - execution logs and performance metrics
- `vpin_ohlcv_export.csv` - OHLCV + VPIN alignment for further analysis
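as a rough illustration of what "OHLCV with aligned VPIN" can mean, the sketch below resamples trades into candles and attaches to each candle the most recent bucket that had completed by the candle's close. the function name and the column names (`timestamp`, `price`, `qty`, `bucket_end`, `vpin`) are assumptions, and the alignment rule is one reasonable choice rather than the one confirmed to be used by the scripts.

```python
import pandas as pd

TIMEFRAME = "5min"  # Same interval as '5T'; newer pandas versions prefer 'min'


def align_vpin_to_ohlcv(trades: pd.DataFrame, buckets: pd.DataFrame,
                        timeframe: str = TIMEFRAME) -> pd.DataFrame:
    """Build OHLCV candles and attach the latest VPIN known at each candle close.

    Assumed columns: trades[timestamp, price, qty], buckets[bucket_end, vpin].
    """
    indexed = trades.set_index("timestamp")
    ohlcv = indexed["price"].resample(timeframe).ohlc()
    ohlcv["volume"] = indexed["qty"].resample(timeframe).sum()
    # Drop intervals without any trades
    ohlcv = ohlcv.dropna(subset=["open"]).reset_index()
    ohlcv["close_time"] = ohlcv["timestamp"] + pd.Timedelta(timeframe)
    # Backward as-of join: never uses a bucket that ends after the candle closes
    return pd.merge_asof(
        ohlcv.sort_values("close_time"),
        buckets.sort_values("bucket_end"),
        left_on="close_time",
        right_on="bucket_end",
        direction="backward",
    )
```

matching on the candle's close time rather than its open avoids attaching a VPIN value that was not yet known while the candle was forming.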
- RAM: 32+ GB for full dataset (2025-04 onwards)
- processing time: 10-30 minutes depending on data range
- disk space: 10+ GB for CSV storage
- CPU: multi-core recommended for pandas operations
- RAM: <2GB
- processing time: <1 minute
- disk space: minimal (uses cached buckets)
for real-time data integration:
- create Binance API key with Futures and Read permissions
- configure credentials in script:

      API_KEY = 'your_api_key_here'
      API_SECRET = 'your_secret_key_here'
      SKIP_API_FETCH = False

- ensure IP whitelist includes your server
security note: store credentials securely, never commit to version control
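one common way to keep keys out of the repository is to read them from environment variables instead of hard-coding them. the sketch below assumes the variable names `BINANCE_API_KEY` and `BINANCE_API_SECRET`, which are hypothetical and not something the scripts are known to read.

```python
import os


def load_binance_credentials(env=os.environ):
    """Return (api_key, api_secret) from the environment.

    The variable names are illustrative assumptions, not part of vpin.py.
    """
    try:
        return env["BINANCE_API_KEY"], env["BINANCE_API_SECRET"]
    except KeyError as missing:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

the returned pair could then be assigned to `API_KEY` and `API_SECRET` at the top of the script.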
- market microstructure analysis: measure informed trading probability
- volatility prediction: VPIN spikes precede price volatility
- liquidity risk assessment: identify toxic order flow periods
- algorithmic trading: incorporate VPIN signals into strategies
- market making: adjust spreads based on VPIN levels
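trade signing adapts the Lee-Ready idea, using the exchange-reported `isBuyerMaker` flag instead of inferring direction from quotes:
- `isBuyerMaker=True` → sell order (buyer provides liquidity, seller takes it)
- `isBuyerMaker=False` → buy order (buyer takes liquidity)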
- fixed BTC volume per bucket (not USD value)
- chronological trade ordering maintained
- partial buckets at data boundaries handled
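one way to keep every bucket at exactly the fixed volume is to split a trade that straddles a bucket boundary. the sketch below does that in plain Python and returns the unfinished bucket separately. the function name `fill_buckets` and the decision to return the remainder rather than drop it are assumptions, since the exact handling of partial buckets in the scripts is not described here.

```python
def fill_buckets(trades, v_bucket=1000.0):
    """Fill fixed-volume buckets from (qty, is_buyer_maker) pairs.

    Trades must be in chronological order. Returns the list of completed
    buckets as (buy_volume, sell_volume) and the trailing partial bucket.
    """
    buckets = []
    buy = sell = filled = 0.0
    for qty, is_buyer_maker in trades:
        remaining = qty
        while remaining > 0:
            room = v_bucket - filled
            take = min(remaining, room)  # Part of the trade that fits this bucket
            if is_buyer_maker:
                sell += take  # Buyer was the maker: seller-initiated
            else:
                buy += take   # Buyer was the taker: buyer-initiated
            filled += take
            remaining -= take
            if take == room:  # Bucket is full: close it and start a new one
                buckets.append((buy, sell))
                buy = sell = filled = 0.0
    return buckets, (buy, sell)
```

for example, with `v_bucket=10`, a 15 BTC buy arriving into an empty bucket closes one bucket with 10 BTC and leaves 5 BTC in the next.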
    vpin = abs(buy_volume - sell_volume) / total_bucket_volume
    vpin_smoothed = moving_average(vpin, window=N_WINDOW)

- "No CSV files found": download historical data first; the scripts require CSV files
- "API error 429": rate limited, script waits automatically
- "Invalid API permissions": recreate API key with Futures enabled
- Memory errors: reduce date range or use light version
- Empty dashboard: check logs for data loading errors
- verify CSV column names match expected format
- ensure timestamp consistency across files
- check for data gaps in historical coverage
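these checks can be automated before a long run. the sketch below is a hedged example: the column names in `REQUIRED` (`transact_time`, `quantity`, `is_buyer_maker`) and the one-hour gap threshold are assumptions, so adjust them to the header of the files you actually downloaded.

```python
import pandas as pd

# Assumed column names; compare against the header of your CSV files
REQUIRED = ("transact_time", "quantity", "is_buyer_maker")


def check_trades(frame: pd.DataFrame, required=REQUIRED,
                 max_gap=pd.Timedelta("1h")) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    missing = sorted(set(required) - set(frame.columns))
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Timestamps are assumed to be epoch milliseconds
    ts = pd.to_datetime(frame["transact_time"], unit="ms", utc=True)
    if not ts.is_monotonic_increasing:
        problems.append("timestamps are not in chronological order")
    # Count pauses between consecutive trades that exceed the threshold
    n_gaps = int((ts.sort_values().diff() > max_gap).sum())
    if n_gaps:
        problems.append(f"{n_gaps} gap(s) longer than {max_gap}")
    return problems
```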
MIT License - free for personal and commercial use.
contributions welcome:
- performance optimizations
- additional exchanges/symbols
- enhanced visualization features
- documentation improvements
- Easley, D., López de Prado, M., & O'Hara, M. (2012). The volume clock: Insights into the high-frequency paradigm
- Easley, D., López de Prado, M., & O'Hara, M. (2011). The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading
developed by @lubluniky • last updated: october 2025