
VPIN Open Source Implementation

volume-synchronized probability of informed trading (VPIN) analysis toolkit for cryptocurrency markets with real-time data processing and visualization.

Overview

this repository provides two complementary implementations of VPIN (Volume-synchronized Probability of INformed trading) analysis for BTCUSDT futures:

  • heavy/ - full-featured VPIN computation with Binance API integration and comprehensive dashboard
  • light/ - lightweight visualization tool for pre-computed VPIN data

VPIN measures order flow toxicity and liquidity-induced volatility risk from the imbalance between buyer- and seller-initiated volume within equal-volume buckets. high VPIN values tend to precede volatility spikes and periods of market stress.

Project Structure

vpin-opensource/
├── heavy/                    # Full VPIN computation pipeline
│   ├── vpin.py              # Main analysis script with API integration
│   └── README.md            # Detailed documentation
├── light/                   # Fast visualization from cached data
│   ├── vpinLight.py         # Dashboard for pre-computed VPIN
│   └── raw_vpin_buckets.csv # Example cached VPIN data (1.6MB)
└── README.md               # This overview file

Quick Start

Option 1: Full Analysis (Recommended)

  1. prepare the data directory (its location must match CSV_FOLDER, see Configuration):

    mkdir -p VPIN/csv/
  2. download the monthly BTCUSDT aggTrades archives from Binance Data Vision and extract the CSV files to VPIN/csv/

  3. run full analysis:

    cd heavy/
    pip install pandas requests dash plotly tqdm
    python vpin.py
  4. open the dashboard at http://127.0.0.1:8050/

Option 2: Quick Visualization

if you have cached VPIN data, or want to test with the provided sample (light/raw_vpin_buckets.csv):

  1. run light version:

    cd light/
    pip install pandas dash plotly tqdm
    python vpinLight.py
  2. the dashboard loads at http://127.0.0.1:8050/ with the sample data

Features Comparison

| Feature | Heavy Version | Light Version |
|---|---|---|
| VPIN computation | ✅ Full pipeline | ❌ Requires cached data |
| Binance API integration | ✅ Real-time data | ❌ Historical only |
| Processing time | ~10-30 minutes | <1 minute |
| Data requirements | 2+ GB of CSV files | Pre-computed buckets |
| Dashboard features | Fully interactive | Basic visualization |
| Export capabilities | Multiple formats | CSV only |
| Memory usage | High (>32 GB for full dataset) | Low (<2 GB) |

VPIN Methodology

This VPIN implementation follows the Easley-López de Prado-O'Hara methodology:

  1. volume buckets: aggregate trades into fixed-volume buckets (default: V_BUCKET = 1000 BTC)
  2. flow classification: classify trades as buyer-initiated (isBuyerMaker=False) or seller-initiated (isBuyerMaker=True)
  3. imbalance calculation: compute |buy_volume - sell_volume| per bucket
  4. per-bucket VPIN: imbalance / total_bucket_volume
  5. smoothing: average the per-bucket values over the last N buckets (default: N_WINDOW = 50); this rolling mean is the reported VPIN

high VPIN indicates more toxic order flow (a higher probability of informed trading) and tends to precede volatility spikes.
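The five steps above can be sketched in pandas as follows. This is a simplified illustration, not the code in heavy/vpin.py: it assumes a chronologically sorted DataFrame with hypothetical columns `quantity` (BTC) and `is_buyer_maker` (bool dtype), and it assigns each whole trade to the bucket in which it starts instead of splitting trades across bucket boundaries, so bucket totals can slightly exceed V_BUCKET.

```python
import pandas as pd

V_BUCKET = 1000.0  # bucket size in BTC (mirrors the config variable)
N_WINDOW = 50      # buckets in the rolling average (mirrors the config variable)


def vpin_from_trades(trades: pd.DataFrame, v_bucket: float = V_BUCKET,
                     n_window: int = N_WINDOW) -> pd.DataFrame:
    """Simplified VPIN: whole trades are assigned to buckets (no splitting).

    Assumes hypothetical columns `quantity` (BTC) and `is_buyer_maker` (bool dtype),
    already sorted chronologically.
    """
    qty = trades["quantity"].astype(float)
    # Step 2: isBuyerMaker=True -> the seller was the aggressor, count as sell volume
    sell = qty.where(trades["is_buyer_maker"], 0.0)
    buy = qty - sell
    # Step 1: bucket index from the cumulative volume traded *before* each trade
    start_volume = qty.cumsum() - qty
    bucket = (start_volume // v_bucket).astype(int)

    buckets = (pd.DataFrame({"bucket": bucket, "buy": buy, "sell": sell})
               .groupby("bucket")[["buy", "sell"]].sum())
    buckets["total"] = buckets["buy"] + buckets["sell"]
    # Steps 3-4: per-bucket imbalance as a fraction of bucket volume
    buckets["bucket_vpin"] = (buckets["buy"] - buckets["sell"]).abs() / buckets["total"]
    # Step 5: VPIN is the rolling mean over the last n_window buckets
    buckets["vpin"] = buckets["bucket_vpin"].rolling(n_window).mean()
    return buckets
```

The last bucket is usually partial, so it is worth dropping or flagging it before reading the most recent VPIN value.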

Data Sources

Historical Data

  • source: Binance Data Vision (data.binance.vision)
  • format: monthly aggTrades CSV files
  • coverage: 2017-present
  • size: ~400MB per month compressed, ~2GB uncompressed
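A monthly file can be loaded with plain pandas. This is a sketch under assumptions: the column layout below is what Binance's futures aggTrades CSVs are believed to use, and some files ship with a header row while older ones do not, so check both against your own downloads.

```python
import pandas as pd

# Assumed column layout of Binance USD-M futures aggTrades CSVs; check your files.
AGG_COLUMNS = ["agg_trade_id", "price", "quantity", "first_trade_id",
               "last_trade_id", "transact_time", "is_buyer_maker"]


def load_agg_trades(source) -> pd.DataFrame:
    """Load one aggTrades CSV (path or file-like), with or without a header row."""
    df = pd.read_csv(source, header=None, names=AGG_COLUMNS)
    # If the file has a header, it was read as the first data row: drop it
    if str(df.iloc[0, 0]) == "agg_trade_id":
        df = df.iloc[1:].copy()
    for col in AGG_COLUMNS[:-1]:
        df[col] = pd.to_numeric(df[col])
    # Booleans may arrive as True/False or as "true"/"false" strings
    df["is_buyer_maker"] = df["is_buyer_maker"].astype(str).str.lower() == "true"
    # transact_time is in milliseconds since the epoch
    df["timestamp"] = pd.to_datetime(df["transact_time"], unit="ms", utc=True)
    return df.sort_values("transact_time").reset_index(drop=True)
```

For ~2 GB files, passing `usecols=` to `read_csv` and keeping only price, quantity, time, and side is one way to cut memory use.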

Live Data (Heavy version only)

  • source: Binance Futures API
  • endpoint: /fapi/v1/aggTrades
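  • authentication: the script is configured with an API key with Futures permission (see API Configuration); the aggTrades endpoint itself serves public market data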
  • rate limits: handled automatically
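A minimal sketch of one request to the endpoint above, using only the standard library. Assumptions: the request is unsigned on the premise that aggTrades is public market data; the compact response keys (`a`, `p`, `q`, `T`, `m`) follow Binance's documented format; and Binance documents that startTime and endTime may be at most one hour apart when both are sent (verify against the current API docs). The backoff on HTTP 429 is an illustration, not how vpin.py handles rate limits.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://fapi.binance.com"  # Binance USD-M futures REST host


def parse_agg_trades(payload: list) -> list:
    """Map the compact aggTrades JSON keys to readable names."""
    return [
        {
            "agg_trade_id": t["a"],
            "price": float(t["p"]),
            "quantity": float(t["q"]),
            "transact_time": t["T"],
            "is_buyer_maker": t["m"],
        }
        for t in payload
    ]


def fetch_agg_trades(symbol: str, start_ms: int, end_ms: int,
                     limit: int = 1000, max_retries: int = 5) -> list:
    """Fetch one page (up to `limit` trades), backing off on HTTP 429."""
    query = urllib.parse.urlencode({"symbol": symbol, "startTime": start_ms,
                                    "endTime": end_ms, "limit": limit})
    url = f"{BASE_URL}/fapi/v1/aggTrades?{query}"
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return parse_agg_trades(json.load(resp))
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honour Retry-After if the server sends it, else back off exponentially
            time.sleep(float(err.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("rate limited: retries exhausted")
```

To page through a longer range, repeat the call with startTime set just after the last returned transact_time.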

Configuration

both implementations support configuration via script variables:

V_BUCKET = 1000           # Volume bucket size (BTC)
N_WINDOW = 50             # VPIN moving average window (buckets)
TIMEFRAME = '5T'          # OHLCV timeframe ('5T'=5min, '1H'=1hour; pandas >= 2.2 prefers '5min'/'1h')
CSV_FOLDER = 'VPIN/csv/'  # Historical data location
SYMBOL = 'BTCUSDT'        # Trading pair
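TIMEFRAME is a pandas offset alias, so building the OHLCV candles amounts to a resample. A sketch, assuming a trades DataFrame with hypothetical columns `timestamp`, `price`, and `quantity`; it uses the '5min' spelling because pandas 2.2 deprecated the 'T' and 'H' aliases.

```python
import pandas as pd

TIMEFRAME = "5min"  # same bar length as '5T'


def trades_to_ohlcv(trades: pd.DataFrame, timeframe: str = TIMEFRAME) -> pd.DataFrame:
    """Resample trades (hypothetical columns: timestamp, price, quantity) to OHLCV bars."""
    indexed = trades.set_index("timestamp").sort_index()
    ohlcv = indexed["price"].resample(timeframe).ohlc()  # open/high/low/close columns
    ohlcv["volume"] = indexed["quantity"].resample(timeframe).sum()
    # Bars with no trades have NaN prices; drop them rather than forward-fill
    return ohlcv.dropna(subset=["open"])
```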

Output Files

Heavy Version

  • raw_vpin_buckets.csv - timestamped VPIN values for each volume bucket
  • vpin_ohlcv_export.csv - OHLCV candlesticks with aligned VPIN values
  • vpin_analysis.log - execution logs and performance metrics

Light Version

  • vpin_ohlcv_export.csv - OHLCV + VPIN alignment for further analysis
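Aligning bucket-level VPIN to time-based candles is an as-of join, because buckets close at irregular times. A sketch under assumptions: bars are indexed by open time, and the bucket table has hypothetical columns `end_time` (time of the bucket's last trade) and `vpin`; the actual column names in raw_vpin_buckets.csv may differ.

```python
import pandas as pd


def align_vpin_to_bars(bars: pd.DataFrame, buckets: pd.DataFrame,
                       timeframe: str = "5min") -> pd.DataFrame:
    """Attach to each bar the latest VPIN from a bucket completed before the bar closed.

    bars: OHLCV frame indexed by bar open time (sorted).
    buckets: hypothetical columns `end_time` and `vpin`.
    """
    out = bars.copy()
    out["bar_close"] = out.index + pd.Timedelta(timeframe)
    right = buckets[["end_time", "vpin"]].sort_values("end_time")
    # Backward as-of join on buckets that ended strictly before the bar's close,
    # so a bar's VPIN never uses trades from after that bar
    merged = pd.merge_asof(out[["bar_close"]], right, left_on="bar_close",
                           right_on="end_time", direction="backward",
                           allow_exact_matches=False)
    out["vpin"] = merged["vpin"].to_numpy()  # merge_asof keeps the left row order
    return out
```

Joining on the bar close rather than the bar open avoids look-ahead when the exported file is used for backtests.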

Performance Notes

Heavy Version Requirements

  • RAM: 32+ GB for full dataset (2025-04 onwards)
  • processing time: 10-30 minutes depending on data range
  • disk space: 10+ GB for CSV storage
  • CPU: multi-core recommended for pandas operations

Light Version Requirements

  • RAM: <2GB
  • processing time: <1 minute
  • disk space: minimal (uses cached buckets)

API Configuration (Heavy Version)

for real-time data integration:

  1. create Binance API key with Futures and Read permissions
  2. configure credentials in script:
    API_KEY = 'your_api_key_here'
    API_SECRET = 'your_secret_key_here' 
    SKIP_API_FETCH = False
  3. make sure the key's IP whitelist includes your server's IP address

security note: store credentials securely (for example in environment variables) and never commit them to version control.
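One way to keep keys out of the script is to read them from the environment at startup. A minimal sketch; the variable names BINANCE_API_KEY and BINANCE_API_SECRET are hypothetical, not names vpin.py reads today.

```python
import os


def load_api_credentials(prefix: str = "BINANCE") -> tuple:
    """Read API credentials from environment variables instead of hardcoding them.

    Variable names (<prefix>_API_KEY / <prefix>_API_SECRET) are hypothetical.
    """
    key = os.environ.get(f"{prefix}_API_KEY")
    secret = os.environ.get(f"{prefix}_API_SECRET")
    if not key or not secret:
        raise RuntimeError(f"set {prefix}_API_KEY and {prefix}_API_SECRET")
    return key, secret
```

In the script, `API_KEY, API_SECRET = load_api_credentials()` would then replace the two hardcoded strings.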

Research Applications

  • market microstructure analysis: measure informed trading probability
  • volatility prediction: VPIN spikes tend to precede price volatility
  • liquidity risk assessment: identify toxic order flow periods
  • algorithmic trading: incorporate VPIN signals into strategies
  • market making: adjust spreads based on VPIN levels

Technical Details

Trade Classification

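trades are classified by aggressor side using Binance's isBuyerMaker flag, which the exchange reports directly, so no Lee-Ready-style inference from quotes or price ticks is needed:

  • isBuyerMaker=True → sell (seller-initiated: the buyer's resting order provided liquidity and the seller took it)
  • isBuyerMaker=False → buy (buyer-initiated: the buyer took liquidity)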

Volume Bucket Construction

  • fixed BTC volume per bucket (not USD value)
  • chronological trade ordering maintained
  • partial buckets at data boundaries handled
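The bucket rules above (fixed BTC volume, chronological order, partial buckets) can be illustrated with an exact bucket-filling loop that splits a trade across two buckets when it straddles a boundary. This is a sketch under our own assumptions: the README does not say whether vpin.py splits trades or how it treats the final partial bucket, so here the partial bucket is returned separately for the caller to drop or keep.

```python
from typing import Iterable, List, Tuple

Bucket = Tuple[float, float]  # (buy_volume, sell_volume)


def fill_buckets(trades: Iterable[Tuple[float, bool]],
                 v_bucket: float) -> Tuple[List[Bucket], Bucket]:
    """Fill fixed-volume buckets in trade order, splitting trades at boundaries.

    trades: chronological (quantity, is_buyer_maker) pairs; v_bucket must be > 0.
    Returns the completed buckets and the trailing partial bucket.
    """
    completed: List[Bucket] = []
    buy = sell = filled = 0.0
    for qty, is_buyer_maker in trades:
        remaining = qty
        while remaining > 0:
            room = v_bucket - filled
            take = min(remaining, room)
            if is_buyer_maker:  # seller-initiated
                sell += take
            else:               # buyer-initiated
                buy += take
            filled += take
            remaining -= take
            if take == room:    # bucket full: close it and start a new one
                completed.append((buy, sell))
                buy = sell = filled = 0.0
    return completed, (buy, sell)
```

Closing a bucket when `take == room`, instead of comparing a running float sum against v_bucket, keeps rounding error from leaving a bucket stuck just below full.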

VPIN Calculation

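# assumes a pandas DataFrame `buckets` with one row per bucket (illustrative name)
vpin = (buckets['buy_volume'] - buckets['sell_volume']).abs() / buckets['total_volume']
vpin_smoothed = vpin.rolling(window=N_WINDOW).mean()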
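Written out, with V = V_BUCKET, n = N_WINDOW, and V_i^B, V_i^S the buy and sell volume of bucket i, the rolling mean of per-bucket imbalances reduces to the estimator in the Easley, López de Prado and O'Hara papers whenever every bucket holds exactly V (buckets that overshoot V break the second equality):

```latex
\mathrm{VPIN}_\tau
  = \frac{1}{n}\sum_{i=\tau-n+1}^{\tau}\frac{\lvert V_i^{B}-V_i^{S}\rvert}{V_i^{B}+V_i^{S}}
  = \frac{\sum_{i=\tau-n+1}^{\tau}\lvert V_i^{B}-V_i^{S}\rvert}{nV},
  \qquad V_i^{B}+V_i^{S}=V .
```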

Troubleshooting

Common Issues

  • "No CSV files found": download historical data first - scripts require CSV files
  • "API error 429": rate limited, script waits automatically
  • "Invalid API permissions": recreate API key with Futures enabled
  • Memory errors: reduce date range or use light version
  • Empty dashboard: check logs for data loading errors

Data Quality

  • verify CSV column names match expected format
  • ensure timestamp consistency across files
  • check for data gaps in historical coverage
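The three checks above can be automated with a short report function. A sketch, assuming the trades DataFrame uses the hypothetical column names below and millisecond `transact_time` values; the 5-minute gap threshold is an arbitrary default.

```python
import pandas as pd

# Hypothetical expected columns; adjust to match your CSV layout
EXPECTED_COLUMNS = {"agg_trade_id", "price", "quantity", "transact_time", "is_buyer_maker"}


def check_trades(df: pd.DataFrame, max_gap: str = "5min") -> dict:
    """Basic data-quality report: columns, ordering, duplicates, and time gaps."""
    missing = sorted(EXPECTED_COLUMNS - set(df.columns))
    if missing:
        return {"missing_columns": missing}
    ts = pd.to_datetime(df["transact_time"], unit="ms")
    prev = ts.shift(1)
    mask = (ts - prev) > pd.Timedelta(max_gap)  # first row compares against NaT -> False
    return {
        "missing_columns": [],
        "sorted": bool(ts.is_monotonic_increasing),
        "duplicate_ids": int(df["agg_trade_id"].duplicated().sum()),
        # (gap start, gap length) for every silence longer than max_gap
        "gaps": list(zip(prev[mask], ts[mask] - prev[mask])),
    }
```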

License

MIT License - free for personal and commercial use.

Contributing

contributions welcome:

  • performance optimizations
  • additional exchanges/symbols
  • enhanced visualization features
  • documentation improvements

References

  • Easley, D., López de Prado, M., & O'Hara, M. (2012). The volume clock: Insights into the high-frequency paradigm
  • Easley, D., López de Prado, M., & O'Hara, M. (2011). The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading

developed by @lubluniky
last updated: october 2025