A Python library for converting OHLCV (Open, High, Low, Close, Volume) data to a synthetic Level 1 orderbook representation and vice versa, designed for financial data processing and validation.
This project provides a bidirectional conversion pipeline between OHLCV bars and synthetic orderbook snapshots. The conversion maintains data integrity and allows for round-trip validation, making it useful for:
- Financial data preprocessing
- Trading algorithm testing
- Market data validation
- Quantitative research
- Bidirectional Conversion: Convert OHLCV data to orderbook snapshots and back
- Synthetic Orderbook Generation: Creates realistic Level 1 orderbook data from OHLCV bars
- Parquet I/O Support: Efficient reading and writing of financial data using Apache Parquet
- Data Validation: Built-in validation for data integrity and consistency
- Type Safety: Full type annotations and mypy compatibility
- Configurable Parameters: Customizable spread generation and price path estimation
- Comprehensive Testing: Full test coverage with round-trip validation
# Install the package
pip install ohlcv-to-orderbook
# Install with development dependencies
pip install "ohlcv-to-orderbook[dev]"

Requirements:
- Python 3.8 or higher
- pip or uv package manager
# Install from source
git clone https://github.com/luca/ohlcv-to-orderbook.git
cd ohlcv-to-orderbook
pip install -e .

# Install from source with development dependencies
git clone https://github.com/luca/ohlcv-to-orderbook.git
cd ohlcv-to-orderbook
pip install -e ".[dev]"

from ohlcv_to_orderbook import OrderbookGenerator, OHLCVGenerator
import pandas as pd
# Create sample OHLCV data
ohlcv_data = pd.DataFrame({
'timestamp': [1640995200, 1640995260, 1640995320],
'open': [50000.0, 50100.0, 50050.0],
'high': [50200.0, 50150.0, 50100.0],
'low': [49900.0, 50000.0, 49950.0],
'close': [50100.0, 50050.0, 50080.0],
'volume': [1.5, 2.3, 1.8]
})
# Convert OHLCV to orderbook
orderbook_gen = OrderbookGenerator()
orderbook_data = orderbook_gen.generate_orderbook(ohlcv_data)
# Convert back to OHLCV
ohlcv_gen = OHLCVGenerator()
reconstructed_ohlcv = ohlcv_gen.generate_ohlcv(orderbook_data)
print("Original OHLCV:", ohlcv_data)
print("Reconstructed OHLCV:", reconstructed_ohlcv)

from ohlcv_to_orderbook import OrderbookGenerator
from ohlcv_to_orderbook.io_handlers import ParquetHandler
# Initialize components
generator = OrderbookGenerator()
io_handler = ParquetHandler()
# Read OHLCV data from Parquet
ohlcv_data = io_handler.read_ohlcv("input_data.parquet")
# Convert to orderbook
orderbook_data = generator.generate_orderbook(ohlcv_data)
# Save orderbook data
io_handler.write_orderbook(orderbook_data, "orderbook_output.parquet")

from ohlcv_to_orderbook import generate_test_data
# Generate synthetic data for testing
ohlcv_data, orderbook_data = generate_test_data(
n_bars=100,
symbol="BTCUSD",
start_price=50000.0
)

The package provides a command-line interface for batch processing and automation.
After installing the package, the ohlcv-converter command will be available:
# Install the package
pip install -e .
# The CLI command is now available
ohlcv-converter --help

# Basic conversion
ohlcv-converter ohlcv-to-orderbook input_ohlcv.parquet output_orderbook.parquet
# With custom parameters
ohlcv-converter ohlcv-to-orderbook \
--spread 0.002 \
--points 6 \
--size-factor 0.4 \
--price-decimals 8 \
--volume-decimals 6 \
--validate \
--verbose \
input_ohlcv.parquet output_orderbook.parquet

# Basic conversion
ohlcv-converter orderbook-to-ohlcv input_orderbook.parquet output_ohlcv.parquet
# With validation and verbose output
ohlcv-converter orderbook-to-ohlcv \
--validate \
--verbose \
input_orderbook.parquet output_ohlcv.parquet

OHLCV to Orderbook Options:
- --spread, -s: Spread, expressed as a fraction (default: 0.001, i.e. 0.1%)
- --points, -p: Number of orderbook points per OHLCV bar (default: 4)
- --size-factor, -f: Volume distribution factor (default: 0.3)
- --price-decimals: Decimal places for prices (default: 8)
- --volume-decimals: Decimal places for volumes (default: 8)
- --validate: Validate the conversion result
- --verbose, -v: Enable verbose output
Orderbook to OHLCV Options:
- --validate: Validate the conversion result
- --verbose, -v: Enable verbose output
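To make the --spread and --price-decimals options more concrete, here is a minimal sketch of how a fractional spread could turn a single reference price into a rounded bid/ask pair. It assumes the quote is symmetric around the price, and quote_from_price is a hypothetical helper, not part of the package; the converter's actual spread model (see the algorithm description) may differ.

```python
from typing import Tuple


def quote_from_price(price: float, spread: float = 0.001,
                     price_decimals: int = 8) -> Tuple[float, float]:
    """Return a hypothetical (bid, ask) pair centred on `price`.

    `spread` is a fraction (0.001 == 0.1%), like the CLI's --spread value.
    This sketch assumes a symmetric quote: half the spread on each side.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    if spread < 0:
        raise ValueError("spread must be non-negative")
    half = price * spread / 2
    # Round the same way --price-decimals would cap price precision
    bid = round(price - half, price_decimals)
    ask = round(price + half, price_decimals)
    return bid, ask


bid, ask = quote_from_price(50000.0, spread=0.002, price_decimals=2)
print(bid, ask)  # 49950.0 50050.0
```

With a 0.2% spread on a price of 50,000, the quote is 100 wide, so each side sits 50 away from the reference price.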
# Convert with high precision and validation
ohlcv-converter ohlcv-to-orderbook \
--spread 0.0005 \
--points 8 \
--validate \
--verbose \
btc_ohlcv_1m.parquet btc_orderbook_l1.parquet
# Round-trip conversion test
ohlcv-converter ohlcv-to-orderbook original.parquet temp_orderbook.parquet
ohlcv-converter orderbook-to-ohlcv temp_orderbook.parquet reconstructed.parquet
# Process multiple timeframes
for file in data/ohlcv_*.parquet; do
output="orderbook_$(basename "$file")"
ohlcv-converter ohlcv-to-orderbook --validate "$file" "$output"
done

The library supports several configuration options through OrderbookConfig:
from ohlcv_to_orderbook import OrderbookGenerator
from ohlcv_to_orderbook.config import OrderbookConfig
config = OrderbookConfig(
min_spread_bps=1.0, # Minimum spread in basis points
max_spread_bps=10.0, # Maximum spread in basis points
volume_distribution='uniform', # Volume distribution method
price_precision=2, # Decimal places for prices
volume_precision=8 # Decimal places for volumes
)
generator = OrderbookGenerator(config=config)

The expected OHLCV data format is a pandas DataFrame with the following columns:
- timestamp: Unix timestamp (int)
- open: Opening price (float)
- high: Highest price (float)
- low: Lowest price (float)
- close: Closing price (float)
- volume: Volume traded (float)
The generated orderbook data is a pandas DataFrame with:
- timestamp: Unix timestamp (int)
- bid_price: Best bid price (float)
- bid_size: Best bid volume (float)
- ask_price: Best ask price (float)
- ask_size: Best ask volume (float)
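The column lists above imply some basic consistency rules for each row. The sketch below checks them on plain Python dicts. The helper names validate_ohlcv_row and validate_quote_row are hypothetical and not part of the package API, and the library's built-in validation may check more or different conditions.

```python
from typing import Dict, List


def validate_ohlcv_row(row: Dict[str, float]) -> List[str]:
    """Return the problems found in one OHLCV row (empty list if consistent)."""
    problems = []
    # High must be the largest price in the bar, Low the smallest
    if row["high"] < max(row["open"], row["close"], row["low"]):
        problems.append("high is below open/close/low")
    if row["low"] > min(row["open"], row["close"], row["high"]):
        problems.append("low is above open/close/high")
    if row["volume"] < 0:
        problems.append("volume is negative")
    return problems


def validate_quote_row(row: Dict[str, float]) -> List[str]:
    """Return the problems found in one Level 1 snapshot (empty list if consistent)."""
    problems = []
    # A sane top of book has the best bid strictly below the best ask
    if row["bid_price"] >= row["ask_price"]:
        problems.append("crossed or locked quote: bid_price >= ask_price")
    if row["bid_size"] < 0 or row["ask_size"] < 0:
        problems.append("negative size")
    return problems


bar = {"timestamp": 1640995200, "open": 50000.0, "high": 50200.0,
       "low": 49900.0, "close": 50100.0, "volume": 1.5}
print(validate_ohlcv_row(bar))  # []
```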
The conversion algorithm estimates the price path within each OHLCV bar:
- Path Determination: Decides the price sequence based on the relationship between Open, High, and Low:
  - If Open is closer to High: Open → High → Low → Close
  - If Open is closer to Low: Open → Low → High → Close
- Spread Generation: Creates realistic bid-ask spreads based on:
  - Market volatility (derived from the High-Low range)
  - Volume patterns
  - Configurable spread parameters
- Volume Distribution: Distributes the total volume across the generated orderbook snapshots
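The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation. The path rule follows the description; a tie (Open equidistant from High and Low) is resolved downward-first, which is an assumption. The spread rule (a scaled High-Low range clamped between min_spread_bps and max_spread_bps, names borrowed from the OrderbookConfig example) and the uniform volume split with evenly spaced timestamps are hypothetical choices. The volume-pattern input to the spread is omitted.

```python
from typing import Dict, List


def estimate_price_path(op: float, hi: float, lo: float, cl: float) -> List[float]:
    """Order the OHLC prices into an assumed intra-bar path.

    Open closer to High: O -> H -> L -> C; otherwise O -> L -> H -> C.
    Ties take the second branch (an assumption of this sketch).
    """
    if (hi - op) < (op - lo):
        return [op, hi, lo, cl]
    return [op, lo, hi, cl]


def spread_fraction(op: float, hi: float, lo: float,
                    min_spread_bps: float = 1.0,
                    max_spread_bps: float = 10.0,
                    scale: float = 0.1) -> float:
    """Hypothetical volatility-based spread: `scale` times the bar's relative
    High-Low range, clamped to [min_spread_bps, max_spread_bps]."""
    range_bps = (hi - lo) / op * 10_000  # 1 basis point = 0.0001
    bps = min(max(range_bps * scale, min_spread_bps), max_spread_bps)
    return bps / 10_000


def bar_to_snapshots(bar: Dict[str, float], interval: int = 60) -> List[Dict[str, float]]:
    """Turn one OHLCV bar into one hypothetical L1 snapshot per path point."""
    op, hi, lo, cl = bar["open"], bar["high"], bar["low"], bar["close"]
    path = estimate_price_path(op, hi, lo, cl)
    half = spread_fraction(op, hi, lo) / 2
    # Uniform volume split: equal share per point, half on each side of the book
    size = bar["volume"] / len(path) / 2
    snapshots = []
    for i, mid in enumerate(path):
        snapshots.append({
            # Points are spaced evenly across the bar's interval (seconds)
            "timestamp": int(bar["timestamp"]) + i * interval // len(path),
            "bid_price": mid * (1 - half),
            "bid_size": size,
            "ask_price": mid * (1 + half),
            "ask_size": size,
        })
    return snapshots


bar = {"timestamp": 1640995200, "open": 50000.0, "high": 50200.0,
       "low": 49900.0, "close": 50100.0, "volume": 1.5}
for snap in bar_to_snapshots(bar):
    print(snap)
```

For the sample bar, Open (50000) is closer to Low (49900) than to High (50200), so the assumed path is 50000 → 49900 → 50200 → 50100. The 300-point range is 60 bps of the open, which this sketch scales to a 6 bps spread.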
The reverse conversion aggregates orderbook snapshots:
- Price Aggregation: Calculates OHLCV values from bid/ask prices over time intervals
- Volume Summation: Sums volumes across all snapshots in the time period
- Timestamp Alignment: Groups snapshots by time intervals
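The aggregation steps above can be sketched with the standard library alone. Several details are assumptions, not documented behaviour: each snapshot's price is taken as its mid price, (bid_price + ask_price) / 2; its volume is taken as bid_size + ask_size; and snapshots are aligned to bars by flooring the timestamp to a fixed interval in seconds. snapshots_to_ohlcv is a hypothetical name.

```python
from typing import Dict, List


def snapshots_to_ohlcv(snapshots: List[Dict[str, float]],
                       interval: int = 60) -> List[Dict[str, float]]:
    """Aggregate Level 1 snapshots into OHLCV bars (illustrative sketch)."""
    bars: Dict[int, Dict[str, float]] = {}
    for snap in sorted(snapshots, key=lambda s: s["timestamp"]):
        # Timestamp alignment: floor to the start of the bar's interval
        bucket = int(snap["timestamp"]) // interval * interval
        # Price aggregation uses the mid price (an assumption)
        mid = (snap["bid_price"] + snap["ask_price"]) / 2
        volume = snap["bid_size"] + snap["ask_size"]
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = {"timestamp": bucket, "open": mid, "high": mid,
                            "low": mid, "close": mid, "volume": volume}
        else:
            bar["high"] = max(bar["high"], mid)
            bar["low"] = min(bar["low"], mid)
            bar["close"] = mid  # input is sorted, so the last snapshot closes the bar
            bar["volume"] += volume  # volume summation
    return [bars[k] for k in sorted(bars)]


snaps = [
    {"timestamp": 1640995200, "bid_price": 99.0, "bid_size": 1.0, "ask_price": 101.0, "ask_size": 1.0},
    {"timestamp": 1640995230, "bid_price": 104.0, "bid_size": 0.5, "ask_price": 106.0, "ask_size": 0.5},
    {"timestamp": 1640995250, "bid_price": 97.0, "bid_size": 0.5, "ask_price": 99.0, "ask_size": 0.5},
    {"timestamp": 1640995270, "bid_price": 102.0, "bid_size": 1.0, "ask_price": 104.0, "ask_size": 1.0},
]
for row in snapshots_to_ohlcv(snaps):
    print(row)
# Two bars: 1640995200 (O=100.0, H=105.0, L=98.0, C=98.0, V=4.0)
# and 1640995260 (all prices 103.0, V=2.0)
```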
Run the complete test suite:
# Run all tests
pytest
# Run with coverage
python run_tests_with_coverage.py
# Run specific test files
pytest tests/test_conversions.py
pytest tests/test_pipeline.py

The library includes round-trip tests to ensure data integrity:
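# Example of round-trip validation
from ohlcv_to_orderbook import OrderbookGenerator, OHLCVGenerator, generate_test_data

orderbook_gen = OrderbookGenerator()
ohlcv_gen = OHLCVGenerator()
# generate_test_data returns an (ohlcv, orderbook) tuple; only the OHLCV part is used here
original_ohlcv, _ = generate_test_data(n_bars=50, symbol="BTCUSD", start_price=50000.0)
orderbook_data = orderbook_gen.generate_orderbook(original_ohlcv)
reconstructed_ohlcv = ohlcv_gen.generate_ohlcv(orderbook_data)
# Validate reconstruction accuracy (assert_ohlcv_similarity is a test-suite helper)
assert_ohlcv_similarity(original_ohlcv, reconstructed_ohlcv, tolerance=0.01)

git clone <repository-url>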
cd ohlcv-to-orderbook
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

The project uses several tools for code quality:
# Type checking
mypy ohlcv_to_orderbook/
# Run tests with coverage
pytest --cov=ohlcv_to_orderbook tests/
# Format code (if using black)
black ohlcv_to_orderbook/ tests/

ohlcv-to-orderbook/
├── ohlcv_to_orderbook/ # Main package
│ ├── __init__.py # Package initialization
│ ├── config.py # Configuration classes
│ ├── data_types.py # Type definitions
│ ├── exceptions.py # Custom exceptions
│ ├── io_handlers.py # Parquet I/O operations
│ ├── ohlcv_to_orderbook.py # OHLCV → Orderbook conversion
│ ├── orderbook_to_ohlcv.py # Orderbook → OHLCV conversion
│ └── synthetic_data.py # Test data generation
├── tests/ # Test files
│ ├── test_conversions.py # Conversion tests
│ └── test_pipeline.py # End-to-end pipeline tests
├── pyproject.toml # Project configuration
├── mypy.ini # Type checking configuration
└── README.md # This file
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (pytest)
- Run type checking (mypy ohlcv_to_orderbook/)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with pandas for data manipulation
- Uses PyArrow for efficient Parquet I/O
- Type safety provided by mypy
- Testing framework: pytest
For questions, issues, or contributions, please open an issue on the project repository.