⚠️ Alpha Software: MarketPipe is currently in alpha. APIs may change and features are being actively developed. Use with caution in production environments.
Modern, Python-native ETL framework for financial market data workflows with Domain-Driven Design
MarketPipe transforms how you collect, validate, and store financial market data. Built with modern Python patterns, it provides a robust, scalable foundation for financial data workflows with enterprise-grade observability and monitoring.
See features and development notes below.
```bash
git clone https://github.com/joeyda3rd/marketpipe.git
cd marketpipe
pip install -e .
```
```bash
# Generate test data (no API keys needed)
marketpipe ingest --provider fake --symbols AAPL GOOGL --start 2025-01-01 --end 2025-01-02

# Query the data
marketpipe query --symbol AAPL --start 2025-01-01

# Start monitoring dashboard
marketpipe metrics --port 8000
```
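The same data can also be reached from Python: the feature list below mentions a basic `load_ohlcv()` API. A minimal sketch, assuming it is exposed at the package top level and accepts symbol/date arguments (both assumptions that may differ in alpha):

```python
# Hedged sketch of the basic Python API mentioned later in this README.
# The import path and the signature of load_ohlcv() are assumptions.
from marketpipe import load_ohlcv

df = load_ohlcv(symbol="AAPL", start="2025-01-01", end="2025-01-02")
print(df.head())  # OHLCV bars as a DataFrame (return type assumed)
```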
```bash
# Set up your environment
export ALPACA_KEY="your_api_key"
export ALPACA_SECRET="your_secret"

# Ingest real market data
marketpipe ingest --provider alpaca --symbols AAPL TSLA --start 2025-01-01 --end 2025-01-02

# Validate data quality
marketpipe validate --symbol AAPL --start 2025-01-01

# Aggregate to different timeframes
marketpipe aggregate --symbol AAPL --timeframe 5m --start 2025-01-01
```
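Because bars land in partitioned Parquet files (see the features below), the output can also be inspected with any Parquet-aware tool. A minimal sketch with DuckDB, assuming a hypothetical `data/` output directory whose layout is not documented here:

```python
# Query MarketPipe's Parquet output directly with DuckDB.
# The data/ path and partition layout are assumptions for illustration.
import duckdb

con = duckdb.connect()
df = con.execute(
    "SELECT * FROM read_parquet('data/**/*.parquet') WHERE symbol = 'AAPL'"
).df()  # .df() returns a pandas DataFrame
print(df.head())
```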
| Command | Description | Example |
|---|---|---|
| `ingest` | Collect data from providers | `marketpipe ingest --provider alpaca --symbols AAPL` |
| `query` | Query stored data | `marketpipe query --symbol AAPL --start 2024-01-01` |
| `validate` | Check data quality | `marketpipe validate --symbol AAPL` |
| `aggregate` | Create higher timeframes | `marketpipe aggregate --timeframe 5m` |
| `metrics` | Start monitoring server | `marketpipe metrics --port 8000` |
```bash
# Alpaca Markets
export ALPACA_KEY="your_api_key"
export ALPACA_SECRET="your_secret"
marketpipe ingest --provider alpaca --symbols AAPL --feed iex

# IEX Cloud
export IEX_TOKEN="your_token"
marketpipe ingest --provider iex --symbols AAPL

# Fake provider - no credentials needed, generates realistic test data
marketpipe ingest --provider fake --symbols AAPL GOOGL --start 2025-01-01
```
```bash
git clone https://github.com/joeyda3rd/marketpipe.git
cd marketpipe
scripts/setup  # One-command development setup

# Install pre-commit hooks (recommended)
pip install pre-commit
pre-commit install
```

Day-to-day helper scripts:

```bash
scripts/demo    # Run a quick demo
scripts/format  # Format code
scripts/lint    # Run linters
scripts/test    # Run tests
scripts/check   # Health check
```
```bash
# Fast tests for development feedback (~3s)
scripts/test-fast

# Pre-commit tests (ultra-fast, ~2s)
scripts/pre-commit-tests

# Full test suite with coverage
scripts/test-full

# Simulate CI environment locally
scripts/test-ci

# Legacy make commands (still work)
make test
make test-all
```
MarketPipe enforces Domain-Driven Design boundaries:
```bash
# Check architecture compliance
import-linter --config setup.cfg
```
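For reference, an import-linter contract enforcing such a boundary looks roughly like the following; the project's real contracts live in `setup.cfg`, and the contract name and modules below are illustrative assumptions:

```ini
[importlinter]
root_package = marketpipe

[importlinter:contract:1]
name = Domain layer must not depend on infrastructure
type = forbidden
source_modules = marketpipe.domain
forbidden_modules = marketpipe.infrastructure
```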
MarketPipe uses Alembic for database schema management:
```bash
# Apply latest migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"
```
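Migrations can also be driven from Python through Alembic's command API, which can be handy in setup scripts; the `alembic.ini` path at the repo root is an assumption:

```python
# Run migrations programmatically via Alembic's command API.
# Assumes an alembic.ini at the project root, which this README implies
# but does not show.
from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")
command.upgrade(cfg, "head")  # same effect as `alembic upgrade head`
```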
```bash
# PostgreSQL setup (optional, SQLite by default)
./tools/database/setup_postgres.sh
```
```bash
# Start Prometheus metrics endpoint
marketpipe metrics --port 8000

# Metrics available at http://localhost:8000/metrics
```
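A quick smoke test that the endpoint is serving (the URL comes from the block above; the filtering is purely illustrative):

```python
# Smoke-test the metrics endpoint started above. Requires `requests`.
import requests

body = requests.get("http://localhost:8000/metrics", timeout=5).text
# Print the first few non-comment metric lines
print("\n".join(l for l in body.splitlines() if not l.startswith("#"))[:500])
```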
Pre-built Grafana dashboards for monitoring ingestion performance, data quality, and system health; see `monitoring/README.md` for setup instructions. The dashboards track:
- Request rates and latency by provider
- Data quality scores and validation errors
- Ingestion throughput and backlog monitoring
- System resources and error rates
Complete containerized stack with monitoring:
```bash
# Start MarketPipe + Prometheus + Grafana
docker compose up -d

# Access services:
# MarketPipe metrics: http://localhost:8000/metrics
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```

For production deployment, see `docker/README.md`.
```
.
├── src/marketpipe/     # Source code
│   ├── domain/         # Core business logic
│   ├── ingestion/      # ETL pipeline
│   ├── infrastructure/ # External integrations
│   └── cli/            # Command-line interface
├── tests/              # Comprehensive test suite
├── examples/           # Usage examples and demos
├── scripts/            # Development tools
├── alembic/            # Database migrations
├── tools/              # Database and development utilities
├── monitoring/         # Grafana dashboards
├── docker/             # Containerized deployment
└── config/             # Configuration templates
```
| Variable | Description | Required |
|---|---|---|
| `ALPACA_KEY` | Alpaca API key | For Alpaca provider |
| `ALPACA_SECRET` | Alpaca API secret | For Alpaca provider |
| `IEX_TOKEN` | IEX Cloud token | For IEX provider |
| `DATABASE_URL` | Database connection | Optional (SQLite default) |
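A small pre-flight check like the following can fail fast when credentials are missing; the variable names come from the table above, everything else is illustrative:

```python
# Fail fast if Alpaca credentials are missing. Purely an illustrative
# pre-flight check, not part of MarketPipe itself.
import os

missing = [v for v in ("ALPACA_KEY", "ALPACA_SECRET") if not os.environ.get(v)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```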
MarketPipe supports YAML configuration files for complex setups:
```yaml
providers:
  alpaca:
    feed: "iex"  # or "sip" for premium data
    rate_limit: 200

database:
  url: "postgresql://user:pass@host/db"

monitoring:
  enabled: true
  port: 8000
```
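To inspect such a file outside MarketPipe, plain PyYAML is enough; the file path below is an assumption, and MarketPipe's own loader and schema validation are not shown:

```python
# Read the YAML configuration shown above with PyYAML.
# The path config/marketpipe.yaml is hypothetical.
import yaml

with open("config/marketpipe.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["providers"]["alpaca"]["rate_limit"])  # -> 200
```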
Architecture:
- Domain-Driven Design with enforced boundaries and clean separation of concerns
- Plugin-based providers for easy integration with any market data source
- Async/sync dual APIs for maximum flexibility
- Type-safe configuration with comprehensive validation
Data providers:
- Alpaca Markets - Real-time and historical market data
- IEX Cloud - Professional-grade financial data
- Fake Provider - Generate realistic test data for development
- Extensible Plugin System - Add your own providers easily (see the sketch after this list)
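To make the plugin idea concrete, here is a deliberately hypothetical sketch of a custom provider; the class shape, method name, and bar type below are illustrative assumptions, not MarketPipe's actual plugin interface:

```python
# Hypothetical provider sketch for illustration only; MarketPipe's real
# plugin interface is not documented in this README and will differ.
from dataclasses import dataclass

@dataclass
class Bar:
    symbol: str
    timestamp: str  # ISO-8601 minute timestamp
    open: float
    high: float
    low: float
    close: float
    volume: int

class StaticProvider:
    """Returns a canned bar; a real provider would page through an API."""
    name = "static"

    def fetch_bars(self, symbol: str, start: str, end: str) -> list[Bar]:
        return [Bar(symbol, f"{start}T09:30:00", 100.0, 101.0, 99.5, 100.5, 1000)]
```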
ETL pipeline:
- Parallel ingestion across multiple symbols and timeframes
- Schema validation with business rule enforcement
- Incremental loading with checkpoint/resume capability
- Data quality monitoring with comprehensive error reporting
- Partitioned storage in Parquet format for optimal performance
Observability:
- Prometheus metrics with multiprocess support (see the instrumentation sketch after this list)
- Grafana dashboards for real-time monitoring
- Structured logging with configurable levels
- Performance tracking and error alerting
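For a flavor of that instrumentation, here is a minimal sketch using the standard `prometheus_client` library; whether MarketPipe uses it internally, and the metric name below, are assumptions:

```python
# Minimal Prometheus instrumentation sketch with prometheus_client.
# The metric name and label are hypothetical examples.
from prometheus_client import Counter, start_http_server

BARS_INGESTED = Counter(
    "marketpipe_bars_ingested_total", "Bars ingested", ["provider"]
)

start_http_server(8000)  # exposes /metrics, matching the README's endpoint
BARS_INGESTED.labels(provider="fake").inc(390)
```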
Deployment:
- Docker Compose stack with monitoring included
- Database migrations with Alembic
- Health checks and readiness probes
- Production-ready configuration
MarketPipe's development focuses on deepening its ETL capabilities for financial market data within the existing Domain-Driven Design architecture, with enterprise-grade reliability as a core requirement. Planned work includes:
Symbol universe management:
- CLI commands for managing symbol universes (domain models exist)
- Import/export universe definitions
- Universe validation and health checks
- Programmatic universe management API
Python API:
- Expanded Python API beyond the basic `load_ohlcv()`
- Better integration with Jupyter notebooks and research environments
- Direct access to ingestion and validation services
- Rich data access patterns for analysis workflows
Data access and export:
- Advanced filtering and aggregation capabilities
- Multiple export formats (CSV, JSON, Arrow, etc.)
- Custom query interface with complex predicates
- Streaming data access for large datasets
New data providers:
- Immediate targets (8 providers): FRED (Economic Data), Binance (Crypto), Finnhub, Polygon, Tiingo, Twelve Data, plus enhanced IEX and Alpaca integrations
- Extended ecosystem: Alpha Vantage, CME DataMine, Coinbase, Quandl, Refinitiv, Yahoo Finance, Kraken, Tradier, and more
- Community contributions: see our Contributing Guide for adding new data provider integrations
Schema evolution:
- Handle schema evolution from data providers
- Version management for data structures
- Backward compatibility tools
- Automated migration assistance
Incremental processing:
- Enhanced change detection and delta processing
- Efficient backfill strategies
- Checkpoint recovery improvements
- Conflict resolution for overlapping data
Data transformation:
- Data normalization and standardization pipelines
- Custom transformation rules and business logic
- Cross-provider data harmonization
- Real-time data transformation capabilities
Pipeline observability:
- Data lineage tracking through the pipeline
- Enhanced pipeline health monitoring
- Data quality trend analysis
- Performance optimization insights
Storage and loading:
- Advanced partitioning strategies
- Compression and storage format options
- Multi-destination loading (multiple databases, cloud storage)
- Parallel loading with automatic scaling
Data catalog and governance:
- Metadata management for all ETL assets
- Data discovery and documentation
- Usage analytics and lineage visualization
- Data governance and quality tracking
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Run tests: `make test`
- Check architecture: `import-linter --config setup.cfg`
- Submit a pull request
Working Features:
- Multi-provider data ingestion (Alpaca, IEX, Fake)
- Parquet storage with partitioning
- DuckDB query engine
- CLI interface with all core commands
- Docker deployment stack
- Prometheus monitoring
- Data validation and quality checks
- Database migrations
Known Limitations:
- API may change during alpha phase
- Limited error recovery in edge cases
- Documentation is still evolving
- Performance optimizations ongoing
Production Readiness:
- Suitable for development and testing
- Use caution in production environments
- Monitor resource usage and error rates
- Backup data regularly during alpha phase
Licensed under the Apache License, Version 2.0. See LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: See SECURITY.md for security policy