quantlab/
├── README.md                          # Main project documentation
├── .gitignore                         # Git ignore configuration
├── PROJECT_STRUCTURE.md               # This file
│
├── system/                            # System-level configuration
│   └── system_profile.yaml            # Qlib system settings
│
├── configs/                           # Workflow configurations ⭐
│   ├── lightgbm_external_data.yaml    # Full universe (14,310 stocks)
│   ├── lightgbm_fixed_dates.yaml      # 2024 only (date filtered)
│   └── lightgbm_liquid_universe.yaml  # Filtered universe (13,187 stocks)
│
├── data/                              # Local data storage
│   ├── parquet/                       # Raw parquet data files
│   └── metadata/                      # Metadata and cache
│
├── docs/                              # Documentation 📚
│   ├── BACKTEST_SUMMARY.md            # Comprehensive backtest analysis
│   ├── ALPHA158_SUMMARY.md            # Alpha158 features overview
│   ├── ALPHA158_CORRECTED.md          # Alpha158 corrections guide
│   ├── USE_QLIB_ALPHA158.md           # How to use Alpha158
│   └── QUANTMINI_README.md            # QuantMini data setup
│
├── notebooks/                         # Jupyter notebooks 📓
│   └── workflow_by_code.ipynb         # Qlib workflow examples
│
├── results/                           # Experiment outputs 📊
│   ├── mlruns/                        # MLflow experiment tracking
│   │   └── 489214785307856385/        # Experiment ID
│   │       ├── 2b374fe2956c4161a1bd2dcef7299bd2/  # Liquid universe run
│   │       ├── 44c320f998cf4c97a8be68ed15857f66/  # Fixed dates run
│   │       └── dc38cbc355104d67a1917a3f358ceb1f/  # Original run
│   └── visualizations/                # Charts and plots
│       └── backtest_visualization.png # Latest backtest chart
│
├── scripts/                           # Utility scripts 🔧
│   ├── analysis/                      # Analysis tools
│   │   └── visualize_results.py       # Backtest visualization script
│   ├── data/                          # Data processing
│   │   ├── convert_to_qlib.py         # Convert data to qlib format
│   │   ├── quantmini_setup.py         # Setup QuantMini data
│   │   └── refresh_today_data.py      # Update latest data
│   └── tests/                         # Test scripts
│       ├── test_qlib_alpha158.py      # Test Alpha158 features
│       ├── test_stocks_minute_fix.py  # Test data fixes
│       └── enable_alpha158.py         # Enable Alpha158 handler
│
├── qlib_repo/                         # Microsoft Qlib source (828MB)
│   └── (Full qlib repository clone)
│
└── .venv/                             # Virtual environment (uv)
    └── (Python packages)
configs/ - Workflow configuration files for different backtesting scenarios.

Usage:

cd qlib_repo/examples
uv run qrun ../../configs/lightgbm_liquid_universe.yaml

results/ - All experiment outputs, including MLflow tracking and visualizations.
MLflow runs:
- Each run has a unique ID
- Contains: model, predictions, portfolio analysis, metrics
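Because each run lives in its own directory under `results/mlruns/<experiment ID>/`, the runs can be enumerated without opening the MLflow UI. The following is a minimal sketch, not part of the project: `list_runs` is a hypothetical helper, and it assumes run directories are named with 32 hexadecimal characters, as in the tree above.

```python
import re
from pathlib import Path

# Assumption: run IDs are 32 lowercase hex characters, as in the tree above.
RUN_ID_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def list_runs(mlruns_dir, experiment_id):
    """Return the run directories of one experiment, newest first."""
    experiment_dir = Path(mlruns_dir) / experiment_id
    if not experiment_dir.is_dir():
        return []
    runs = [
        path
        for path in experiment_dir.iterdir()
        if path.is_dir() and RUN_ID_PATTERN.match(path.name)
    ]
    # Sort by modification time so the most recently written run comes first.
    return sorted(runs, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Paths are relative to the project root.
    for run_dir in list_runs("results/mlruns", "489214785307856385"):
        print(run_dir.name)
```

Run from the project root, this prints one run ID per line. Modification time is only a rough proxy for run order.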
Visualizations:
- Generated by scripts/analysis/visualize_results.py
- Charts showing IC, returns, drawdown
scripts/ - Utility scripts, organized by function.
- data/ - Data processing and setup
- analysis/ - Result visualization and analysis
- tests/ - Testing and validation scripts
docs/ - Comprehensive documentation of findings and guides.

Key documents:
- BACKTEST_SUMMARY.md - Most important: full analysis
- ALPHA158_SUMMARY.md - Feature documentation
- USE_QLIB_ALPHA158.md - Implementation guide
/Volumes/sandisk/quantmini-data/data/qlib/stocks_daily/
├── calendars/
│   └── day.txt              # 442 trading days (2024-2025)
├── instruments/
│   ├── all.txt              # 14,310 total instruments
│   └── liquid_stocks.txt    # 13,187 filtered instruments
└── features/                # OHLCV price data
    └── [stock symbols]/
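The layout above can be sanity-checked before a backtest. The sketch below is illustrative only: `summarize_qlib_dir` is a hypothetical helper, and it assumes `day.txt` holds one trading day per line and each instruments file holds one instrument per line.

```python
from pathlib import Path


def count_lines(path):
    """Count non-empty lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def summarize_qlib_dir(provider_uri):
    """Check the expected layout and return basic counts for a Qlib data directory."""
    root = Path(provider_uri)
    calendar_file = root / "calendars" / "day.txt"
    instruments_dir = root / "instruments"
    features_dir = root / "features"

    required = (calendar_file, instruments_dir, features_dir)
    missing = [str(p.relative_to(root)) for p in required if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing under {root}: {', '.join(missing)}")

    return {
        # Assumption: one trading day per line.
        "trading_days": count_lines(calendar_file),
        # Assumption: one instrument per line in each instruments file.
        "instruments": {
            f.stem: count_lines(f) for f in sorted(instruments_dir.glob("*.txt"))
        },
        # One subdirectory per stock symbol.
        "feature_dirs": sum(1 for p in features_dir.iterdir() if p.is_dir()),
    }
```

Calling `summarize_qlib_dir("/Volumes/sandisk/quantmini-data/data/qlib/stocks_daily")` returns a dictionary whose counts can be compared with the figures noted in the tree above.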
| Component | Size | Notes |
|---|---|---|
| qlib_repo/ | 828MB | Full Microsoft Qlib clone |
| .venv/ | ~500MB | Python virtual environment |
| results/mlruns/ | ~150MB | All experiment runs |
| External data | ~2GB | On /Volumes/sandisk |
Ignored via .gitignore:
- qlib_repo/ - Large clone, can be re-downloaded
- .venv/ - Virtual environment
- data/parquet/ - Raw data files
- __pycache__/, *.pyc - Python cache
- *.log - Log files
cd qlib_repo/examples
uv run qrun ../../configs/lightgbm_liquid_universe.yaml

# Update experiment ID in scripts/analysis/visualize_results.py
cd /path/to/quantlab
uv run python scripts/analysis/visualize_results.py

python scripts/tests/test_qlib_alpha158.py

python scripts/data/refresh_today_data.py

qlib_init:
  provider_uri: "/Volumes/sandisk/quantmini-data/data/qlib/stocks_daily"
  region: cn
market: &market liquid_stocks  # Uses filtered universe
benchmark: &benchmark SPY
task:
  model:
    class: LGBModel
    # ... LightGBM parameters
  dataset:
    class: DatasetH
    # ... dataset configuration

- All scripts should be run from project root
- Paths are relative to project root
- External data must be mounted at /Volumes/sandisk
- MLflow tracks all experiments automatically
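The notes above on the project root and the mounted data can be turned into a small preflight check. This is a rough sketch under stated assumptions, not existing project code: `find_project_root`, `external_data_mounted` and the choice of marker entries are all hypothetical.

```python
from pathlib import Path

# Assumption: these entries together identify the quantlab project root.
ROOT_MARKERS = ("configs", "scripts", "PROJECT_STRUCTURE.md")


def find_project_root(start=None):
    """Walk upward from `start` (default: current directory) to the project root."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if all((candidate / marker).exists() for marker in ROOT_MARKERS):
            return candidate
    raise FileNotFoundError(f"No project root found above {current}")


def external_data_mounted(mount_point="/Volumes/sandisk"):
    """Return True if the external data volume is present as a directory."""
    return Path(mount_point).is_dir()
```

A script could call `find_project_root()` once at startup and build its relative paths from the result, and refuse to run when `external_data_mounted()` returns `False`.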
✅ Reorganized from flat structure to organized directories
✅ Moved all docs to docs/
✅ Separated scripts by function
✅ Centralized configs in configs/
✅ Moved MLflow results to results/
✅ Added comprehensive README.md
✅ Created .gitignore for large files