# QuantTradeAI - LLM Agent Guide

## Project Overview

QuantTradeAI is a machine learning framework for quantitative trading strategies. The codebase implements momentum trading with an ensemble of classifiers (Logistic Regression, Random Forest, XGBoost), plus feature engineering and backtesting.

## Core Architecture

### Key Components
- **Data Layer**: `src/data/` - Data fetching, caching, validation
- **Feature Engineering**: `src/features/` - Technical indicators, custom features
- **ML Models**: `src/models/` - Ensemble classifiers, hyperparameter optimization
- **Backtesting**: `src/backtest/` - Trade simulation, performance metrics
- **Risk Management**: `src/trading/` - Stop-loss, position sizing
- **Utilities**: `src/utils/` - Metrics, visualization, configuration

### Data Flow
1. Fetch OHLCV data (yfinance/Alpha Vantage)
2. Generate technical indicators (SMA, EMA, RSI, MACD, etc.)
3. Create custom features (momentum score, volatility breakout)
4. Generate trading labels from forward returns
5. Train ensemble models with hyperparameter optimization
6. Backtest with risk management
7. Evaluate performance metrics
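
Step 4 turns prices into supervised targets. A minimal sketch of one common labeling scheme, assuming a fixed look-ahead `horizon` and a symmetric `threshold` (the function name and both parameters are hypothetical; `DataProcessor.generate_labels` may define labels differently):

```python
import pandas as pd


def forward_return_labels(
    close: pd.Series, horizon: int = 5, threshold: float = 0.01
) -> pd.Series:
    """Label each bar by its forward return over `horizon` bars (hypothetical scheme).

    1.0 = rise above `threshold`, -1.0 = fall below -`threshold`, 0.0 = otherwise.
    The last `horizon` bars have no future price and are labeled NaN.
    """
    # Return from this bar's close to the close `horizon` bars ahead
    fwd_return = close.shift(-horizon) / close - 1.0
    labels = pd.Series(0.0, index=close.index)
    labels[fwd_return > threshold] = 1.0
    labels[fwd_return < -threshold] = -1.0
    # Bars without a future price cannot be labeled
    labels[fwd_return.isna()] = float("nan")
    return labels
```

Because each label looks `horizon` bars into the future, it is only valid as a training target; using it (or anything derived from it) as a feature would leak future information.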

## Development Guidelines

### Code Quality Standards
- **Testing**: All new code MUST have unit tests
- **Formatting**: Use Black for code formatting
- **Linting**: Use flake8 for code quality
- **Type Hints**: Include type annotations for all functions

### Pre-commit Requirements
```bash
# Run all quality checks
make format  # Black formatting
make lint    # flake8 linting
make test    # pytest testing
```

### Dependency Management
- **CRITICAL**: Use the Poetry CLI for ALL dependency changes
- **NEVER** manually edit `pyproject.toml` dependencies
- **ALWAYS** use `poetry add package-name` or `poetry add --group dev package-name`
- **REMOVE** dependencies with `poetry remove package-name`

### Testing Requirements
- Unit tests for all new functions/classes
- Integration tests for data pipelines
- Performance tests for critical paths
- Test coverage > 80%

## Key Technologies

### Core Dependencies
- **Python 3.11+** - Main language
- **Poetry** - Dependency management
- **pandas/numpy** - Data manipulation
- **scikit-learn** - ML algorithms
- **XGBoost** - Gradient boosting
- **Optuna** - Hyperparameter optimization
- **yfinance** - Market data
- **pandas-ta** - Technical indicators

### Configuration
- **YAML** - Configuration files
- **Pydantic** - Configuration validation
- **joblib** - Model persistence

## API Structure

### Data Loading
```python
from src.data.loader import DataLoader
from src.data.processor import DataProcessor

# Initialize components
loader = DataLoader("config/model_config.yaml")
processor = DataProcessor("config/features_config.yaml")

# Fetch data: a dict mapping each symbol to its OHLCV DataFrame
data_dict = loader.fetch_data()

# Process one symbol (any symbol configured in model_config.yaml)
df = data_dict['AAPL']
df_processed = processor.process_data(df)
df_labeled = processor.generate_labels(df_processed)
```
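
The data layer also handles caching. One simple way to avoid repeated downloads is a per-symbol disk cache around any fetch function. This is a sketch only: `cached_fetch`, the `data/cache` directory, and the pickle format are assumptions, not the project's `DataLoader` internals.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def cached_fetch(
    symbol: str,
    fetch_fn: Callable[[str], pd.DataFrame],
    cache_dir: str = "data/cache",
) -> pd.DataFrame:
    """Return data for `symbol`, calling `fetch_fn` only on a cache miss (hypothetical helper)."""
    path = Path(cache_dir) / f"{symbol}.pkl"
    if path.exists():
        # Cache hit: skip the network call entirely
        return pd.read_pickle(path)
    df = fetch_fn(symbol)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(path)
    return df
```

This cache never expires; a real implementation would typically include the date range in the cache key or refresh stale files.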

### Model Training
```python
from src.models.classifier import MomentumClassifier

# Initialize and train
classifier = MomentumClassifier("config/model_config.yaml")
X, y = classifier.prepare_data(df_labeled)
classifier.train(X, y)
```
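
Market data is ordered in time, so shuffling rows before a train/test split lets the model train on data from after the test period. A minimal sketch of a chronological split, assuming `X` and `y` are already sorted by date (the helper name and the `test_size` default are hypothetical; the project's training settings may handle this differently):

```python
import pandas as pd


def chronological_split(
    X: pd.DataFrame, y: pd.Series, test_size: float = 0.2
) -> tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]:
    """Split time-ordered data so every test row comes after every train row."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    split = int(len(X) * (1.0 - test_size))
    # No shuffling: training rows all precede the test rows
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]
```

For cross-validation, scikit-learn's `TimeSeriesSplit` applies the same ordering rule across several folds.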

### Backtesting
```python
from src.backtest.backtester import simulate_trades, compute_metrics

# Simulate trades
df_trades = simulate_trades(df_labeled)
metrics = compute_metrics(df_trades)
```
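
The internals of `simulate_trades` and `compute_metrics` are not documented here. To illustrate the core idea only, here is a vectorized long/flat backtest that applies the previous bar's signal to the current bar's return; the `Close`/`signal` column names, the 0/1 signal encoding, and the two metrics are assumptions:

```python
import pandas as pd


def simple_backtest(df: pd.DataFrame) -> dict[str, float]:
    """Long/flat backtest sketch: `signal` is 1 (long) or 0 (flat) on each bar."""
    asset_returns = df["Close"].pct_change().fillna(0.0)
    # Trade on the previous bar's signal to avoid look-ahead bias
    position = df["signal"].shift(1).fillna(0.0)
    strategy_returns = position * asset_returns
    equity = (1.0 + strategy_returns).cumprod()
    # Drawdown: distance of the equity curve below its running peak
    drawdown = equity / equity.cummax() - 1.0
    return {
        "total_return": float(equity.iloc[-1] - 1.0),
        "max_drawdown": float(drawdown.min()),
    }
```

Shifting the signal by one bar is the key design choice: trading on the same bar that produced the signal would assume knowledge of that bar's close. Transaction costs and slippage are ignored here.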

## Configuration Files

### Model Configuration (`config/model_config.yaml`)
- Data parameters (symbols, date ranges, caching)
- Model hyperparameters (LR, RF, XGBoost)
- Training settings (test size, CV folds)
- Trading parameters (position sizing, risk)

### Feature Configuration (`config/features_config.yaml`)
- Technical indicator parameters
- Feature preprocessing settings
- Feature selection methods
- Pipeline steps

## Error Handling Patterns

### Data Validation
```python
# Validate data quality
is_valid = loader.validate_data(data_dict)
if not is_valid:
    raise ValueError("Data validation failed")
```
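
The checks performed by `validate_data` are not spelled out above. A sketch of typical per-symbol OHLCV sanity checks, where the required columns and rules are assumptions rather than the project's actual criteria:

```python
import pandas as pd

REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]


def validate_ohlcv(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in an OHLCV frame (empty list means valid)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # The remaining checks need these columns, so stop early
        return [f"missing columns: {missing}"]
    problems = []
    if df.empty:
        problems.append("no rows")
    if df[REQUIRED_COLUMNS].isna().any().any():
        problems.append("NaN values present")
    if not df.index.is_monotonic_increasing:
        problems.append("index not sorted in time order")
    if (df["High"] < df["Low"]).any():
        problems.append("High below Low")
    if (df["Volume"] < 0).any():
        problems.append("negative volume")
    return problems
```

Returning the list of problems rather than a bare boolean makes a validation failure easier to act on, for example by including the list in the exception message.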

### Model Training
```python
import logging

logger = logging.getLogger(__name__)

try:
    classifier.train(X, y)
except ValueError as e:
    logger.error("Training error: %s", e)
    # Check data shapes and class distribution, then re-raise
    raise
```
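
Data shapes and class distribution can be checked before training starts, so failures surface with a specific message. A hedged sketch; the helper name and the 5% minimum class share are arbitrary assumptions:

```python
import pandas as pd


def check_training_inputs(
    X: pd.DataFrame, y: pd.Series, min_class_share: float = 0.05
) -> None:
    """Raise ValueError with a specific message if X/y look unfit for training."""
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)}")
    if X.isna().any().any():
        raise ValueError("X contains NaN values (e.g. indicator warm-up rows)")
    shares = y.value_counts(normalize=True)
    if len(shares) < 2:
        raise ValueError(f"y has a single class: {shares.index.tolist()}")
    # Classes that are too rare tend to produce degenerate models
    rare = shares[shares < min_class_share]
    if not rare.empty:
        raise ValueError(f"classes below {min_class_share:.0%} of rows: {rare.to_dict()}")
```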

### Configuration Validation
```python
import yaml
from src.utils.config_schemas import ModelConfigSchema

with open("config/model_config.yaml") as f:
    config = yaml.safe_load(f)
ModelConfigSchema(**config)  # Validates configuration (raises if invalid)
```
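
`ModelConfigSchema` is presumably a Pydantic model. To show the same fail-fast idea without that dependency, here is a stand-in using standard-library dataclasses; the class name, fields, and allowed ranges are hypothetical and do not describe the real schema:

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSettings:
    """Hypothetical subset of training settings, validated on construction."""

    test_size: float = 0.2
    cv_folds: int = 5
    symbols: list[str] = field(default_factory=lambda: ["AAPL"])

    def __post_init__(self) -> None:
        # Fail at load time instead of deep inside training
        if not 0.0 < self.test_size < 1.0:
            raise ValueError(f"test_size must be in (0, 1), got {self.test_size}")
        if self.cv_folds < 2:
            raise ValueError(f"cv_folds must be >= 2, got {self.cv_folds}")
        if not self.symbols:
            raise ValueError("symbols must not be empty")
```

Unlike Pydantic, a dataclass performs no type coercion, so a value such as `"0.2"` read as a string would fail the range check with a `TypeError`; treat this as an illustration of the pattern, not a replacement for the schema.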

## Performance Considerations

### Memory Management
- Use smaller data types for large datasets (e.g., `float32` instead of `float64` where precision allows)
- Process data in batches to bound peak memory use
- Cache intermediate results (fetched data, computed features) instead of recomputing them
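
As an example of the first point, pandas can downcast numeric columns with `pd.to_numeric(..., downcast=...)`. A sketch (the helper name is hypothetical); whether `float32` precision is acceptable for prices and derived features is a judgment call:

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with float and integer columns downcast to smaller dtypes."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes(include=[np.integer]).columns:
        # Picks the smallest signed integer type that holds every value
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out
```

pandas only keeps the `float32` result when the values survive the conversion within a small tolerance, so columns that would lose precision stay `float64`.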

### Computational Optimization
- Prefer vectorized pandas/NumPy operations over Python loops
- Process multiple assets in parallel
- GPU acceleration for model training (planned)
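
To make the first point concrete, a loop and a vectorized expression can compute the same momentum feature; the vectorized form runs inside compiled pandas/NumPy code and is typically much faster on long series. Both functions and the lookback are illustrative only:

```python
import pandas as pd


def momentum_loop(close: pd.Series, lookback: int = 10) -> list[float]:
    """Slow reference: percent change over `lookback` bars, one element at a time."""
    values = close.tolist()
    out = [float("nan")] * len(values)
    for i in range(lookback, len(values)):
        out[i] = values[i] / values[i - lookback] - 1.0
    return out


def momentum_vectorized(close: pd.Series, lookback: int = 10) -> pd.Series:
    """Same calculation expressed as whole-column operations."""
    return close / close.shift(lookback) - 1.0
```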

## Testing Patterns

### Unit Tests
```python
import pandas as pd
from src.data.loader import DataLoader


def test_data_loader():
    # fetch_data() may call external APIs unless cached data is available
    loader = DataLoader("config/model_config.yaml")
    data = loader.fetch_data()
    assert len(data) > 0
    assert all(isinstance(df, pd.DataFrame) for df in data.values())
```
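
Because `fetch_data()` depends on external data, tests of downstream logic are faster and deterministic with synthetic prices. A sketch of that pattern; `make_synthetic_ohlcv` and the inline `sma` stand-in are hypothetical and not part of `src/`:

```python
import numpy as np
import pandas as pd


def make_synthetic_ohlcv(n: int = 50, seed: int = 0) -> pd.DataFrame:
    """Deterministic random-walk OHLCV frame for offline tests."""
    rng = np.random.default_rng(seed)
    close = 100.0 + rng.normal(0.0, 1.0, n).cumsum()
    return pd.DataFrame(
        {
            "Open": close,
            "High": close + 1.0,
            "Low": close - 1.0,
            "Close": close,
            "Volume": rng.integers(1_000, 10_000, n),
        },
        index=pd.date_range("2024-01-01", periods=n, freq="D"),
    )


def sma(series: pd.Series, window: int) -> pd.Series:
    """Stand-in for the indicator under test."""
    return series.rolling(window).mean()


def test_sma_on_synthetic_data():
    df = make_synthetic_ohlcv()
    result = sma(df["Close"], 20)
    # The first window - 1 values have too little history
    assert result.iloc[:19].isna().all()
    assert abs(result.iloc[-1] - df["Close"].iloc[-20:].mean()) < 1e-9
```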

### Integration Tests
```python
from src.data.loader import DataLoader
from src.data.processor import DataProcessor
from src.models.classifier import MomentumClassifier


def test_complete_pipeline():
    # Test the end-to-end workflow
    loader = DataLoader("config/model_config.yaml")
    processor = DataProcessor("config/features_config.yaml")
    classifier = MomentumClassifier("config/model_config.yaml")

    data = loader.fetch_data()
    df = processor.process_data(data['AAPL'])
    df_labeled = processor.generate_labels(df)

    X, y = classifier.prepare_data(df_labeled)
    classifier.train(X, y)

    predictions = classifier.predict(X)
    assert len(predictions) == len(y)
```

## Documentation Standards

### Code Documentation
- Docstrings for all public functions
- Type hints for all parameters
- Usage examples in docstrings
- Clear parameter descriptions

### API Documentation
- Update `docs/api/` files for new functions
- Include parameter types and return values
- Provide usage examples
- Document error conditions

## Common Patterns

### Feature Engineering
```python
from src.features.technical import sma, rsi, macd

# Generate technical indicators (df is an OHLCV DataFrame with a 'Close' column)
df['sma_20'] = sma(df['Close'], 20)
df['rsi'] = rsi(df['Close'], 14)
macd_df = macd(df['Close'])
```
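
For reference, RSI compares average gains with average losses over a lookback window: RSI = 100 - 100 / (1 + avg_gain / avg_loss). The sketch below uses an exponential average with alpha = 1/period, a common approximation of Wilder's smoothing (the classic version seeds with a simple average of the first `period` changes). It is not necessarily how `src.features.technical.rsi` is implemented:

```python
import pandas as pd


def rsi_wilder(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style exponential smoothing."""
    delta = close.diff()
    gains = delta.clip(lower=0.0)
    losses = (-delta).clip(lower=0.0)
    # Wilder's smoothing approximated as an EMA with alpha = 1 / period
    avg_gain = gains.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = losses.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    # avg_loss == 0 gives rs = inf, which maps to RSI = 100
    return 100.0 - 100.0 / (1.0 + rs)
```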

### Risk Management
```python
from src.trading.risk import apply_stop_loss_take_profit

# Apply risk rules
df_with_risk = apply_stop_loss_take_profit(df, stop_loss_pct=0.02)
```
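
The exact rules inside `apply_stop_loss_take_profit` are not documented here. The underlying idea for a single long position can be sketched as follows; `first_exit`, the `take_profit_pct` parameter, and checking only one price per bar are simplifying assumptions:

```python
from typing import Optional, Sequence


def first_exit(
    entry_price: float,
    prices: Sequence[float],
    stop_loss_pct: float = 0.02,
    take_profit_pct: Optional[float] = None,
) -> tuple[Optional[int], str]:
    """Return (bar index, reason) of the first exit for a long position, or (None, 'open')."""
    stop_level = entry_price * (1.0 - stop_loss_pct)
    target_level = (
        entry_price * (1.0 + take_profit_pct) if take_profit_pct is not None else None
    )
    for i, price in enumerate(prices):
        # The stop is checked first, a conservative choice
        if price <= stop_level:
            return i, "stop_loss"
        if target_level is not None and price >= target_level:
            return i, "take_profit"
    return None, "open"
```

With daily OHLC data, a stricter version would test each bar's Low against the stop and High against the target; if both are hit in the same bar, daily data alone cannot say which happened first.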

### Performance Metrics
```python
from src.utils.metrics import classification_metrics, sharpe_ratio

# Calculate metrics (y_true/y_pred: labels and predictions; returns: strategy returns)
metrics = classification_metrics(y_true, y_pred)
sharpe = sharpe_ratio(returns, risk_free_rate=0.02)
```
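
Sharpe ratio conventions vary, so it helps to be explicit. A sketch assuming `returns` are per-period simple returns, `risk_free_rate` is an annual rate converted by simple division, and there are 252 trading periods per year; these are assumptions, not a description of `sharpe_ratio`:

```python
import math
import statistics
from typing import Sequence


def annualized_sharpe(
    returns: Sequence[float],
    risk_free_rate: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe ratio of periodic returns against an annual risk-free rate."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Convert the annual risk-free rate to a per-period rate
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0.0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)
```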

## Troubleshooting

### Common Issues
1. **Data Loading Failures**: Check network connectivity and API rate limits
2. **Memory Issues**: Reduce data size, use batching
3. **Model Training Errors**: Check data quality and class balance
4. **Configuration Errors**: Validate YAML syntax and required fields

### Debugging Steps
1. Check logs for error messages
2. Validate input data quality
3. Test individual components
4. Verify configuration parameters

## Future Development Areas

### High Priority
- Real-time data streaming
- Advanced risk management
- Multi-timeframe support
- LLM integration for sentiment analysis

### Medium Priority
- GPU acceleration
- Microservices architecture
- Advanced NLP features
- Reinforcement learning

### Low Priority
- Quantum computing integration
- Blockchain connectivity
- Multi-modal AI
- Federated learning

## Resources

### Documentation
- [API Reference](docs/api/)
- [Configuration Guide](docs/configuration.md)
- [Quick Reference](docs/quick-reference.md)

### External Libraries
- [scikit-learn](https://scikit-learn.org/)
- [XGBoost](https://xgboost.readthedocs.io/)
- [Optuna](https://optuna.org/)
- [pandas-ta](https://twopirllc.github.io/pandas-ta/)

### Testing & Quality
- [pytest](https://docs.pytest.org/)
- [Black](https://black.readthedocs.io/)
- [flake8](https://flake8.pycqa.org/)

## Commit Guidelines

### Before Committing
1. Add/update tests for new functionality
2. Update documentation if needed
3. Run `make format` - Format code with Black
4. Run `make lint` - Check code quality with flake8
5. Run `make test` - Execute all tests

### Commit Messages
- Use the Conventional Commits format
- Be descriptive and concise
- Reference issues when applicable
- Example: `feat: add new technical indicator for momentum`
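
The subject-line format (`type(scope)!: description`) can be checked locally, for example from a Git `commit-msg` hook. A rough sketch; the list of allowed types follows common convention and is an assumption, not a project rule:

```python
import re

# Common Conventional Commits types; adjust to the team's agreed list
COMMIT_SUBJECT = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9_-]+\))?"  # optional scope, e.g. (backtest)
    r"!?"  # optional breaking-change marker
    r": \S.*$"
)


def is_conventional(message: str) -> bool:
    """True if the first line of a commit message looks like a Conventional Commit."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_SUBJECT.match(first_line) is not None
```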

### Pull Request Requirements
- All tests must pass
- Code coverage > 80%
- Documentation updated
- No linting errors
- Clear description of changes

## Emergency Procedures

### Breaking Changes
- Maintain backward compatibility when possible
- Emit deprecation warnings before removing features
- Update documentation immediately
- Notify the team of breaking changes
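
For the deprecation point, Python's `warnings` module with `DeprecationWarning` is the standard mechanism. A sketch with hypothetical function names, keeping the old name as a thin wrapper:

```python
import warnings


def compute_momentum_score(close: list[float], lookback: int = 10) -> float:
    """New API (hypothetical name): percent change over `lookback` bars."""
    return close[-1] / close[-1 - lookback] - 1.0


def momentum_score(close: list[float], lookback: int = 10) -> float:
    """Deprecated alias kept for backward compatibility."""
    warnings.warn(
        "momentum_score() is deprecated; use compute_momentum_score() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return compute_momentum_score(close, lookback)
```

In tests, `pytest.deprecated_call()` can assert that the warning is raised.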

### Critical Bugs
- Create a hotfix branch immediately
- Add regression tests
- Deploy the fix as soon as possible
- Document the issue and solution

---

**Remember**: Always test thoroughly, follow coding standards, and maintain documentation. This codebase is used for financial applications, so accuracy and reliability are paramount.