Commit 8f5fa9f (parents 61020ed + 3a78cb9)
Merge pull request #23 from AKKI0511/cursor/generate-documentation-for-public-apis-800d: Generate documentation for public APIs

12 files changed: +2862 -155 lines

LLM.md: 321 additions, 0 deletions
# QuantTradeAI - LLM Agent Guide

## Project Overview

QuantTradeAI is a comprehensive machine learning framework for quantitative trading strategies. The codebase implements momentum trading using ensemble models (Logistic Regression, Random Forest, XGBoost) with advanced feature engineering and backtesting capabilities.

## Core Architecture

### Key Components

- **Data Layer**: `src/data/` - Data fetching, caching, validation
- **Feature Engineering**: `src/features/` - Technical indicators, custom features
- **ML Models**: `src/models/` - Ensemble classifiers, hyperparameter optimization
- **Backtesting**: `src/backtest/` - Trade simulation, performance metrics
- **Risk Management**: `src/trading/` - Stop-loss, position sizing
- **Utilities**: `src/utils/` - Metrics, visualization, configuration

### Data Flow

1. Fetch OHLCV data (YFinance/AlphaVantage)
2. Generate technical indicators (SMA, EMA, RSI, MACD, etc.)
3. Create custom features (momentum score, volatility breakout)
4. Generate trading labels (forward returns)
5. Train ensemble models with hyperparameter optimization
6. Backtest with risk management
7. Evaluate performance metrics
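Step 4 is what turns raw prices into a supervised-learning target. A minimal self-contained sketch of the idea, assuming a simple threshold rule; the `label_forward_returns` helper below is hypothetical and illustrative, not the project's `DataProcessor.generate_labels`:

```python
import pandas as pd

def label_forward_returns(close: pd.Series, horizon: int = 5,
                          threshold: float = 0.01) -> pd.Series:
    """Label each bar by its forward return: 1 (long), -1 (short), 0 (flat).

    Hypothetical sketch of forward-return labeling; the project's actual
    labeling logic lives in DataProcessor.generate_labels.
    """
    fwd_ret = close.shift(-horizon) / close - 1.0
    labels = pd.Series(0, index=close.index)
    labels[fwd_ret > threshold] = 1    # strong up-move ahead -> long
    labels[fwd_ret < -threshold] = -1  # strong down-move ahead -> short
    return labels                      # trailing bars stay 0 (no lookahead)

prices = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 90.0])
print(label_forward_returns(prices, horizon=1, threshold=0.02).tolist())
# -> [0, 0, 0, 0, 1, -1, 0]
```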
## Development Guidelines

### Code Quality Standards

- **Testing**: All new code MUST have unit tests
- **Formatting**: Use Black for code formatting
- **Linting**: Use flake8 for code quality
- **Type Hints**: Include type annotations for all functions

### Pre-commit Requirements

```bash
# Run all quality checks
make format  # Black formatting
make lint    # flake8 linting
make test    # pytest testing
```

### Dependency Management

- **CRITICAL**: Use Poetry CLI for ALL dependency changes
- **NEVER** manually edit `pyproject.toml` dependencies
- **ALWAYS** use: `poetry add package-name` or `poetry add --group dev package-name`
- **REMOVE** dependencies with: `poetry remove package-name`

### Testing Requirements

- Unit tests for all new functions/classes
- Integration tests for data pipelines
- Performance tests for critical paths
- Test coverage > 80%
## Key Technologies

### Core Dependencies

- **Python 3.11+** - Main language
- **Poetry** - Dependency management
- **pandas/numpy** - Data manipulation
- **scikit-learn** - ML algorithms
- **XGBoost** - Gradient boosting
- **Optuna** - Hyperparameter optimization
- **yfinance** - Market data
- **pandas-ta** - Technical indicators

### Configuration

- **YAML** - Configuration files
- **Pydantic** - Configuration validation
- **joblib** - Model persistence
## API Structure

### Data Loading

```python
from src.data.loader import DataLoader
from src.data.processor import DataProcessor

# Initialize components
loader = DataLoader("config/model_config.yaml")
processor = DataProcessor("config/features_config.yaml")

# Fetch and process data
data_dict = loader.fetch_data()
df_processed = processor.process_data(data_dict["AAPL"])  # one symbol's frame
df_labeled = processor.generate_labels(df_processed)
```

### Model Training

```python
from src.models.classifier import MomentumClassifier

# Initialize and train
classifier = MomentumClassifier("config/model_config.yaml")
X, y = classifier.prepare_data(df_labeled)
classifier.train(X, y)
```

### Backtesting

```python
from src.backtest.backtester import simulate_trades, compute_metrics

# Simulate trades
df_trades = simulate_trades(df_labeled)
metrics = compute_metrics(df_trades)
```
## Configuration Files

### Model Configuration (`config/model_config.yaml`)

- Data parameters (symbols, date ranges, caching)
- Model hyperparameters (LR, RF, XGBoost)
- Training settings (test size, CV folds)
- Trading parameters (position sizing, risk)

### Feature Configuration (`config/features_config.yaml`)

- Technical indicator parameters
- Feature preprocessing settings
- Feature selection methods
- Pipeline steps

## Error Handling Patterns

### Data Validation

```python
# Validate data quality
is_valid = loader.validate_data(data_dict)
if not is_valid:
    raise ValueError("Data validation failed")
```

### Model Training

```python
try:
    classifier.train(X, y)
except ValueError as e:
    logger.error(f"Training error: {e}")
    # Check data shapes and class distribution
```

### Configuration Validation

```python
from src.utils.config_schemas import ModelConfigSchema

ModelConfigSchema(**config)  # Validates configuration
```
## Performance Considerations

### Memory Management

- Use smaller data types for large datasets
- Process data in batches for memory efficiency
- Cache intermediate results appropriately
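As an example of the first point, OHLCV frames loaded as float64 can usually be downcast to float32 to roughly halve memory. A sketch; the `downcast_ohlcv` helper is hypothetical, not part of `src/utils/`, and float32's ~7 significant digits should be checked against each use case:

```python
import numpy as np
import pandas as pd

def downcast_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast all float64 columns to float32 to reduce memory usage."""
    out = df.copy()
    float_cols = out.select_dtypes(include="float64").columns
    out[float_cols] = out[float_cols].astype(np.float32)
    return out

df = pd.DataFrame({"Close": np.random.rand(1000) * 100,
                   "Volume": np.random.rand(1000) * 1e6})
before = df.memory_usage(deep=True).sum()
after = downcast_ohlcv(df).memory_usage(deep=True).sum()
print(before, after)  # float columns roughly halve in size
```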
### Computational Optimization

- Vectorized operations over loops
- Parallel processing for multiple assets
- GPU acceleration for model training (future)
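A minimal illustration of the first point, comparing a Python loop against the equivalent NumPy expression for one-bar returns; both produce the same values, but the vectorized form runs in a single C-level pass:

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])

# Loop version (slow for long series)
returns_loop = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]

# Vectorized version (typically orders of magnitude faster)
returns_vec = prices[1:] / prices[:-1] - 1.0

assert np.allclose(returns_loop, returns_vec)
```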
## Testing Patterns

### Unit Tests

```python
import pandas as pd

def test_data_loader():
    loader = DataLoader("config/model_config.yaml")
    data = loader.fetch_data()
    assert len(data) > 0
    assert all(isinstance(df, pd.DataFrame) for df in data.values())
```

### Integration Tests

```python
def test_complete_pipeline():
    # Test end-to-end workflow
    loader = DataLoader()
    processor = DataProcessor()
    classifier = MomentumClassifier()

    data = loader.fetch_data()
    df = processor.process_data(data['AAPL'])
    df_labeled = processor.generate_labels(df)

    X, y = classifier.prepare_data(df_labeled)
    classifier.train(X, y)

    predictions = classifier.predict(X)
    assert len(predictions) == len(y)
```
## Documentation Standards

### Code Documentation

- Docstrings for all public functions
- Type hints for all parameters
- Usage examples in docstrings
- Clear parameter descriptions

### API Documentation

- Update `docs/api/` files for new functions
- Include parameter types and return values
- Provide usage examples
- Document error conditions
## Common Patterns

### Feature Engineering

```python
from src.features.technical import sma, ema, rsi, macd

# Generate technical indicators
df['sma_20'] = sma(df['Close'], 20)
df['rsi'] = rsi(df['Close'], 14)
macd_df = macd(df['Close'])
```
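For reference, indicators like these are conventionally computed as follows. This is a self-contained sketch: the `sma_ref`/`rsi_ref` names are hypothetical, and `src.features.technical` may use different smoothing (e.g. Wilder's exponential RSI rather than the simple-mean variant shown here):

```python
import pandas as pd

def sma_ref(close: pd.Series, window: int) -> pd.Series:
    """Simple moving average over `window` bars."""
    return close.rolling(window).mean()

def rsi_ref(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI using simple (Cutler's) averaging of gains and losses.

    Wilder's original RSI uses exponential smoothing, so values can
    differ slightly from library implementations.
    """
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

close = pd.Series(range(1, 31), dtype=float)
print(sma_ref(close, 20).iloc[-1])  # mean of 11..30 = 20.5
```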
### Risk Management

```python
from src.trading.risk import apply_stop_loss_take_profit

# Apply risk rules
df_with_risk = apply_stop_loss_take_profit(df, stop_loss_pct=0.02)
```
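The underlying rule can be sketched for a single long position as follows. This is conceptual only: `apply_stop_loss_take_profit` operates on a DataFrame, and the hypothetical `exit_price` below does not mirror its signature:

```python
def exit_price(entry: float, path: list[float],
               stop_loss_pct: float = 0.02,
               take_profit_pct: float = 0.04) -> float:
    """Walk a price path and exit at the first stop-loss or take-profit hit.

    Hypothetical sketch of the risk rule, not the project's vectorized
    implementation.
    """
    stop = entry * (1 - stop_loss_pct)
    target = entry * (1 + take_profit_pct)
    for price in path:
        if price <= stop:
            return stop      # assume fill at the stop level
        if price >= target:
            return target
    return path[-1]          # position still open: mark at last price

print(exit_price(100.0, [101.0, 99.5, 97.5, 103.0]))  # stop hit at 98.0
```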
### Performance Metrics

```python
from src.utils.metrics import classification_metrics, sharpe_ratio

# Calculate metrics
metrics = classification_metrics(y_true, y_pred)
sharpe = sharpe_ratio(returns, risk_free_rate=0.02)
```
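The Sharpe ratio itself is mean excess return divided by its standard deviation, annualized. A sketch of the standard formula, assuming daily returns annualized by the square root of 252; the project's `sharpe_ratio` helper may differ in annualization or risk-free handling:

```python
import numpy as np

def sharpe_ref(returns: np.ndarray, risk_free_rate: float = 0.0,
               periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio from per-period returns.

    Hypothetical reference formula: subtract the per-period risk-free
    rate, then scale mean/std by sqrt(periods_per_year).
    """
    excess = returns - risk_free_rate / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))

rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, size=252)  # synthetic daily returns
print(round(sharpe_ref(daily), 2))
```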
## Troubleshooting

### Common Issues

1. **Data Loading Failures**: Check network connectivity, API limits
2. **Memory Issues**: Reduce data size, use batching
3. **Model Training Errors**: Check data quality, class balance
4. **Configuration Errors**: Validate YAML syntax, required fields

### Debugging Steps

1. Check logs for error messages
2. Validate input data quality
3. Test individual components
4. Verify configuration parameters
## Future Development Areas

### High Priority

- Real-time data streaming
- Advanced risk management
- Multi-timeframe support
- LLM integration for sentiment analysis

### Medium Priority

- GPU acceleration
- Microservices architecture
- Advanced NLP features
- Reinforcement learning

### Low Priority

- Quantum computing integration
- Blockchain connectivity
- Multi-modal AI
- Federated learning
## Resources

### Documentation

- [API Reference](docs/api/)
- [Configuration Guide](docs/configuration.md)
- [Quick Reference](docs/quick-reference.md)

### External Libraries

- [scikit-learn](https://scikit-learn.org/)
- [XGBoost](https://xgboost.readthedocs.io/)
- [Optuna](https://optuna.org/)
- [pandas-ta](https://twopirllc.github.io/pandas-ta/)

### Testing & Quality

- [pytest](https://docs.pytest.org/)
- [Black](https://black.readthedocs.io/)
- [flake8](https://flake8.pycqa.org/)
## Commit Guidelines

### Before Committing

1. Run `make format` - Format code with Black
2. Run `make lint` - Check code quality with flake8
3. Run `make test` - Execute all tests
4. Update documentation if needed
5. Add/update tests for new functionality

### Commit Messages

- Use conventional commit format
- Be descriptive and concise
- Reference issues when applicable
- Example: `feat: add new technical indicator for momentum`

### Pull Request Requirements

- All tests must pass
- Code coverage > 80%
- Documentation updated
- No linting errors
- Clear description of changes

## Emergency Procedures

### Breaking Changes

- Maintain backward compatibility when possible
- Use deprecation warnings for removed features
- Update documentation immediately
- Notify team of breaking changes

### Critical Bugs

- Create hotfix branch immediately
- Add regression tests
- Deploy fix as soon as possible
- Document the issue and solution

---

**Remember**: Always test thoroughly, follow coding standards, and maintain documentation. This codebase is used for financial applications - accuracy and reliability are paramount.
