# GitHub Copilot Instructions for FLAML

## Project Overview

FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows built on large language models, machine learning models, and other components, and optimizes their performance.

**Key Components:**

- `flaml/automl/`: AutoML for classification, regression, time series forecasting, and related tasks
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy AutoGen code (AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite

## Build and Test Commands

### Installation

```bash
# Basic installation (editable mode)
pip install -e .

# Install with test dependencies (quote the extras so shells such as zsh do not expand the brackets)
pip install -e ".[test]"

# Install with AutoML dependencies
pip install -e ".[automl]"

# Install with forecast dependencies (Linux only)
pip install -e ".[forecast]"
```

### Running Tests

```bash
# Run all tests, excluding the legacy autogen tests
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10

# Run tests with coverage (-a appends to existing coverage data), then write an XML report
coverage run -a -m pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml

# Check dependencies
python test/check_dependency.py
```

The `--reruns` and `--reruns-delay` options are provided by the `pytest-rerunfailures` plugin.

### Linting and Formatting

```bash
# Run pre-commit hooks
pre-commit run --all-files

# Format with black (line length: 120)
black . --line-length 120

# Run ruff for linting and auto-fix
ruff check . --fix
```

## Code Style and Formatting

### Python Style

- **Line length:** 120 characters (configured for both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** isort-style ordering, enforced through Ruff
- **Python version:** Python >= 3.10 (full support for 3.10, 3.11, and 3.12; Python 3.13 is tested, but some optional dependencies may have limited compatibility)

### Code Quality Rules

- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401); these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Keep McCabe complexity at or below 10
- Use type hints where appropriate
- Write clear docstrings for public APIs

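McCabe (cyclomatic) complexity is one plus the number of decision points in a function, so a limit of 10 roughly means "no more than nine branches". The sketch below approximates it with Python's standard `ast` module so you can spot functions that are close to the limit. It is an illustration, not Ruff's implementation: the helper names `approx_mccabe` and `exceeds_limit` are hypothetical, and because it counts only `if`/`elif`, loops, and `except` handlers, its numbers may differ from those reported by Ruff's mccabe rule (`C901`).

```python
import ast

# Node types treated as decision points. An elif is parsed as a nested If, so it is counted too.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)

MAX_COMPLEXITY = 10  # the limit stated in the Code Quality Rules


def approx_mccabe(source: str) -> int:
    """Approximate McCabe complexity of `source` as 1 + the number of decision points.

    Nested functions are not scored separately, so pass the source of a single function.
    """
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _BRANCH_NODES) for node in ast.walk(tree))


def exceeds_limit(source: str, limit: int = MAX_COMPLEXITY) -> bool:
    """Return True if the approximate complexity is above `limit`."""
    return approx_mccabe(source) > limit


SAMPLE = '''
def safe_inverse_balance(values):
    balance = 0
    for v in values:
        if v < 0:
            balance -= 1
        elif v > 0:
            balance += 1
    try:
        return 1 / balance
    except ZeroDivisionError:
        return 0.0
'''

if __name__ == "__main__":
    print(approx_mccabe(SAMPLE))  # prints 5: 1 + for + if + elif + except
    print(exceeds_limit(SAMPLE))  # prints False
```

Treat the result as an early warning only; the linter run by pre-commit remains the authority.
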
### Pre-commit Hooks

The repository's pre-commit hooks cover:

- Checks for large files, Python AST syntax, and YAML/TOML/JSON validity
- Detection of merge conflict markers and private keys
- Trailing-whitespace and end-of-file fixes
- pyupgrade, targeting Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix

## Testing Strategy

### Test Organization

Tests live in the `test/` directory, organized by module:

- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests

### Test Requirements

- Write tests for new functionality
- Ensure tests pass on all supported Python versions (3.10, 3.11, 3.12, 3.13)
- Ensure tests pass on both Ubuntu and Windows
- Use pytest markers for tests that need a specific platform or optional dependency (e.g., `@pytest.mark.spark`)
- Keep tests idempotent and independent of external state
- Use `--reruns 2 --reruns-delay 10` to retry flaky tests

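An idempotent test creates everything it needs in a fresh temporary location and cleans up afterwards, so it gives the same result on every run and on both Ubuntu and Windows. A minimal stdlib-only sketch; `save_config` and `load_config` are hypothetical helpers standing in for the code under test, and the test function uses pytest's `test_` naming so pytest would collect it.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Hypothetical code under test: write a config as key-sorted JSON."""
    path.write_text(json.dumps(config, sort_keys=True), encoding="utf-8")


def load_config(path: Path) -> dict:
    """Hypothetical code under test: read a config written by save_config."""
    return json.loads(path.read_text(encoding="utf-8"))


def test_config_roundtrip_is_idempotent():
    config = {"learning_rate": 0.1, "n_estimators": 4}
    # A fresh directory per run: nothing depends on files left behind by earlier runs or other tests.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "config.json"
        save_config(config, path)
        first_write = path.read_text(encoding="utf-8")
        # Saving what was just loaded must reproduce exactly the same file.
        save_config(load_config(path), path)
        assert path.read_text(encoding="utf-8") == first_write
        assert load_config(path) == config
    # The temporary directory and its contents are removed when the with block exits.
```

Under pytest, the built-in `tmp_path` fixture gives each test its own temporary directory and serves the same purpose.
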
### Coverage

- Add tests that exercise the new code paths you introduce
- Coverage reports are generated for Python 3.11 builds and uploaded to Codecov

## Git Workflow and Best Practices

### Branching

- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging

### Commit Messages

- Use clear, descriptive commit messages
- Reference issue numbers when applicable

### Pull Requests

- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`

## Project Structure

```
flaml/
├── automl/          # AutoML functionality
├── tune/            # Hyperparameter tuning
├── default/         # Zero-shot AutoML
├── autogen/         # Legacy autogen (deprecated, moved to separate repo)
├── fabric/          # Microsoft Fabric integration
├── onlineml/        # Online learning
└── version.py       # Version information

test/                # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/

notebook/            # Example notebooks
website/             # Documentation website
```

## Dependencies and Package Management

### Core Dependencies

- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12; Python 3.13 is tested in CI but may have limited compatibility with some optional dependencies)

### Optional Dependencies

- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Dependencies for the full test suite
- `[spark]`: PySpark and joblib
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for the complete list

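When a feature or test fails with an import error, it helps to know which optional extras are actually usable in the current environment. A small sketch using `importlib.util.find_spec`, which locates a top-level module without importing it. The `EXTRA_MODULES` mapping is a hypothetical, partial translation of the extras above into import names (for example, scikit-learn is imported as `sklearn`); `setup.py` remains the source of truth.

```python
import importlib.util

# Hypothetical, partial mapping from extras to top-level import names (see setup.py for the real lists).
EXTRA_MODULES = {
    "automl": ["lightgbm", "xgboost", "scipy", "pandas", "sklearn"],
    "spark": ["pyspark", "joblib"],
    "hf": ["transformers", "datasets"],
}


def missing_modules(extras: dict[str, list[str]] = EXTRA_MODULES) -> dict[str, list[str]]:
    """Return, for each extra, the modules that cannot be found in the current environment."""
    return {
        extra: [name for name in modules if importlib.util.find_spec(name) is None]
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    for extra, missing in missing_modules().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"[{extra}] {status}")
```
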
### Version Constraints

- Check `setup.py` for Python version-specific dependency pins
- The pinned XGBoost version depends on the Python version
- NumPy 2.0+ is only used with Python >= 3.13
- Some optional features (such as vowpalwabbit) only work with older Python versions

## Boundaries and Restrictions

### Do NOT Modify

- The `.git/` directory or Git configuration
- The `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating the version)
- GitHub Actions workflows, unless the change has been carefully considered
- Existing test files, except to fix bugs or add coverage

### Be Cautious With

- `setup.py`: Dependency changes need careful review
- `pyproject.toml`: Linting and testing configuration
- `.pre-commit-config.yaml`: Pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users, so avoid breaking public APIs

### Security Considerations

- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations (for example, do not load pickled models or data from untrusted sources)

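The first and third points can be illustrated together: read credentials from the environment rather than from source code, and reject invalid arguments at the public boundary with a clear error. A minimal sketch; the environment variable name `MY_SERVICE_API_KEY` and both helper functions are hypothetical and not part of FLAML's API.

```python
import os


def get_api_key(env_var: str = "MY_SERVICE_API_KEY") -> str:
    """Read a secret from the environment so it never appears in committed code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable; do not hard-code API keys.")
    return key


def check_positive_int(name: str, value: object) -> int:
    """Validate a user-supplied count-like argument of a public API."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value
```

Failing early with a named parameter in the message makes misuse obvious to callers instead of surfacing later as a confusing error deep inside training code.
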
## Special Notes

### AutoGen Migration

- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues

### Platform-Specific Considerations

- Some tests only run on Linux (e.g., forecast tests that use prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS is supported, but LightGBM and XGBoost need the OpenMP runtime (libomp, e.g., installed with `brew install libomp`)

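Linux-only and optional-dependency tests are best guarded with a skip condition instead of being left to fail on other platforms. A stdlib-only sketch using `unittest.skipUnless`, which pytest also honors when it collects `unittest.TestCase` classes; the class name and test body are hypothetical placeholders for a real forecast test.

```python
import importlib.util
import sys
import unittest


def on_linux() -> bool:
    """True on Linux, where the prophet-based forecast tests are expected to run."""
    return sys.platform.startswith("linux")


def module_available(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


class TestProphetForecastSketch(unittest.TestCase):
    @unittest.skipUnless(on_linux() and module_available("prophet"), "requires Linux and prophet")
    def test_prophet_is_available(self):
        # Placeholder for a real forecast test; it only runs where the guard passes.
        self.assertTrue(module_available("prophet"))
```

With plain pytest tests, `@pytest.mark.skipif(...)` with the inverted condition achieves the same effect. A skipped test is reported as skipped, not failed, so the suite stays green on Windows and macOS.
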
### Performance

- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible

## Documentation

- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features

## Contributing

- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP