| 1 | +# Contributing to DataRecipe |
| 2 | + |
| 3 | +## Development Setup |
| 4 | + |
| 5 | +```bash |
| 6 | +# Clone and install |
| 7 | +git clone https://github.com/liuxiaotong/data-recipe.git |
| 8 | +cd data-recipe |
| 9 | +python -m venv .venv |
| 10 | +source .venv/bin/activate |
| 11 | +make dev # Install with all dependencies |
| 12 | + |
| 13 | +# Install pre-commit hooks |
| 14 | +make hooks |
| 15 | +``` |
| 16 | + |
| 17 | +## Development Workflow |
| 18 | + |
| 19 | +```bash |
| 20 | +make lint # Run ruff linter |
| 21 | +make format # Auto-format code |
| 22 | +make test # Run tests (3294+ tests) |
| 23 | +make cov # Run tests with coverage (96%+, minimum 80%) |
| 24 | +make typecheck # Run mypy type checking |
| 25 | +make ci # Run full CI pipeline locally |
| 26 | +``` |
| 27 | + |
| 28 | +## Testing |
| 29 | + |
| 30 | +- Use `unittest.TestCase` style |
| 31 | +- Mock all external dependencies (LLM APIs, network requests, file I/O) |
| 32 | +- Place tests in `tests/test_<module>.py` |
| 33 | +- Aim for 90%+ coverage on new code |
| 34 | + |
| 35 | +```bash |
| 36 | +# Run specific test file |
| 37 | +pytest tests/test_analyzer.py -v |
| 38 | + |
| 39 | +# Run with coverage for a specific module |
| 40 | +pytest tests/ --cov=datarecipe.analyzer --cov-report=term-missing |
| 41 | +``` |
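
The testing conventions above can be illustrated with a minimal sketch. The function `summarize_dataset` and the client method `fetch_metadata` are hypothetical names invented for this example and are not part of DataRecipe; the point is only to show the `unittest.TestCase` style with an external dependency replaced by a mock.

```python
import unittest
from unittest.mock import Mock


def summarize_dataset(client, dataset_id: str) -> str:
    """Hypothetical function under test (not a DataRecipe API)."""
    # In real code this call would reach the network; tests replace it with a mock.
    meta = client.fetch_metadata(dataset_id)
    return f"{meta['name']}: {meta['rows']} rows"


class TestSummarizeDataset(unittest.TestCase):
    def test_formats_summary_without_network(self):
        # Stand-in for the external dependency, returning canned metadata.
        client = Mock()
        client.fetch_metadata.return_value = {"name": "demo", "rows": 3}

        result = summarize_dataset(client, "demo-id")

        self.assertEqual(result, "demo: 3 rows")
        # Verify the dependency was called exactly once with the expected argument.
        client.fetch_metadata.assert_called_once_with("demo-id")
```

Saved as a file under `tests/`, a test like this is collected by `pytest` in the same way as the commands shown above.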
| 42 | + |
| 43 | +## Code Style |
| 44 | + |
| 45 | +- **Formatter**: ruff (line-length 100) |
| 46 | +- **Target**: Python 3.10+ (`X | None` instead of `Optional[X]`) |
| 47 | +- **Imports**: sorted by ruff (`I` rule) |
| 48 | +- Pre-commit hooks enforce formatting on every commit |
| 49 | + |
| 50 | +## Project Structure |
| 51 | + |
| 52 | +``` |
| 53 | +src/datarecipe/ |
| 54 | +├── cli/ # CLI commands (7 modules) |
| 55 | +├── core/ # Deep analysis engine |
| 56 | +├── analyzers/ # Spec + LLM dataset analyzers |
| 57 | +├── generators/ # Document generators (markdown/JSON) |
| 58 | +├── extractors/ # Rubrics + prompt extraction |
| 59 | +├── parsers/ # PDF/Word/image parsing |
| 60 | +├── cost/ # Cost estimation models |
| 61 | +├── knowledge/ # Knowledge base + dataset catalog |
| 62 | +├── sources/ # Data sources (HuggingFace, GitHub, web) |
| 63 | +├── providers/ # Deployment providers |
| 64 | +├── constants.py # Shared constants |
| 65 | +└── schema.py # Data models |
| 66 | +``` |
| 67 | + |
| 68 | +## Commit Messages |
| 69 | + |
| 70 | +Use conventional commit format in Chinese or English: |
| 71 | + |
| 72 | +``` |
| 73 | +feat: describe the new feature |
| 74 | +fix: describe the bug fix |
| 75 | +test: test-related changes |
| 76 | +docs: documentation updates |
| 77 | +chore: build/tooling |
| 78 | +refactor: refactoring |
| 79 | +``` |
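
For contributors who want to check a commit subject locally, the following is a rough sketch of a validator for the prefixes listed above. The names `ALLOWED_TYPES` and `is_valid_subject` are invented for this example, and the project's actual hooks may check something different or nothing at all.

```python
import re

# Prefixes taken from the list above; anything else is rejected by this sketch.
ALLOWED_TYPES = ("feat", "fix", "test", "docs", "chore", "refactor")

# "<type>: <description>", with a single space after the colon and a non-empty description.
SUBJECT_PATTERN = re.compile(rf"^(?:{'|'.join(ALLOWED_TYPES)}): \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True if the commit subject starts with an allowed type prefix."""
    return SUBJECT_PATTERN.match(subject) is not None
```

The description part is not restricted to ASCII, so subjects written in Chinese or English both pass. Scoped forms such as `feat(cli): ...` are not accepted by this pattern and would need an extension.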
| 80 | + |
| 81 | +## Releasing |
| 82 | + |
| 83 | +Releases are automated via GitHub Actions. To release: |
| 84 | + |
| 85 | +1. Update the version in `pyproject.toml`, `src/datarecipe/__init__.py`, and `src/datarecipe/cli/__init__.py` |
| 86 | +2. Update `CHANGELOG.md` |
| 87 | +3. Commit and tag: `git tag -a v0.X.Y -m "v0.X.Y"` |
| 88 | +4. Push: `git push origin main --tags` |
| 89 | +5. GitHub Actions will auto-publish to PyPI and create a GitHub Release |