We would love to accept your patches and contributions to this project.
Contributions to this project must be accompanied by a Contributor License Agreement (CLA). You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project.
If you or your current employer have already signed the Google CLA (even if it was for a different project), you probably don't need to do it again.
Visit https://cla.developers.google.com/ to see your current agreements or to sign a new one.
This project follows HAI-DEF's Community guidelines.
If you encounter a bug or have a feature request, please open an issue on GitHub. We have templates to help guide you:
- Bug Report: For reporting bugs or unexpected behavior
- Feature Request: For suggesting new features or improvements
When creating an issue, GitHub will prompt you to choose the appropriate template. Please provide as much detail as possible to help us understand and address your concern.
To get started, clone the repository and install the necessary dependencies for development and testing. Detailed instructions can be found in the Installation from Source section of the README.md.
Windows Users: The formatting scripts use bash. Please use one of:
- Git Bash (comes with Git for Windows)
- WSL (Windows Subsystem for Linux)
- PowerShell with bash-compatible commands
This project uses automated tools to maintain a consistent code style. Before submitting a pull request, please format your code:
# Run the auto-formatter
./autoformat.sh
This script uses:
- isort to organize imports with Google style (single-line imports)
- pyink (Google's fork of Black) to format code according to Google's Python Style Guide
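To make the style concrete, here is a small illustrative module written roughly the way these formatters are expected to leave it: one import per line, 2-space indentation, and lines under 80 characters. The function `count_labels` is hypothetical and exists only for this example; it is not part of the project.

```python
# Illustrative only: a hypothetical module in the target style.
# Each imported name sits on its own line (single-line imports).
import collections
from typing import Iterable
from typing import Mapping


def count_labels(labels: Iterable[str]) -> Mapping[str, int]:
  """Counts how often each label appears in `labels`."""
  counts = collections.Counter()
  for label in labels:
    counts[label] += 1
  return dict(counts)
```

If a file you edit looks noticeably different after formatting (for example, 4-space indents or grouped `from x import a, b` lines), the formatters were probably not run with the project configuration.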
You can also run the formatters manually:
isort langextract tests
pyink langextract tests --config pyproject.toml
Note: The formatters target only the langextract and tests directories by default to avoid formatting virtual environments or other non-source directories.
For automatic formatting checks before each commit:
# Install pre-commit
pip install pre-commit
# Install the git hooks
pre-commit install
# Run manually on all files
pre-commit run --all-files
All contributions must pass linting checks and unit tests. Please run these locally before submitting your changes:
# Run linting with Pylint 3.x
pylint --rcfile=.pylintrc langextract tests
# Run tests
pytest tests
Note on Pylint Configuration: We use a modern, minimal configuration that:
- Only disables truly noisy checks (not entire categories)
- Keeps critical error detection enabled
- Uses plugins for enhanced docstring and type checking
- Aligns with our pyink formatter (80-char lines, 2-space indents)
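If you prefer a single entry point for the lint and test commands shown above, a minimal sketch of a local helper follows. The script itself, the `CHECKS` constant, and `run_checks` are assumptions made for illustration and are not shipped with the project; only the two command lines come from this guide.

```python
import subprocess

# The two commands from this guide, in the order they are listed.
CHECKS = (
    ["pylint", "--rcfile=.pylintrc", "langextract", "tests"],
    ["pytest", "tests"],
)


def run_checks(checks=CHECKS, run=subprocess.run):
  """Runs each command in order and stops at the first failure.

  Returns the first non-zero exit code, or 0 if every command passed.
  `run` is injectable so the function can be exercised without
  actually invoking pylint or pytest.
  """
  for command in checks:
    print("$", " ".join(command))
    result = run(command, check=False)
    if result.returncode != 0:
      return result.returncode
  return 0
```

From the repository root, calling `sys.exit(run_checks())` in a small script would give the shell the same exit status as the first failing tool.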
For full testing across Python versions:
tox # runs pylint + pytest on Python 3.10 and 3.11
If you want to add support for a new LLM provider, please refer to the Provider System Documentation. The recommended approach is to create an external plugin package rather than modifying the core library. This allows for:
- Independent versioning and releases
- Faster iteration without core review cycles
- Custom dependencies without affecting core users
All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose.
When you create a pull request, GitHub will automatically populate it with our pull request template. Please fill out all sections of the template to help reviewers understand your changes.
- Keep PRs focused and small: Each PR should address a single issue and contain one cohesive change. PRs are automatically labeled by size to help reviewers:
  - size/XS (< 50 lines): Small fixes and documentation updates
  - size/S (50-150 lines): Typical features or bug fixes
  - size/M (150-600 lines): Larger features that remain well-scoped
  - size/L (600-1000 lines): Consider splitting into smaller PRs if possible
  - size/XL (> 1000 lines): Requires strong justification and may need special review
- Reference related issues: All PRs must include "Fixes #123" or "Closes #123" in the description. The linked issue should have at least 5 👍 reactions from the community and include discussion that demonstrates the importance and need for the change.
- No infrastructure changes: Contributors cannot modify infrastructure files, build configuration, or core documentation. These files are protected and can only be changed by maintainers. Use ./autoformat.sh to format code without affecting infrastructure files. In special circumstances, build configuration updates may be considered if they include discussion and evidence of robust testing, ideally with community support.
- Single-change commits: A PR should typically comprise a single git commit. Squash multiple commits before submitting.
- Clear description: Explain what your change does and why it's needed.
- Ensure all tests pass: Check that both formatting and tests are green before requesting review.
- Respond to feedback promptly: Address reviewer comments in a timely manner.
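Two of the rules above are mechanical enough to check before you open a PR: the size label and the issue reference. The sketch below is not the project's actual labeling automation; `size_label` and `linked_issues` are hypothetical helpers. Because the published ranges share their endpoints, the code assumes each lower bound is inclusive and treats exactly 1000 lines as size/L, since size/XL is stated as strictly more than 1000.

```python
import re

# Matches the two keywords named in the guidelines ("Fixes #123",
# "Closes #123"). Case-insensitive matching is an assumption.
_ISSUE_REF = re.compile(r"\b(?:fixes|closes)\s+#(\d+)", re.IGNORECASE)


def size_label(changed_lines: int) -> str:
  """Maps a changed-line count to the size label listed above.

  Assumption: lower bounds are inclusive (150 lines is size/M) and
  1000 lines is still size/L.
  """
  if changed_lines < 0:
    raise ValueError("changed_lines must be non-negative")
  if changed_lines < 50:
    return "size/XS"
  if changed_lines < 150:
    return "size/S"
  if changed_lines < 600:
    return "size/M"
  if changed_lines <= 1000:
    return "size/L"
  return "size/XL"


def linked_issues(description: str) -> list[int]:
  """Returns the issue numbers referenced with Fixes/Closes."""
  return [int(number) for number in _ISSUE_REF.findall(description)]
```

An empty result from `linked_issues` means the description is missing the required reference. Checking the linked issue's reaction count would need the GitHub API and is left out of this sketch.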
If your change is large or complex, consider:
- Opening an issue first to discuss the approach
- Breaking it into multiple smaller PRs
- Clearly explaining in the PR description why a larger change is necessary
For more details, read HAI-DEF's Contributing guidelines.