We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.
Contributions to this project must be accompanied by a Contributor License Agreement (CLA). You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably don't need to do it again.
All submissions, including submissions by project members, require review.
We use GitHub pull requests for this purpose.
Consult GitHub Help for more
information on using pull requests.
We have an AI-powered workflow to help streamline the contribution process. You can either use the automated scripts locally or let the GitHub Actions workflow handle the heavy lifting for you.
- Create your notebook using PySpark or BigQuery DataFrames.
- Focus on the logic and the value of your recipe. You can follow the general structure of existing notebooks (Overview, Setup, EDA, Modeling, etc.), but don't worry if you forget to add the license header or platform links—the AI will handle it.
- Place your finished notebook file in the appropriate `notebooks/<category>/<sub_category>/` folder. The folder structure is crucial because it helps the AI determine the correct metadata.
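Before staging, you can check that a notebook sits at the expected depth. The sketch below is only an illustration: the function name `parse_notebook_path` is hypothetical and not part of this repository, and it assumes exactly one category and one sub-category level under `notebooks/`.

```python
from pathlib import PurePosixPath
from typing import Tuple


def parse_notebook_path(path: str) -> Tuple[str, str]:
    """Return (category, sub_category) for a path shaped like
    notebooks/<category>/<sub_category>/<name>.ipynb.

    Hypothetical helper: raises ValueError for any other shape.
    """
    parts = PurePosixPath(path).parts
    well_formed = (
        len(parts) == 4
        and parts[0] == "notebooks"
        and parts[3].endswith(".ipynb")
    )
    if not well_formed:
        raise ValueError(
            "expected notebooks/<category>/<sub_category>/<name>.ipynb, "
            f"got {path!r}"
        )
    return parts[1], parts[2]
```

The category and sub-category names you pass in are your own; the helper only checks the shape of the path, not whether the folders exist.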
Running the automated scripts locally is the easiest way to ensure your contribution is compliant and well-documented.
- Set up your API Key: Make sure you have a Gemini API key and have set it as an environment variable:
export GEMINI_API_KEY="YOUR_API_KEY"
- Stage your new notebook: Use `git` to add your new notebook file to the staging area.
git add notebooks/<category>/<sub_category>/your_new_notebook.ipynb
- Run the Enhancement Script: This script will standardize your notebook by adding the license, platform links, and improving documentation where needed.
python .ci/scripts/enhance_notebook.py
- Run the Documentation Script: This script will generate the metadata for your notebook and automatically update the `.ci/index.json` and `README.md` files.
python .ci/scripts/generate_docs.py
- Commit all changes: Add all the files that were created or modified by the scripts and commit them.
git add .
git commit -m "feat: Add new notebook for X"
- Create a Pull Request from your `feat/<new_notebook>` branch to the `main` branch.
- The PR title should follow the Conventional Commits specification (e.g., `feat: Add housing price prediction notebook`).
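The local steps above, up to the commit, can be sketched as one small Python driver. This is a convenience illustration, not a script shipped with the project: `local_workflow_commands` and `run_local_workflow` are hypothetical names, and the sketch assumes both scripts are run from the repository root without arguments, as in the commands listed above.

```python
import os
import subprocess
from typing import Callable, List


def local_workflow_commands(notebook_path: str, commit_message: str) -> List[List[str]]:
    """Build the command sequence for the local workflow (hypothetical helper)."""
    # The enhancement step needs a Gemini API key, so fail early if it is missing.
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is not set")
    return [
        ["git", "add", notebook_path],
        ["python", ".ci/scripts/enhance_notebook.py"],
        ["python", ".ci/scripts/generate_docs.py"],
        ["git", "add", "."],
        ["git", "commit", "-m", commit_message],
    ]


def run_local_workflow(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
```

Passing a different `runner` lets you print or log the commands for a dry run before executing them for real.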
When you create a pull request, our GitHub Actions workflow will automatically perform the following steps:
- Validation: It first runs the `.ci/scripts/validate_entries.py` script to validate `index.json` and `samples.json`.
- AI-Powered Autofix (If Validation Fails): If the validation fails (e.g., you forgot to run the scripts locally), the `autofix-docs` job will be triggered. This job:
  - Runs the enhancement script to standardize your notebook.
  - Runs the documentation script to generate and add the required metadata to `index.json` and `README.md`.
  - Commits these changes directly to your pull request branch. The validation will then re-run and should pass.
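The control flow described above (validate, autofix on failure, validate again) can be pictured with a short sketch. It models only the order of operations. The real jobs are defined in the project's GitHub Actions configuration, which is not shown here, and `validate_with_autofix` and its callables are hypothetical stand-ins.

```python
from typing import Callable


def validate_with_autofix(validate: Callable[[], bool], autofix: Callable[[], None]) -> str:
    """Run validation; on failure, apply the autofix once and validate again.

    Hypothetical model of the workflow: `validate` stands in for the
    validation script and `autofix` for the autofix job.
    """
    if validate():
        return "passed"
    autofix()
    return "passed-after-autofix" if validate() else "failed"
```

In this model the autofix runs at most once, so a notebook that still fails afterwards needs manual attention.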
If you prefer to do things manually, you will need to:
- Add the Apache 2.0 license and the platform links table to your notebook.
- Add a new entry for your notebook to the `.ci/index.json` file.
- Add a new row for your notebook to the table in the `README.md` file.
- Run `.ci/scripts/validate_entries.py` to confirm your `index.json` changes are valid.
| Type | Title | Description |
|---|---|---|
| feat | Features | A new feature or notebook. |
| fix | Bug Fixes | A bug fix. |
| docs | Documentation | Changes to documentation only (e.g., README, API docs). |
| style | Styles | Code style changes that do not affect the meaning of the code (e.g., white-space, formatting, missing semi-colons). |
| refactor | Code Refactoring | A code change that neither fixes a bug nor adds a feature. |
| perf | Performance | A code change that improves performance. |
| test | Tests | Adding missing tests or correcting existing tests. |
| chore | Chores | Changes to the build process, auxiliary tools, or configurations that don't relate to production code. |
| build | Build System | Changes that affect the build system or external dependencies. |
| ci | Continuous Integration | Changes to CI configuration files and scripts (e.g., GitHub Actions). |
| revert | Reverts | Reverts a previous commit. |
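To check a PR title against the types in the table before opening the pull request, a small pattern match is enough. The sketch below assumes the common Conventional Commits shape `type(optional scope)!: subject`; `is_conventional_title` is a hypothetical helper, and the project's own checks, if any, may be stricter or looser.

```python
import re

# Types taken from the table above.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "chore", "build", "ci", "revert",
)

# type, optional "(scope)", optional "!" for breaking changes, then ": subject".
_TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title matches the assumed Conventional Commits shape."""
    return _TITLE_RE.match(title) is not None
```

For example, `is_conventional_title("feat: Add housing price prediction notebook")` is true, while a title without a type prefix is rejected.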
This project follows Google's Open Source Community Guidelines.