Our project welcomes external contributions. If you have an itch, please feel free to scratch it.
For more details on the contributing guidelines, head to the Docling Project community repository.
We use Poetry to manage dependencies.
To install Poetry, follow the documentation here: https://python-poetry.org/docs/master/#installing-with-the-official-installer
- Install Poetry globally on your machine:

      curl -sSL https://install.python-poetry.org | python3 -

  The installation script will print the installation bin folder POETRY_BIN, which you need in the next steps.
- Make sure Poetry is in your $PATH:
  - for zsh:

        echo 'export PATH="POETRY_BIN:$PATH"' >> ~/.zshrc

  - for bash:

        echo 'export PATH="POETRY_BIN:$PATH"' >> ~/.bashrc

- The official guidelines linked above include useful details on configuring autocomplete for most shell environments, e.g., Bash and Zsh.
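After editing your shell profile, it can help to confirm that the Poetry bin folder really is on your $PATH before moving on. The sketch below is only an illustration: `path_contains` is a hypothetical helper, and the `$HOME/.local/bin` fallback is an assumption. Use the folder the installer actually printed.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed fallback location; replace with the folder printed by the installer.
POETRY_BIN="${POETRY_BIN:-$HOME/.local/bin}"

if path_contains "$POETRY_BIN"; then
    echo "Poetry bin folder is on PATH"
else
    echo "Add $POETRY_BIN to PATH, then restart your shell"
fi
```

Wrapping $PATH in colons on both sides lets the pattern match whole entries only, so a folder such as /usr/local/bin does not count as a match for /usr/local.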
To activate the Virtual Environment, run:

    poetry shell

This will spawn a shell with the Virtual Environment activated. If the Virtual Environment doesn't exist, Poetry will create one for you. Then, to install dependencies, run:

    poetry install

(Advanced) Use a Specific Python Version

If you need to work with a specific (older) version of Python, run:

    poetry env use $(which python3.8)

This creates a Virtual Environment with Python 3.8. For other versions, replace $(which python3.8) with the path to the interpreter (e.g., /usr/bin/python3.8) or use $(which pythonX.Y).
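If the requested interpreter is not installed, $(which pythonX.Y) expands to nothing and Poetry receives no argument. A minimal sketch of checking first is shown below; `find_python` is a hypothetical helper, and the script only prints the command it would run rather than calling Poetry.

```shell
# Hypothetical helper: print the path of pythonX.Y if it is on PATH.
find_python() {
    command -v "python$1"
}

if interpreter=$(find_python 3.8); then
    echo "Would run: poetry env use $interpreter"
else
    echo "python3.8 not found on PATH; install it or pass a full path instead"
fi
```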
To add a new dependency, run:

    poetry add NAME

We use the following tools to enforce code style:
- isort, to sort imports
- Black, to format code
We run a series of checks on the codebase on every commit using pre-commit. To install the hooks, run:

    pre-commit install

To run the checks on demand, run:

    pre-commit run --all-files

Note: Checks like Black and isort will "fail" if they modify files. This is because pre-commit treats any file modified by one of its hooks as a failed check. In these cases, git add the modified files and git commit again.
When submitting a new feature or fix, please consider adding a short test for it.
When a change improves the conversion results, multiple reference documents must be regenerated and reviewed.
The reference data can be regenerated with:

    DOCLING_GEN_TEST_DATA=1 poetry run pytest

All PRs modifying the reference test data require a double review to guarantee we don't miss edge cases.
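Because the variable is written as a prefix to the command, the shell sets it for that single run only, so later test runs in the same terminal are not affected. The sketch below illustrates this shell behaviour with a stand-in variable, DEMO_FLAG, which is not part of the project; it does not show how the test suite itself reads the flag.

```shell
# DEMO_FLAG is a stand-in name used only for this illustration.
unset DEMO_FLAG

# The VAR=value prefix exports the variable to this one command only.
DEMO_FLAG=1 sh -c 'echo "during: ${DEMO_FLAG:-unset}"'

# Back in the calling shell the variable is still unset.
echo "after: ${DEMO_FLAG:-unset}"
```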
We use MkDocs to write documentation.
To run the documentation server, run:

    mkdocs serve

The server will be available at http://localhost:8000.

To publish the documentation to GitHub Pages, run:

    mkdocs gh-deploy