Export file attachments (for example PDF, Word, HTML, CSV, images) stored in a Zotero library to Markdown using Docling, driven entirely through the official Zotero Web API via PyZotero.
- Uses Zotero Web API (no direct database access required)
- Authenticates with a Zotero API key (user or group libraries supported)
- Discovers imported file attachments with optional collection/tag filters (collection keys)
- Downloads eligible attachments (imported files only) and converts them to Markdown via Docling
- Supports Multi-GPU acceleration for document conversion (automatic distribution across available GPUs)
- Configurable Docling pipeline (OCR, picture description, image resolution)
- Default machine-safe per-page header/body/footer section markers in Markdown output
- Organises exported Markdown by reference folders named from citation key (default) or item title
- Supports dry-run mode, overwrite behaviour, chunk-size tuning
- Provides both a CLI and a Python API for programmatic usage
- Python 3.11+
- A Zotero Web API key with at least read access to the target library
- The Zotero library ID (numeric)
- Network access to
https://api.zotero.org
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install .
# or for development
pip install -e .[dev]You can install the package without cloning:
pip install git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2mdTo upgrade the package to the latest version from GitHub:
pip install --upgrade git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2mdTo force reinstall from GitHub source:
Only reinstall zotero-files2md itself (without reinstalling dependencies):
pip install --upgrade --force-reinstall --no-deps git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2mdReinstall zotero-files2md and all dependencies:
pip install --upgrade --force-reinstall git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2mdTo install a specific tag or branch, append @ref, for example:
pip install git+https://github.com/ma-ji/zotero-files2md.git@v0.1.0#egg=zotero-files2md
pip install git+https://github.com/ma-ji/zotero-files2md.git@main#egg=zotero-files2mdYou can also install extras (for example, development dependencies) with:
pip install "git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2md[dev]"zotero-files2md export \
./markdown-output \
--api-key "$ZOTERO_API_KEY" \
--library-id 123456 \
--library-type user \
--collection ABCD1234 \
--tag "LLM" \
--limit 20 \
--chunk-size 50 \
--max-workers 8 \
--overwrite \
--force-full-page-ocr \
--do-picture-description \
--image-resolution-scale 4.0 \
--image-processing embed \
--no-page-sections \
--reference-folder-name citation-key \
--use-multi-gpu \
--log-level debugProvide one or more --collection-output COLLECTION_KEY=OUTPUT_DIR entries:
zotero-files2md export-batch \
--collection-output ABCD1234=./markdown-output/collection-a \
--collection-output EFGH5678=./markdown-output/collection-b \
--api-key "$ZOTERO_API_KEY" \
--library-id 123456 \
--library-type user \
--tag "LLM" \
--chunk-size 50 \
--max-workers 8 \
--overwrite \
--log-level info| Argument | Description |
|---|---|
output_dir |
Directory where Markdown files will be written. Created if missing. |
| Option | Description | Default |
|---|---|---|
--api-key |
Zotero Web API key (prompted if not provided; honours ZOTERO_API_KEY). |
- |
--library-id |
Target Zotero library ID (numeric; honours ZOTERO_LIBRARY_ID). |
- |
--library-type |
Library type (user or group). |
user |
--collection/-c KEY |
Filter attachments by collection key (repeatable; obtain keys via the Zotero web UI or API). | - |
--collection-output/-C KEY=DIR |
Batch mode: export one collection key to a specific output directory (repeatable; use with export-batch). |
- |
--tag/-t NAME |
Filter attachments by tag name (repeatable). | - |
--limit N |
Stop after processing N attachments. |
None |
--chunk-size N |
Number of attachments to request per API call. | 100 |
--max-workers N |
Upper bound on parallel download/conversion workers (auto-detected if unset; in multi-GPU mode, total workers are capped by GPU_count * --workers-per-gpu). |
Auto (up to 12) |
--workers-per-gpu N |
Maximum worker processes per GPU in multi-GPU mode (lower to reduce OOM risk). | 1 |
--overwrite |
Overwrite existing Markdown files instead of skipping. | False |
--dry-run |
List target files without downloading attachments or writing Markdown. | False |
--force-full-page-ocr |
Force full-page OCR for better quality (slower). | False |
--do-picture-description |
Enable GenAI picture description (slower). | False |
--image-resolution-scale N |
Image resolution scale for Docling. | 4.0 |
--image-processing MODE |
How to handle images in Markdown output (embed, placeholder, drop). |
embed |
--page-sections / --no-page-sections |
Include per-page machine-safe header/body/footer section markers in Markdown output. | True |
--reference-folder-name MODE |
How to name each reference folder (citation-key or item-title). |
citation-key |
--use-multi-gpu / --no-use-multi-gpu |
Distribute processing across available GPUs. | True |
--log-level LEVEL |
Logging verbosity (critical, error, warning, info, debug). |
info |
For each processed attachment:
output_dir/
└── <reference-folder-slug>/
└── <attachment-title-slug>.md
Default (--reference-folder-name citation-key) example:
/exports/
└── smith2023foundations/
└── appendix-a-methods.md
Alternative (--reference-folder-name item-title) example:
/exports/
└── smith-2023-foundations/
└── appendix-a-methods.md
When --page-sections is enabled (default), each page includes explicit header/body/footer
section delimiters without Markdown heading syntax:
[[[PAGE:1|HEADER|START]]]
... header text or [[[PAGE:1|HEADER|EMPTY]]]
[[[PAGE:1|HEADER|END]]]
[[[PAGE:1|BODY|START]]]
... body text or [[[PAGE:1|BODY|EMPTY]]]
[[[PAGE:1|BODY|END]]]
[[[PAGE:1|FOOTER|START]]]
... footer text or [[[PAGE:1|FOOTER|EMPTY]]]
[[[PAGE:1|FOOTER|END]]]
from pathlib import Path
from zotero_files2md import export_collections, export_library
from zotero_files2md.settings import ExportSettings
settings = ExportSettings(
api_key="your-api-key",
library_id="123456",
library_type="user",
output_dir=Path("./markdown-output"),
collections={"ABCD1234"},
overwrite=True,
chunk_size=50,
max_workers=8,
use_multi_gpu=True,
force_full_page_ocr=False,
do_picture_description=False,
image_processing="embed",
page_sections=True, # default; set False to disable section markers
reference_folder_name="citation-key",
)
summary = export_library(settings)
print(summary)
# Batch export: multiple collections -> multiple output directories
# Note: ``export_collections`` overrides ``settings.output_dir`` and
# ``settings.collections`` per mapping entry.
batch = export_collections(
settings,
{
"ABCD1234": Path("./markdown-output/collection-a"),
"EFGH5678": Path("./markdown-output/collection-b"),
},
)
print(batch)pip install -e .[dev]
pytest- The Zotero Web API only provides access to attachments that are stored in Zotero (
imported_file/imported_url). Linked file attachments (linked_file) are skipped automatically. - Ensure the API key has sufficient permissions for the target library (read at minimum).
- API rate limits apply; adjust
--chunk-sizeor insert breaks between runs if necessary. - If a conversion triggers a CUDA out-of-memory error, the exporter retries that attachment on CPU.
- When running in
--dry-runmode, attachments are enumerated but files are not downloaded and no Markdown is written. - When one or more
--collectionkeys are supplied, only those collections are queried (via the Zoterocollection_itemsendpoint). Provide collection keys rather than names; you can copy the key from the Zotero web UI URL.