zotero-files2md

Export file attachments (for example PDF, Word, HTML, CSV, images) stored in a Zotero library to Markdown using Docling, driven entirely through the official Zotero Web API via PyZotero.

Features

Uses Zotero Web API (no direct database access required)
Authenticates with a Zotero API key (user or group libraries supported)
Discovers imported file attachments, optionally filtered by collection (specified by collection key) and tag
Downloads eligible attachments (imported files only) and converts them to Markdown via Docling
Supports multi-GPU acceleration for document conversion (automatic distribution across available GPUs)
Configurable Docling pipeline (OCR, picture description, image resolution)
Emits machine-safe per-page header/body/footer section markers in Markdown output by default
Organises exported Markdown by reference folders named from citation key (default) or item title
Supports dry-run mode, overwrite control, and chunk-size tuning
Provides both a CLI and a Python API for programmatic usage
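
The eligibility rule for attachments can be sketched as a filter on the `linkMode` field that the Zotero Web API returns in each attachment's data. This is a minimal illustration that follows the link modes named under Notes & Limitations, not the package's actual implementation; `is_downloadable` and the sample items are hypothetical.

```python
# Link modes treated as stored in Zotero, per Notes & Limitations in this README.
DOWNLOADABLE_LINK_MODES = {"imported_file", "imported_url"}


def is_downloadable(item: dict) -> bool:
    """Return True if a Zotero item dict looks like a stored file attachment.

    Hypothetical helper: it only inspects itemType and linkMode.
    """
    data = item.get("data", {})
    return (
        data.get("itemType") == "attachment"
        and data.get("linkMode") in DOWNLOADABLE_LINK_MODES
    )


# Made-up items shaped like Zotero Web API responses.
items = [
    {"key": "AAAA1111", "data": {"itemType": "attachment", "linkMode": "imported_file"}},
    {"key": "BBBB2222", "data": {"itemType": "attachment", "linkMode": "linked_file"}},
    {"key": "CCCC3333", "data": {"itemType": "journalArticle"}},
]
eligible = [item["key"] for item in items if is_downloadable(item)]
print(eligible)
```

Only the first item passes the filter: the second is a linked file and the third is not an attachment.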

Requirements

Python 3.11+
A Zotero Web API key with at least read access to the target library
The Zotero library ID (numeric)
Network access to https://api.zotero.org

Installation

python -m venv .venv
source .venv/bin/activate

pip install --upgrade pip
pip install .
# or for development
pip install -e ".[dev]"

Install directly from GitHub

You can install the package without cloning:

pip install git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2md

To upgrade the package to the latest version from GitHub:

pip install --upgrade git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2md

To force a reinstall from the GitHub source, use one of the following.

Reinstall only zotero-files2md itself (without reinstalling dependencies):

pip install --upgrade --force-reinstall --no-deps git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2md

Reinstall zotero-files2md and all dependencies:

pip install --upgrade --force-reinstall git+https://github.com/ma-ji/zotero-files2md.git#egg=zotero-files2md

To install a specific tag or branch, append @ref, for example:

pip install git+https://github.com/ma-ji/zotero-files2md.git@v0.1.0#egg=zotero-files2md
pip install git+https://github.com/ma-ji/zotero-files2md.git@main#egg=zotero-files2md

You can also install extras (for example, development dependencies) with:

pip install "zotero-files2md[dev] @ git+https://github.com/ma-ji/zotero-files2md.git"

CLI Usage

Single output directory

zotero-files2md export \
    ./markdown-output \
    --api-key "$ZOTERO_API_KEY" \
    --library-id 123456 \
    --library-type user \
    --collection ABCD1234 \
    --tag "LLM" \
    --limit 20 \
    --chunk-size 50 \
    --max-workers 8 \
    --overwrite \
    --force-full-page-ocr \
    --do-picture-description \
    --image-resolution-scale 4.0 \
    --image-processing embed \
    --no-page-sections \
    --reference-folder-name citation-key \
    --use-multi-gpu \
    --log-level debug

Batch: multiple collections to multiple output directories

Provide one or more --collection-output COLLECTION_KEY=OUTPUT_DIR entries:

zotero-files2md export-batch \
    --collection-output ABCD1234=./markdown-output/collection-a \
    --collection-output EFGH5678=./markdown-output/collection-b \
    --api-key "$ZOTERO_API_KEY" \
    --library-id 123456 \
    --library-type user \
    --tag "LLM" \
    --chunk-size 50 \
    --max-workers 8 \
    --overwrite \
    --log-level info
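
For scripts that build a batch export themselves, the `COLLECTION_KEY=OUTPUT_DIR` syntax can be parsed into a mapping of the shape that `export_collections` accepts under Programmatic Usage. The sketch below is one possible way to do this; `parse_collection_outputs` is a hypothetical helper and may not match how the CLI validates these entries.

```python
from pathlib import Path
from typing import Dict, List


def parse_collection_outputs(entries: List[str]) -> Dict[str, Path]:
    """Turn ["KEY=DIR", ...] into {KEY: Path(DIR)} (hypothetical helper)."""
    mapping: Dict[str, Path] = {}
    for entry in entries:
        # Split on the first "=" only, so directories may contain "=".
        key, sep, directory = entry.partition("=")
        key, directory = key.strip(), directory.strip()
        if not sep or not key or not directory:
            raise ValueError(f"expected COLLECTION_KEY=OUTPUT_DIR, got {entry!r}")
        mapping[key] = Path(directory)
    return mapping


mapping = parse_collection_outputs([
    "ABCD1234=./markdown-output/collection-a",
    "EFGH5678=./markdown-output/collection-b",
])
for key, directory in mapping.items():
    print(key, "->", directory.as_posix())
```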

Arguments

| Argument | Description |
| --- | --- |
| `output_dir` | Directory where Markdown files will be written. Created if missing. |

Options

| Option | Description | Default |
| --- | --- | --- |
| `--api-key` | Zotero Web API key (prompted if not provided; honours `ZOTERO_API_KEY`). | - |
| `--library-id` | Target Zotero library ID (numeric; honours `ZOTERO_LIBRARY_ID`). | - |
| `--library-type` | Library type (`user` or `group`). | `user` |
| `--collection/-c KEY` | Filter attachments by collection key (repeatable; obtain keys via the Zotero web UI or API). | - |
| `--collection-output/-C KEY=DIR` | Batch mode: export one collection key to a specific output directory (repeatable; use with `export-batch`). | - |
| `--tag/-t NAME` | Filter attachments by tag name (repeatable). | - |
| `--limit N` | Stop after processing `N` attachments. | None |
| `--chunk-size N` | Number of attachments to request per API call. | 100 |
| `--max-workers N` | Upper bound on parallel download/conversion workers (auto-detected if unset; in multi-GPU mode, total workers are capped by `GPU_count * --workers-per-gpu`). | Auto (up to 12) |
| `--workers-per-gpu N` | Maximum worker processes per GPU in multi-GPU mode (lower to reduce OOM risk). | 1 |
| `--overwrite` | Overwrite existing Markdown files instead of skipping. | False |
| `--dry-run` | List target files without downloading attachments or writing Markdown. | False |
| `--force-full-page-ocr` | Force full-page OCR for better quality (slower). | False |
| `--do-picture-description` | Enable GenAI picture description (slower). | False |
| `--image-resolution-scale N` | Image resolution scale for Docling. | 4.0 |
| `--image-processing MODE` | How to handle images in Markdown output (`embed`, `placeholder`, `drop`). | `embed` |
| `--page-sections` / `--no-page-sections` | Include per-page machine-safe header/body/footer section markers in Markdown output. | True |
| `--reference-folder-name MODE` | How to name each reference folder (`citation-key` or `item-title`). | `citation-key` |
| `--use-multi-gpu` / `--no-use-multi-gpu` | Distribute processing across available GPUs. | True |
| `--log-level LEVEL` | Logging verbosity (`critical`, `error`, `warning`, `info`, `debug`). | `info` |
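
The interaction between `--max-workers` and `--workers-per-gpu` can be illustrated with a small calculation. This is a sketch of the documented cap only. `effective_workers` is a hypothetical function, and the auto-detection rule (CPU count, limited to 12) is an assumption based on the "Auto (up to 12)" default rather than the package's actual logic.

```python
import os
from typing import Optional


def effective_workers(
    max_workers: Optional[int],
    gpu_count: int,
    workers_per_gpu: int = 1,
    use_multi_gpu: bool = True,
    auto_cap: int = 12,
) -> int:
    """Estimate the worker count implied by the options (hypothetical sketch)."""
    if max_workers is None:
        # Assumption: "Auto" means the CPU count, limited to the documented ceiling.
        max_workers = min(os.cpu_count() or 1, auto_cap)
    if use_multi_gpu and gpu_count > 0:
        # Documented cap in multi-GPU mode: GPU_count * --workers-per-gpu.
        max_workers = min(max_workers, gpu_count * workers_per_gpu)
    return max(1, max_workers)


print(effective_workers(8, gpu_count=2))                     # capped by 2 GPUs * 1
print(effective_workers(8, gpu_count=4, workers_per_gpu=3))  # 8 is below 4 * 3
print(effective_workers(8, gpu_count=0))                     # no GPUs, no GPU cap
```

Under these assumptions, requesting 8 workers on a 2-GPU machine with the default `--workers-per-gpu 1` yields 2 workers, which is why raising `--max-workers` alone may not increase parallelism in multi-GPU mode.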

Markdown Output Layout

For each processed attachment:

output_dir/
└── <reference-folder-slug>/
    └── <attachment-title-slug>.md

Default (--reference-folder-name citation-key) example:

/exports/
└── smith2023foundations/
    └── appendix-a-methods.md

Alternative (--reference-folder-name item-title) example:

/exports/
└── smith-2023-foundations/
    └── appendix-a-methods.md
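
The layout above can be approximated with a simple slug function. The exact slug rules used by zotero-files2md are not documented here, so `slugify` and `markdown_path` below are hypothetical and only reproduce the shape of the examples (lowercase, hyphen-separated, `.md` suffix).

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into one hyphen.

    Assumption: ASCII-only slugs; non-ASCII characters are dropped in this sketch.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def markdown_path(output_dir: Path, reference_name: str, attachment_title: str) -> Path:
    """Build output_dir/<reference-folder-slug>/<attachment-title-slug>.md."""
    return output_dir / slugify(reference_name) / f"{slugify(attachment_title)}.md"


path = markdown_path(Path("/exports"), "Smith 2023 Foundations", "Appendix A: Methods")
print(path.as_posix())
```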

When --page-sections is enabled (default), each page includes explicit header/body/footer section delimiters without Markdown heading syntax:

[[[PAGE:1|HEADER|START]]]
... header text or [[[PAGE:1|HEADER|EMPTY]]]
[[[PAGE:1|HEADER|END]]]

[[[PAGE:1|BODY|START]]]
... body text or [[[PAGE:1|BODY|EMPTY]]]
[[[PAGE:1|BODY|END]]]

[[[PAGE:1|FOOTER|START]]]
... footer text or [[[PAGE:1|FOOTER|EMPTY]]]
[[[PAGE:1|FOOTER|END]]]
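
Because the markers follow a regular pattern, downstream tools can split exported Markdown back into pages and sections. The parser below is a minimal sketch that assumes each marker sits on its own line and that an `EMPTY` marker appears between `START` and `END` for empty sections, as the template above suggests. `parse_page_sections` is a hypothetical helper, not part of the package.

```python
import re
from typing import Dict

# START marker, lazily captured text, then the END marker for the same page/section.
SECTION_RE = re.compile(
    r"\[\[\[PAGE:(\d+)\|(HEADER|BODY|FOOTER)\|START\]\]\]\n"
    r"(.*?)\n?"
    r"\[\[\[PAGE:\1\|\2\|END\]\]\]",
    re.DOTALL,
)
EMPTY_RE = re.compile(r"\[\[\[PAGE:\d+\|(?:HEADER|BODY|FOOTER)\|EMPTY\]\]\]")


def parse_page_sections(markdown: str) -> Dict[int, Dict[str, str]]:
    """Return {page_number: {"header": ..., "body": ..., "footer": ...}}."""
    pages: Dict[int, Dict[str, str]] = {}
    for match in SECTION_RE.finditer(markdown):
        page, section = int(match.group(1)), match.group(2).lower()
        # Treat an EMPTY marker as an empty string.
        text = EMPTY_RE.sub("", match.group(3)).strip()
        pages.setdefault(page, {})[section] = text
    return pages


sample = "\n".join([
    "[[[PAGE:1|HEADER|START]]]",
    "[[[PAGE:1|HEADER|EMPTY]]]",
    "[[[PAGE:1|HEADER|END]]]",
    "",
    "[[[PAGE:1|BODY|START]]]",
    "Hello world.",
    "[[[PAGE:1|BODY|END]]]",
    "",
    "[[[PAGE:1|FOOTER|START]]]",
    "p. 1",
    "[[[PAGE:1|FOOTER|END]]]",
])
pages = parse_page_sections(sample)
print(pages)
```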

Programmatic Usage

from pathlib import Path
from zotero_files2md import export_collections, export_library
from zotero_files2md.settings import ExportSettings

settings = ExportSettings(
    api_key="your-api-key",
    library_id="123456",
    library_type="user",
    output_dir=Path("./markdown-output"),
    collections={"ABCD1234"},
    overwrite=True,
    chunk_size=50,
    max_workers=8,
    use_multi_gpu=True,
    force_full_page_ocr=False,
    do_picture_description=False,
    image_processing="embed",
    page_sections=True,  # default; set False to disable section markers
    reference_folder_name="citation-key",
)

summary = export_library(settings)
print(summary)

# Batch export: multiple collections -> multiple output directories
# Note: ``export_collections`` overrides ``settings.output_dir`` and
# ``settings.collections`` per mapping entry.
batch = export_collections(
    settings,
    {
        "ABCD1234": Path("./markdown-output/collection-a"),
        "EFGH5678": Path("./markdown-output/collection-b"),
    },
)
print(batch)
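
To avoid hard-coding the API key in scripts, the credentials can be read from the same environment variables the CLI honours (`ZOTERO_API_KEY` and `ZOTERO_LIBRARY_ID`). Whether `ExportSettings` reads these variables on its own is not stated here, so the sketch resolves them explicitly; `credentials_from_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def credentials_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the Zotero credentials from environment variables (hypothetical helper)."""
    env = os.environ if environ is None else environ
    values = {
        "ZOTERO_API_KEY": env.get("ZOTERO_API_KEY", ""),
        "ZOTERO_LIBRARY_ID": env.get("ZOTERO_LIBRARY_ID", ""),
    }
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    if not values["ZOTERO_LIBRARY_ID"].isdigit():
        raise ValueError("ZOTERO_LIBRARY_ID must be numeric")
    return {"api_key": values["ZOTERO_API_KEY"], "library_id": values["ZOTERO_LIBRARY_ID"]}


# Demonstration with a fake environment; pass nothing to use os.environ.
creds = credentials_from_env({"ZOTERO_API_KEY": "example-key", "ZOTERO_LIBRARY_ID": "123456"})
print(sorted(creds))

# The result could then be unpacked into the settings object, for example:
# settings = ExportSettings(**creds, library_type="user", output_dir=Path("./markdown-output"))
```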

Development

pip install -e ".[dev]"
pytest

Notes & Limitations

The Zotero Web API only provides access to attachments that are stored in Zotero (imported_file / imported_url). Linked file attachments (linked_file) are skipped automatically.
Ensure the API key has sufficient permissions for the target library (read at minimum).
API rate limits apply; lower --chunk-size or pause between runs if you hit them.
If a conversion triggers a CUDA out-of-memory error, the exporter retries that attachment on CPU.
When running in --dry-run mode, attachments are enumerated but files are not downloaded and no Markdown is written.
When one or more --collection keys are supplied, only those collections are queried (via the Zotero collection_items endpoint). Provide collection keys rather than names; you can copy the key from the Zotero web UI URL.
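
When scripting repeated runs, the pause between them can be derived from the rate-limit signals the Zotero Web API documents: a `Backoff` header on any response and a `Retry-After` header on 429 or 503 responses. The function below is a sketch of that decision only. `seconds_to_wait` and its `default_wait` fallback are hypothetical, and it does not describe how zotero-files2md or PyZotero handle these headers internally.

```python
from typing import Mapping


def seconds_to_wait(
    status_code: int, headers: Mapping[str, str], default_wait: float = 5.0
) -> float:
    """Suggest a pause (in seconds) based on Zotero's rate-limit headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    waits = []
    for name in ("backoff", "retry-after"):
        value = normalized.get(name)
        if value is None:
            continue
        try:
            waits.append(float(value))
        except ValueError:
            pass  # Non-numeric values (such as an HTTP date) are ignored in this sketch.
    if waits:
        return max(waits)
    # Assumption: fall back to a fixed pause when throttled without a usable header.
    return default_wait if status_code in (429, 503) else 0.0


print(seconds_to_wait(200, {}))
print(seconds_to_wait(200, {"Backoff": "10"}))
print(seconds_to_wait(429, {"Retry-After": "30"}))
```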
