Releases · sethupavan12/Markdownify

20 Dec 12:17


What's Changed

LLM calls now automatically retry on failure with configurable attempts and delays.

markdownify input.pdf -o out.md --max-retries 5 --retry-delay 2.0

convert("input.pdf", "out.md", max_retries=5, retry_delay=2.0)
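The technical notes further down mention a retry decorator with exponential backoff. A minimal sketch of that technique follows; it is not the project's actual code. The names `retry`, `backoff` and `sleep` are hypothetical, the doubling factor is an assumption, and only `max_retries` and `retry_delay` come from the options shown above.

```python
import functools
import time


def retry(max_retries=3, retry_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Hypothetical decorator: retry a call, growing the wait after each failure.

    `backoff` (the multiplier) and `sleep` (injectable for tests) are
    illustrative assumptions, not documented Markdownify parameters.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = retry_delay
            # One initial attempt plus up to `max_retries` retries.
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    sleep(delay)
                    delay *= backoff  # exponential growth: d, d*b, d*b^2, ...
        return wrapper
    return decorator
```

Under these assumptions, `--max-retries 5 --retry-delay 2.0` would wait 2, 4, 8, 16 and 32 seconds between attempts before giving up.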

Prevent hitting provider rate limits with configurable requests-per-minute throttling.

markdownify input.pdf -o out.md --rate-limit 60

convert("input.pdf", "out.md", rate_limit_rpm=60)
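The technical notes below describe the throttle as a token bucket. As a rough, single-threaded illustration of how a requests-per-minute limit maps onto that technique (the class name `TokenBucket` and the injectable `clock` and `sleep` arguments are hypothetical, and the real implementation may differ):

```python
import time


class TokenBucket:
    """Hypothetical token bucket: holds up to `rate_limit_rpm` tokens and
    refills at `rate_limit_rpm / 60` tokens per second. Not thread-safe."""

    def __init__(self, rate_limit_rpm, clock=time.monotonic, sleep=time.sleep):
        self.capacity = float(rate_limit_rpm)
        self.refill_per_sec = rate_limit_rpm / 60.0
        self.tokens = self.capacity  # start full, so short bursts pass through
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        if self.tokens < 1.0:
            # Sleep just long enough for the missing fraction of a token to accrue.
            self.sleep((1.0 - self.tokens) / self.refill_per_sec)
            self._refill()
        self.tokens -= 1.0
```

Calling `acquire()` before each request lets the first 60 calls through immediately at `rate_limit_rpm=60`, then spaces later calls about one second apart.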

Cache LLM responses to avoid redundant API calls and costs. Re-running on the same input is instant.

markdownify input.pdf -o out.md --cache
markdownify input.pdf -o out.md --cache --cache-dir /path/to/cache

convert("input.pdf", "out.md", enable_cache=True, cache_dir="/path/to/cache")
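For readers curious how file-based response caching can work, here is a small sketch. It assumes responses are keyed by a hash of the model name, prompt and page bytes; the class `FileCache`, its method names and the on-disk JSON layout are all hypothetical and may not match the project's `cache.py`.

```python
import hashlib
import json
from pathlib import Path


class FileCache:
    """Hypothetical file-backed cache: one JSON file per distinct request."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model, prompt, payload=b""):
        # Hash every input that affects the response, so any change is a cache miss.
        digest = hashlib.sha256()
        for part in (model.encode("utf-8"), prompt.encode("utf-8"), payload):
            digest.update(part)
            digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
        return self.dir / f"{digest.hexdigest()}.json"

    def get(self, model, prompt, payload=b""):
        """Return the cached response, or None on a miss."""
        path = self._path(model, prompt, payload)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    def set(self, model, prompt, response, payload=b""):
        path = self._path(model, prompt, payload)
        path.write_text(json.dumps({"response": response}), encoding="utf-8")
```

Because entries live on disk, a second run against the same cache directory can reuse earlier responses without calling the provider again.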

Control log verbosity for debugging or cleaner output.

markdownify input.pdf -o out.md -v    # verbose (debug level)
markdownify input.pdf -o out.md -q    # quiet (warnings only)

convert("input.pdf", "out.md", log_level="verbose")  # or "quiet", "normal", "debug"
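One plausible way to map these names onto Python's standard `logging` levels is sketched below. The CLI comments above state that verbose means debug level and quiet means warnings only; treating "normal" as `INFO` and "debug" as equivalent to "verbose" are assumptions, and `resolve_log_level` is a hypothetical helper, not part of the package.

```python
import logging

# "verbose" -> DEBUG and "quiet" -> WARNING follow the CLI comments above;
# "normal" -> INFO and "debug" -> DEBUG are assumptions for illustration.
LOG_LEVELS = {
    "quiet": logging.WARNING,
    "normal": logging.INFO,
    "verbose": logging.DEBUG,
    "debug": logging.DEBUG,
}


def resolve_log_level(name):
    """Translate a log_level string into a stdlib logging level constant."""
    try:
        return LOG_LEVELS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown log level: {name!r}") from None
```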

Added 37 new tests.

Added cache.py module for file-based response caching
Added retry decorator with exponential backoff to LLM calls
Added token bucket rate limiter for API throttling
Extended MarkdownifyConfig with 6 new fields
Extended CLI with --max-retries, --retry-delay, --rate-limit, --cache, --cache-dir, -v, -q flags
Extended Python API with matching parameters
Updated logging.py to support configurable log levels

Full Changelog: v0.3.0...v0.4.0


13 Aug 22:07

Speed/Latency improvements

🚀 8x Faster Page Checks: Adjacent page checks now run in parallel, making them up to 8x faster. Grouping concurrency can now be controlled with a new option, --grouping_concurrency, separately from the existing generation concurrency, giving finer control for avoiding rate limits.
🚀 8x smaller image footprint: Pages are now downscaled for continuation checks, since these checks assess visual integrity rather than read text and so tolerate some downscaling. Pages are ~3-8x smaller than before, and both per-call latency and the grouping phase are up to ~70% faster.
🔍 Improved multi-page table detection: Table consistency across pages was the most requested feature and is now supported in the base profile. This should improve table merges across pages.
🔍 Cache and stability improvements
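To illustrate the first item, bounded-parallel checks over adjacent page pairs can be sketched with `asyncio`. Whether the project uses `asyncio`, threads or something else is not stated here, so treat this as an assumption; `check_adjacent_pages` and `check_pair` are hypothetical names, and only the idea of a grouping concurrency limit comes from the notes above.

```python
import asyncio


async def check_adjacent_pages(pages, check_pair, grouping_concurrency=8):
    """Run check_pair(left, right) for every adjacent page pair, with at most
    `grouping_concurrency` checks in flight. Results keep page order."""
    semaphore = asyncio.Semaphore(grouping_concurrency)

    async def bounded(left, right):
        async with semaphore:  # caps the number of concurrent provider calls
            return await check_pair(left, right)

    pairs = zip(pages, pages[1:])  # (p0, p1), (p1, p2), ...
    return await asyncio.gather(*(bounded(a, b) for a, b in pairs))
```

A lower limit trades speed for fewer simultaneous requests, which is the lever the release notes describe for staying under provider rate limits.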

Full Changelog: v0.2.1...v0.3.0
