
Releases: sethupavan12/Markdownify

Inbuilt retries, rate-limit support and input caching!

20 Dec 12:17
8d95d94


What's Changed

Release Notes - v0.4.0

New Features

Retry Logic with Exponential Backoff

LLM calls now automatically retry on failure with configurable attempts and delays.

markdownify input.pdf -o out.md --max-retries 5 --retry-delay 2.0
convert("input.pdf", "out.md", max_retries=5, retry_delay=2.0)
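The notes don't spell out the retry schedule, so the following is only a minimal sketch of exponential backoff in plain Python. It is not Markdownify's own (tenacity-based) code. It assumes `max_retries` counts retries after the first attempt and that the wait doubles from `retry_delay` after each failure. `with_retries` and its injectable `sleep` parameter are hypothetical names.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(max_retries: int = 3, retry_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep):
    """Hypothetical decorator: retry the wrapped call on any exception, waiting
    retry_delay, 2 * retry_delay, 4 * retry_delay, ... between attempts."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):  # first try + max_retries retries
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted: re-raise the last error
                    sleep(retry_delay * 2 ** attempt)  # exponential backoff
            raise AssertionError("unreachable")
        return wrapper
    return decorator


# Usage: a stand-in for an LLM call that fails three times, then succeeds.
delays = []
calls = {"n": 0}


@with_retries(max_retries=5, retry_delay=2.0, sleep=delays.append)
def flaky_llm_call() -> str:
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky_llm_call()
print(result, delays)  # ok [2.0, 4.0, 8.0]
```

Injecting `sleep` keeps the sketch testable without real waiting. Production retry code usually also adds random jitter so that many clients don't retry in lockstep.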

Rate Limiting

Prevent hitting provider rate limits with configurable requests-per-minute throttling.

markdownify input.pdf -o out.md --rate-limit 60
convert("input.pdf", "out.md", rate_limit_rpm=60)
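The changelog for this release mentions a token bucket rate limiter. Here is a minimal sketch of that technique, assuming each LLM request spends one token and tokens refill at `rate_limit_rpm / 60` per second. `TokenBucket`, `capacity`, and the injectable clock are hypothetical and not the project's API.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: refills rate_limit_rpm tokens per minute,
    holds at most `capacity` tokens, and each request spends one token."""

    def __init__(self, rate_limit_rpm: float, capacity: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if rate_limit_rpm <= 0:
            raise ValueError("rate_limit_rpm must be positive")
        self.rate = rate_limit_rpm / 60.0  # tokens per second
        self.capacity = capacity
        self.tokens = capacity             # start full
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        self._refill()
        if self.tokens < 1:
            self.sleep((1 - self.tokens) / self.rate)  # wait for the deficit to refill
            self._refill()
        self.tokens -= 1


class FakeClock:
    """Deterministic clock for the example: sleeping just advances time."""
    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
bucket = TokenBucket(rate_limit_rpm=60, clock=fake.now, sleep=fake.sleep)
for _ in range(3):
    bucket.acquire()  # e.g. once before each LLM call
print(fake.t)  # 2.0: the 2nd and 3rd calls each waited 1 s at 60 rpm
```

`capacity` sets how many requests may go out back-to-back after an idle period. With `capacity=1`, calls are spaced evenly at 60 / rpm seconds.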

Response Caching

Cache LLM responses to avoid redundant API calls and costs. Re-running on the same input is near-instant because cached responses are reused instead of calling the API again.

markdownify input.pdf -o out.md --cache
markdownify input.pdf -o out.md --cache --cache-dir /path/to/cache
convert("input.pdf", "out.md", enable_cache=True, cache_dir="/path/to/cache")
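The changelog describes the cache as file-based. The sketch below shows one way such a cache could work, under the following assumptions. Each entry is a JSON file named after a SHA-256 hash of the page bytes, prompt, and model name, so any change to those inputs produces a cache miss. `ResponseCache` and its `key`/`get`/`set`/`clear` methods are hypothetical illustrations.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


class ResponseCache:
    """Hypothetical file-based cache: one JSON file per LLM request."""

    def __init__(self, cache_dir: str):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def key(page_bytes: bytes, prompt: str, model: str) -> str:
        """SHA-256 over everything that determines the response."""
        h = hashlib.sha256()
        for part in (page_bytes, prompt.encode("utf-8"), model.encode("utf-8")):
            h.update(len(part).to_bytes(8, "big"))  # length prefix keeps parts unambiguous
            h.update(part)
        return h.hexdigest()

    def _path(self, key: str) -> Path:
        return self.dir / f"{key}.json"

    def get(self, key: str) -> Optional[str]:
        path = self._path(key)
        if not path.exists():
            return None  # cache miss
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    def set(self, key: str, response: str) -> None:
        self._path(key).write_text(json.dumps({"response": response}), encoding="utf-8")

    def clear(self) -> None:
        for path in self.dir.glob("*.json"):
            path.unlink()


cache = ResponseCache(tempfile.mkdtemp())
k = ResponseCache.key(b"<page image bytes>", "Convert this page to Markdown", "some-model")
print(cache.get(k))  # None: first run, so the LLM would be called
cache.set(k, "# Page 1")
print(cache.get(k))  # # Page 1: a re-run is served from disk
```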

Verbose/Quiet Logging

Control log verbosity for debugging or cleaner output.

markdownify input.pdf -o out.md -v    # verbose (debug level)
markdownify input.pdf -o out.md -q    # quiet (warnings only)
convert("input.pdf", "out.md", log_level="verbose")  # or "quiet", "normal", "debug"
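The sketch below shows how the `log_level` values could map onto Python's standard `logging` levels. The verbose-to-DEBUG and quiet-to-WARNING mappings follow the comments above. Mapping "normal" to INFO is an assumption, as are the logger name `markdownify` and the helpers `configure_logging` and `level_from_flags`.

```python
import logging

# Assumed mapping: "verbose" -> DEBUG and "quiet" -> WARNING follow the release
# notes; "normal" -> INFO is a guess.
LOG_LEVELS = {
    "quiet": logging.WARNING,
    "normal": logging.INFO,
    "verbose": logging.DEBUG,
    "debug": logging.DEBUG,
}


def level_from_flags(verbose: bool = False, quiet: bool = False) -> str:
    """Translate -v / -q into a log_level string (-v wins if both are set; arbitrary)."""
    if verbose:
        return "verbose"
    return "quiet" if quiet else "normal"


def configure_logging(log_level: str = "normal") -> logging.Logger:
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log_level {log_level!r}; expected one of {sorted(LOG_LEVELS)}")
    logger = logging.getLogger("markdownify")  # hypothetical logger name
    logger.setLevel(LOG_LEVELS[log_level])
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = configure_logging(level_from_flags(quiet=True))
logger.info("suppressed in quiet mode")
logger.warning("still shown in quiet mode")
```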

New Dependencies

  • tenacity>=8.2.0 - Retry logic with exponential backoff

Test Coverage

Added 37 new tests covering:

  • Cache operations (set/get/clear)
  • Rate limiter behavior
  • Logging level configuration
  • Config validation
  • Prompt profile loading
  • Page image processing

Full Changelog

  • Added cache.py module for file-based response caching
  • Added retry decorator with exponential backoff to LLM calls
  • Added token bucket rate limiter for API throttling
  • Extended MarkdownifyConfig with 6 new fields
  • Extended CLI with --max-retries, --retry-delay, --rate-limit, --cache, --cache-dir, -v, -q flags
  • Extended Python API with matching parameters
  • Updated logging.py to support configurable log levels
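The six new config fields aren't listed by name. Assuming they mirror the new `convert()` keyword arguments, a hypothetical dataclass with the kind of checks the "Config validation" tests might cover could look like this. `ReliabilitySettings` and every default value here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

VALID_LOG_LEVELS = {"quiet", "normal", "verbose", "debug"}


@dataclass
class ReliabilitySettings:
    """Hypothetical stand-in for the six new config fields (names and defaults assumed)."""
    max_retries: int = 3
    retry_delay: float = 1.0
    rate_limit_rpm: Optional[float] = None  # None = no throttling
    enable_cache: bool = False
    cache_dir: Optional[str] = None         # None = use a default directory (assumed)
    log_level: str = "normal"

    def __post_init__(self) -> None:
        # Reject values that would break retries, throttling, or logging later on.
        if self.max_retries < 0:
            raise ValueError("max_retries must be >= 0")
        if self.retry_delay < 0:
            raise ValueError("retry_delay must be >= 0")
        if self.rate_limit_rpm is not None and self.rate_limit_rpm <= 0:
            raise ValueError("rate_limit_rpm must be positive")
        if self.log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"unknown log_level {self.log_level!r}")


settings = ReliabilitySettings(max_retries=5, retry_delay=2.0, rate_limit_rpm=60)
try:
    ReliabilitySettings(rate_limit_rpm=0)
except ValueError as err:
    print(err)  # rate_limit_rpm must be positive
```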

Full Changelog: v0.3.0...v0.4.0

Speed/Latency improvements

13 Aug 22:07


Major Performance Optimisations

  1. 🚀 8x Faster Page Checks: Adjacent page checks now run in parallel, making them up to 8x faster. The concurrency of the grouping phase can now be controlled with the new --grouping_concurrency option, separately from the regular generation concurrency, giving finer control for avoiding rate limits.
  2. 🚀 Up to 8x smaller image footprint: Pages are now downscaled for continuation checks. These checks are visual integrity checks rather than text reading, so they tolerate some downscaling. Pages are ~3-8x smaller than before, and checks are up to ~70% faster in both per-call latency and the grouping phase.
  3. 🔍 Improved multi-page table detection: Table consistency across pages was the most requested feature, and it is now supported in the base profile. This should result in better table merges across pages.
  4. 🔍 Cache and stability improvements
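Adjacent-page checks are independent of one another, which is what makes them parallelisable. The sketch below uses a thread pool and assumes a hypothetical `is_continuation(page_a, page_b)` callback that wraps one LLM call. The `grouping_concurrency` argument plays the role of the `--grouping_concurrency` option by capping how many checks are in flight at once. This illustrates the technique and is not Markdownify's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def check_adjacent_pages(pages: Sequence[P],
                         is_continuation: Callable[[P, P], bool],
                         grouping_concurrency: int = 4) -> List[bool]:
    """Run every (page i, page i+1) check concurrently, at most
    grouping_concurrency at a time. result[i] is True if page i+1 continues page i."""
    pages = list(pages)
    pairs = list(zip(pages, pages[1:]))
    with ThreadPoolExecutor(max_workers=grouping_concurrency) as pool:
        # map() returns results in input order even if calls finish out of order
        return list(pool.map(lambda pair: is_continuation(*pair), pairs))


def group_pages(pages: Sequence[P], continues: List[bool]) -> List[List[P]]:
    """Merge consecutive pages into groups using the per-pair results."""
    if not pages:
        return []
    groups = [[pages[0]]]
    for page, cont in zip(pages[1:], continues):
        if cont:
            groups[-1].append(page)
        else:
            groups.append([page])
    return groups


def fake_check(a: str, b: str) -> bool:
    time.sleep(0.05)  # stand-in for one vision-LLM call
    return a.endswith("|") and b.startswith("|")  # toy rule: a table spans the break


pages = ["intro", "| a | b |", "| c | d |", "outro"]
continues = check_adjacent_pages(pages, fake_check, grouping_concurrency=3)
print(group_pages(pages, continues))
# [['intro'], ['| a | b |', '| c | d |'], ['outro']]
```

Lowering `grouping_concurrency` trades speed for fewer simultaneous requests, which is the rate-limit trade-off the option exposes.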

Full Changelog: v0.2.1...v0.3.0

Improvements and Bug fixes

09 Aug 14:57


v0.2.0: Supports Images

09 Aug 14:42
c3a41bb


  • Now supports images as input (PNG, JPEG, JPG).
  • Added a Gallery showcasing many different kinds of OCR inference tasks.
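How inputs are routed isn't described. The following sketch assumes detection by file extension only, though the real tool may inspect file contents instead. Under that assumption, a hypothetical `input_kind` helper could separate PDFs from the newly supported image formats:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpeg", ".jpg"}


def input_kind(path: str) -> str:
    """Hypothetical helper: classify an input file as 'pdf' or 'image' by extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive: scan.JPG counts as an image
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"unsupported input file: {path!r}")


for name in ["report.pdf", "scan.JPG", "photo.png"]:
    print(name, "->", input_kind(name))
```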

v0.1.1: Bug fixes

09 Aug 10:41


First release

09 Aug 10:38
e3d0253
