Releases: sethupavan12/Markdownify
Built-in retries, rate-limit support, and input caching!
What's Changed
Release Notes - v0.4.0
New Features
Retry Logic with Exponential Backoff
LLM calls now automatically retry on failure with configurable attempts and delays.
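The release's retry logic is built on tenacity, but the idea can be sketched in plain Python. Everything below (`call_with_retries`, the `backoff` factor, the jitter) is an illustrative stand-in, not the library's actual internals:

```python
import random
import time

def call_with_retries(fn, max_retries=5, retry_delay=2.0, backoff=2.0):
    """Call fn(), retrying on exception with exponential backoff.

    Waits retry_delay, then retry_delay * backoff, and so on, adding a
    little jitter, and gives up after max_retries attempts.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = retry_delay * (backoff ** attempt)
            time.sleep(delay + random.uniform(0, 0.1 * delay))

# Example: a flaky call that fails twice, then succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = call_with_retries(flaky, max_retries=5, retry_delay=0.01)
```

Exponential backoff spaces retries further and further apart, so a briefly overloaded provider is not hammered with immediate re-requests.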
markdownify input.pdf -o out.md --max-retries 5 --retry-delay 2.0
convert("input.pdf", "out.md", max_retries=5, retry_delay=2.0)
Rate Limiting
Prevent hitting provider rate limits with configurable requests-per-minute throttling.
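Requests-per-minute throttling can be sketched with a rolling-window limiter; this `RateLimiter` class and its method names are hypothetical, shown only to illustrate the behavior the `--rate-limit` option describes:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most rate_limit_rpm calls per rolling 60-second window."""

    def __init__(self, rate_limit_rpm):
        self.rpm = rate_limit_rpm
        self.calls = deque()  # timestamps of recent calls

    def acquire(self):
        """Block until a call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have left the 60 s window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest recorded call ages out of the window.
            time.sleep(60.0 - (now - self.calls[0]))
        self.calls.append(time.monotonic())

limiter = RateLimiter(rate_limit_rpm=60)
limiter.acquire()  # only blocks once 60 calls have landed in the last minute
```

Calling `acquire()` before each LLM request keeps the request rate under the configured ceiling instead of relying on the provider's 429 responses.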
markdownify input.pdf -o out.md --rate-limit 60
convert("input.pdf", "out.md", rate_limit_rpm=60)
Response Caching
Cache LLM responses to avoid redundant API calls and costs. Re-running on the same input is instant.
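A file-based response cache keyed by a hash of the request is one common way to get this behavior. The `ResponseCache` class below is a minimal sketch under that assumption; the class, its payload fields, and the on-disk layout are hypothetical, not the library's `cache.py`:

```python
import hashlib
import json
import tempfile
from pathlib import Path

class ResponseCache:
    """File-based cache: one JSON file per unique request payload."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, payload):
        # Stable key: hash the canonical JSON form of the request.
        blob = json.dumps(payload, sort_keys=True).encode()
        return self.dir / (hashlib.sha256(blob).hexdigest() + ".json")

    def get(self, payload):
        p = self._path(payload)
        return json.loads(p.read_text()) if p.exists() else None

    def set(self, payload, response):
        self._path(payload).write_text(json.dumps(response))

cache = ResponseCache(tempfile.mkdtemp())
cache.set({"page": 1, "prompt": "ocr"}, {"markdown": "# Title"})
hit = cache.get({"page": 1, "prompt": "ocr"})   # same payload: cache hit
miss = cache.get({"page": 2, "prompt": "ocr"})  # unseen payload: None
```

Because identical inputs hash to the same key, re-running a conversion skips the API call entirely and reads the stored response instead.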
markdownify input.pdf -o out.md --cache
markdownify input.pdf -o out.md --cache --cache-dir /path/to/cache
convert("input.pdf", "out.md", enable_cache=True, cache_dir="/path/to/cache")
Verbose/Quiet Logging
Control log verbosity for debugging or cleaner output.
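Mapping the named levels onto the standard library's logging levels might look like this; `configure_logging` and the `LEVELS` table are hypothetical illustrations, and the release's actual `logging.py` may differ:

```python
import logging

# Named verbosity levels from the release mapped to stdlib levels:
# -q is warnings only, -v is debug level.
LEVELS = {
    "quiet": logging.WARNING,
    "normal": logging.INFO,
    "verbose": logging.DEBUG,
    "debug": logging.DEBUG,
}

def configure_logging(log_level="normal"):
    """Set the package logger's threshold from a named level."""
    logger = logging.getLogger("markdownify")
    logger.setLevel(LEVELS[log_level])
    return logger

logger = configure_logging("verbose")
```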
markdownify input.pdf -o out.md -v # verbose (debug level)
markdownify input.pdf -o out.md -q # quiet (warnings only)
convert("input.pdf", "out.md", log_level="verbose") # or "quiet", "normal", "debug"
New Dependencies
tenacity>=8.2.0 - Retry logic with exponential backoff
Test Coverage
Added 37 new tests covering:
- Cache operations (set/get/clear)
- Rate limiter behavior
- Logging level configuration
- Config validation
- Prompt profile loading
- Page image processing
Full Changelog
- Added cache.py module for file-based response caching
- Added retry decorator with exponential backoff to LLM calls
- Added token bucket rate limiter for API throttling
- Extended MarkdownifyConfig with 6 new fields
- Extended CLI with --max-retries, --retry-delay, --rate-limit, --cache, --cache-dir, -v, -q flags
- Extended Python API with matching parameters
- Updated logging.py to support configurable log levels
Full Changelog: v0.3.0...v0.4.0
Speed/Latency improvements
Major Performance Optimisations
- 🚀 8x Faster Page Checks: Adjacent page checks now run in parallel, making them up to 8x faster. Grouping concurrency can now be controlled with a new --grouping_concurrency option, alongside the existing concurrency option for generation, giving much finer control for avoiding rate limits
- 🚀 8x Smaller Image Footprint: Pages are now downscaled for continuation checks. Since these are visual-integrity checks rather than text reads, they tolerate some downscaling. Pages are ~3-8x smaller than before and up to ~70% faster, both in per-call latency and in the grouping phase
- 🔍 Improved Multi-page Table Detection: Table consistency across pages was the most requested feature, and the base profile now supports it. This should result in improved table merges across pages.
- 🔍 Cache and stability improvements
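The parallel adjacent-page checks described above can be sketched with a thread pool. `check_adjacent_pages`, the `check_pair` callback, and the parameter name are hypothetical stand-ins for the library's internals; only the idea of bounded-concurrency pairwise checks comes from the release notes:

```python
from concurrent.futures import ThreadPoolExecutor

def check_adjacent_pages(pages, check_pair, grouping_concurrency=8):
    """Run continuation checks on each adjacent page pair in parallel.

    Returns a list of booleans: does page i continue onto page i+1?
    grouping_concurrency bounds how many checks run at once, so the
    check rate stays within provider limits.
    """
    pairs = list(zip(pages, pages[1:]))
    with ThreadPoolExecutor(max_workers=grouping_concurrency) as ex:
        return list(ex.map(lambda pair: check_pair(*pair), pairs))

# Example with a stand-in check: pages "continue" when numbered consecutively.
pages = [1, 2, 3, 7, 8]
links = check_adjacent_pages(pages, lambda a, b: b == a + 1)
# links → [True, True, False, True]
```

Since each pairwise check is an independent I/O-bound LLM call, running up to `grouping_concurrency` of them at once is what yields the roughly 8x speedup in the grouping phase.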
Full Changelog: v0.2.1...v0.3.0
Improvements and Bug fixes
- Fixed a PyPI push bug
v0.2.0: Supports Images
- Now supports images as input (PNG, JPEG, JPG)
- Added a gallery showing many different kinds of OCR inference tasks
v0.1.1: Bug fixes
Full Changelog: v0.1.0...v0.1.1