habr2md is a simple CLI tool to download and convert articles from Habr — a popular Russian tech blog and knowledge-sharing platform — into clean Markdown.
It:
- extracts only article content
- removes images, galleries and author blocks
- ignores comments
- saves the result as a `.md` file
- generates the file name from the article title
Requirements:
- Python 3.10+
- pip
Create a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Install dependencies:
pip install requests beautifulsoup4 markdownify
Run:
python habr2md.py
Paste the article URL when prompted, for example:
https://habr.com/en/companies/postgrespro/articles/988066/
The result is saved to:
results/<article-title>.md
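How the file name is derived from the title is not spelled out here, so the following is only a minimal sketch of one way it could work. The names `title_to_filename` and `save_markdown`, the set of replaced characters, and the 100-character limit are all assumptions for illustration, not the actual code in `habr2md.py`.

```python
import re
from pathlib import Path


def title_to_filename(title: str, max_length: int = 100) -> str:
    """Turn an article title into a filesystem-safe .md file name (hypothetical helper)."""
    # Replace characters that are invalid on common filesystems with spaces.
    cleaned = re.sub(r'[\\/:*?"<>|]', " ", title)
    # Collapse each run of whitespace into a single hyphen.
    cleaned = re.sub(r"\s+", "-", cleaned.strip())
    # Cap the length and fall back to a default name if nothing is left.
    cleaned = cleaned[:max_length].strip("-.") or "article"
    return f"{cleaned}.md"


def save_markdown(title: str, markdown: str, out_dir: str = "results") -> Path:
    """Write the Markdown text to <out_dir>/<title>.md and return the path (hypothetical helper)."""
    path = Path(out_dir) / title_to_filename(title)
    # Create the output directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

Under these assumptions, `save_markdown("My article", "# My article\n")` would write `results/My-article.md`.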
Project structure:
habr2md/
├── habr2md.py
├── results/
├── README.md
└── .gitignore
Notes:
- Parser is adapted for the current Habr layout (`article-formatted-body`)
- Images and galleries are removed
- Output format is Markdown
License: MIT