telegramify-markdown

**Effortlessly convert raw Markdown to Telegram plain text + MessageEntity pairs.**

Say goodbye to MarkdownV2 escaping headaches! This library parses Markdown (including LLM output, GitHub READMEs, etc.) and produces (text, entities) tuples that can be sent directly via the Telegram Bot API — no parse_mode needed.

Input of any length is supported: telegramify() splits long text into multiple messages for you.
Entity offsets are measured in UTF-16 code units, exactly as Telegram requires.
We also support LaTeX-to-Unicode conversion, expandable block quotes, and Mermaid diagram rendering.
Built on pyromark (Rust pulldown-cmark bindings) for speed and correctness.

Note

v1.0.0 is a breaking change from 0.x. The output is now (str, list[MessageEntity]) instead of a MarkdownV2 string. The old markdownify() and standardize() functions have been removed.

Currently in release candidate. Install with pip install telegramify-markdown --pre to try it. The default pip install telegramify-markdown (without --pre) still installs the stable 0.5.x version.

👀 Use case

[Screenshots: convert() output, convert() output, telegramify() output]

🪄 Quick Start

Install

Requires Python 3.10+. Currently in release candidate — use the pre-release flag for your package manager.

# uv (recommended)
uv add telegramify-markdown --prerelease=allow
uv add "telegramify-markdown[mermaid]" --prerelease=allow

# pip
pip install telegramify-markdown --pre
pip install "telegramify-markdown[mermaid]" --pre

# PDM
pdm add telegramify-markdown --prerelease
pdm add "telegramify-markdown[mermaid]" --prerelease

# Poetry
poetry add telegramify-markdown --allow-prereleases
poetry add "telegramify-markdown[mermaid]" --allow-prereleases

🤔 What do you want to do?

If you just want to send static text and don't want to worry about formatting → use convert()
If you are developing an LLM application or need to send potentially super-long text → use telegramify()
If you need to split convert() output manually → use split_entities()

`convert()` — single message

from telebot import TeleBot
from telegramify_markdown import convert

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the ID of the target chat

md = "**Bold**, _italic_, and `code`."
text, entities = convert(md)

bot.send_message(
    chat_id,
    text,
    entities=[e.to_dict() for e in entities],
)

No parse_mode parameter — Telegram reads the entities directly.
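
To make the (text, entities) shape concrete, here is a minimal sketch that builds the same kind of payload by hand, without the library. The helpers `utf16_units` and `entity_for` are hypothetical names used only for illustration; the dictionary keys (`type`, `offset`, `length`) are the Telegram Bot API MessageEntity fields.

```python
def utf16_units(s: str) -> int:
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(s.encode("utf-16-le")) // 2


def entity_for(text: str, span: str, entity_type: str) -> dict:
    """Hypothetical helper: describe the first occurrence of `span` in `text`."""
    start = text.index(span)  # Python index, counted in code points
    return {
        "type": entity_type,
        "offset": utf16_units(text[:start]),  # prefix length in UTF-16 units
        "length": utf16_units(span),
    }


# The emoji occupies one Python character but two UTF-16 code units,
# so every offset after it shifts by one.
text = "\U0001F600 Bold, italic"
entities = [
    entity_for(text, "Bold", "bold"),      # offset 3, length 4
    entity_for(text, "italic", "italic"),  # offset 9, length 6
]
print(entities)
```

This is why hand-computed offsets based on `len()` or `str.index()` drift as soon as the text contains emoji; convert() is described as doing this bookkeeping for you.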

`telegramify()` — long messages, code files, diagrams

For LLM output or long documents, telegramify() splits text, extracts code blocks as files, and renders Mermaid diagrams as images:

import asyncio
from telebot import TeleBot
from telegramify_markdown import telegramify
from telegramify_markdown.content import ContentType

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the ID of the target chat

md = """
# Report

Here is some analysis with **bold** and _italic_ text.

```python
print("hello world")
```

And a diagram:

```mermaid
graph TD
    A-->B
```
"""

async def send():
    results = await telegramify(md, max_message_length=4090)
    for item in results:
        if item.content_type == ContentType.TEXT:
            bot.send_message(
                chat_id,
                item.text,
                entities=[e.to_dict() for e in item.entities],
            )
        elif item.content_type == ContentType.PHOTO:
            bot.send_photo(
                chat_id,
                (item.file_name, item.file_data),
                caption=item.caption_text or None,
                caption_entities=[e.to_dict() for e in item.caption_entities] or None,
            )
        elif item.content_type == ContentType.FILE:
            bot.send_document(
                chat_id,
                (item.file_name, item.file_data),
                caption=item.caption_text or None,
                caption_entities=[e.to_dict() for e in item.caption_entities] or None,
            )

asyncio.run(send())

`split_entities()` — manual splitting

If you use convert() but need to split long output yourself:

from telegramify_markdown import convert, split_entities

text, entities = convert(long_markdown)

for chunk_text, chunk_entities in split_entities(text, entities, max_utf16_len=4096):
    bot.send_message(
        chat_id,
        chunk_text,
        entities=[e.to_dict() for e in chunk_entities],
    )

⚙️ Configuration

Customize heading symbols, link symbols, and expandable citation behavior:

from telegramify_markdown.config import get_runtime_config

cfg = get_runtime_config()
cfg.markdown_symbol.heading_level_1 = "📌"
cfg.markdown_symbol.link = "🔗"
cfg.cite_expandable = True  # Long quotes become expandable_blockquote

# For clean output without emoji heading prefixes:
# cfg.markdown_symbol.heading_level_1 = ""
# cfg.markdown_symbol.heading_level_2 = ""
# cfg.markdown_symbol.heading_level_3 = ""
# cfg.markdown_symbol.heading_level_4 = ""

📖 API Reference

`convert(markdown, *, latex_escape=True) -> tuple[str, list[MessageEntity]]`

Synchronous. Converts a Markdown string to plain text and a list of MessageEntity objects.

Parameter	Type	Default	Description
`markdown`	`str`	required	Raw Markdown text
`latex_escape`	`bool`	`True`	Convert LaTeX `\(...\)` and `\[...\]` to Unicode symbols

Returns (text, entities) where text is plain text and entities is a list of MessageEntity.

`telegramify(content, *, max_message_length=4096, latex_escape=True) -> list[Text | File | Photo]`

Async. Full pipeline: converts Markdown, splits long messages, extracts code blocks as files, renders Mermaid diagrams as images.

Parameter	Type	Default	Description
`content`	`str`	required	Raw Markdown text
`max_message_length`	`int`	`4096`	Max UTF-16 code units per text message
`latex_escape`	`bool`	`True`	Convert LaTeX to Unicode

Returns an ordered list of Text, File, or Photo objects.

`split_entities(text, entities, max_utf16_len) -> list[tuple[str, list[MessageEntity]]]`

Split text + entities into chunks within a UTF-16 length limit. Splits at newline boundaries; entities spanning a split point are clipped into both chunks.
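
The behavior described above (split at newlines, clip entities that cross a split point) can be sketched in plain Python. This is an illustration of the idea, not the library's implementation: `split_sketch` and `u16` are hypothetical names, entities are simplified to `(type, offset, length)` tuples, and a single line longer than the limit is assumed not to occur (it would be emitted as one oversized chunk).

```python
def u16(s: str) -> int:
    # Length in UTF-16 code units (2 bytes each, no byte-order mark).
    return len(s.encode("utf-16-le")) // 2


def split_sketch(text, entities, max_units):
    """Split `text` at line boundaries; `entities` are (type, offset, length)."""
    # 1. Group whole lines into chunks of at most max_units code units.
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = u16(line)
        if current and size + n > max_units:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("".join(current))

    # 2. Clip every entity to each chunk it overlaps and re-base its offset.
    result, start = [], 0
    for chunk in chunks:
        end = start + u16(chunk)
        clipped = []
        for etype, off, length in entities:
            lo, hi = max(off, start), min(off + length, end)
            if lo < hi:  # the entity overlaps this chunk
                clipped.append((etype, lo - start, hi - lo))
        result.append((chunk, clipped))
        start = end
    return result


parts = split_sketch("aaaa\nbbbb\ncccc", [("bold", 7, 5)], max_units=10)
for chunk, ents in parts:
    print(repr(chunk), ents)
```

In this run the bold entity covers the end of the second line and the start of the third, so it appears in both chunks with offsets relative to each chunk. The sketch keeps the trailing newline on each chunk; the real function may trim it.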

`MessageEntity`

@dataclasses.dataclass(slots=True)
class MessageEntity:
    type: str           # "bold", "italic", "code", "pre", "text_link", etc.
    offset: int         # Start position in UTF-16 code units
    length: int         # Length in UTF-16 code units
    url: str | None     # For "text_link" entities
    language: str | None       # For "pre" entities (code block language)
    custom_emoji_id: str | None  # For "custom_emoji" entities

    def to_dict(self) -> dict: ...

Content Types

Class	Fields	Description
`Text`	`text`, `entities`, `content_trace`	A text message segment
`File`	`file_name`, `file_data`, `caption_text`, `caption_entities`, `content_trace`	An extracted code block
`Photo`	`file_name`, `file_data`, `caption_text`, `caption_entities`, `content_trace`	A rendered Mermaid diagram

`utf16_len(text) -> int`

Returns the length of a string in UTF-16 code units (what Telegram uses for offsets).
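
For intuition, the same quantity can be computed with the standard library alone. The function below is a sketch named `utf16_len_sketch` to keep it distinct from the library's `utf16_len`; it is not claimed to be how the library computes it.

```python
def utf16_len_sketch(text: str) -> int:
    # Encode without a byte-order mark; every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


for sample in ["abc", "\U0001F600", "a\U0001F600b"]:
    # len() counts code points, which undercounts characters outside the BMP.
    print(repr(sample), len(sample), utf16_len_sketch(sample))
```

Characters outside the Basic Multilingual Plane, such as most emoji, take two code units each, which is why `len()` and the UTF-16 length diverge for the last two samples.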

🔨 Supported Markdown Features

🤖 For AI Coding Assistants

Copy this block into your AI assistant's context (e.g. CLAUDE.md, Cursor Rules, etc.) to get accurate code generation for telegramify-markdown:


# telegramify-markdown integration guide

## Install
uv add telegramify-markdown --prerelease=allow  # or: pip install telegramify-markdown --pre

## API (v1.0.0+) — outputs plain text + MessageEntity, NOT MarkdownV2 strings

### convert() — sync, single message
from telegramify_markdown import convert
text, entities = convert("**bold** and _italic_")
bot.send_message(chat_id, text, entities=[e.to_dict() for e in entities])
# Do NOT set parse_mode — entities replace it entirely.

### telegramify() — async, auto-splits long text, extracts code blocks as files
from telegramify_markdown import telegramify
from telegramify_markdown.content import ContentType
results = await telegramify(md, max_message_length=4090)
for item in results:
    if item.content_type == ContentType.TEXT:
        bot.send_message(chat_id, item.text, entities=[e.to_dict() for e in item.entities])
    elif item.content_type == ContentType.FILE:
        bot.send_document(chat_id, (item.file_name, item.file_data))
    elif item.content_type == ContentType.PHOTO:
        bot.send_photo(chat_id, (item.file_name, item.file_data))

### split_entities() — manual splitting for convert() output
from telegramify_markdown import convert, split_entities
text, entities = convert(long_md)
for chunk_text, chunk_entities in split_entities(text, entities, max_utf16_len=4096):
    bot.send_message(chat_id, chunk_text, entities=[e.to_dict() for e in chunk_entities])

### Configuration
from telegramify_markdown.config import get_runtime_config
cfg = get_runtime_config()
cfg.markdown_symbol.heading_level_1 = "📌"
cfg.cite_expandable = True

## Critical rules
- entities must be passed as list[dict] via [e.to_dict() for e in entities], NEVER as JSON string
- NEVER set parse_mode when sending with entities — they are mutually exclusive
- All entity offsets are UTF-16 code units. Use utf16_len() to measure text length.
- Requires Python 3.10+

🧸 Acknowledgement

This library is inspired by npm:telegramify-markdown.

LaTeX escape is inspired by latex2unicode and @yym68686.

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.
