Commit 5ec6c5f

feat: HTML/Markdown parsers
1 parent 3266b7b commit 5ec6c5f

5 files changed: +835 -1 lines changed

README.md

Lines changed: 23 additions & 1 deletion
@@ -1,6 +1,6 @@
 # `@telegraf/entity` [![Deno shield](https://img.shields.io/static/v1?label=Built%20for&message=Deno&style=flat-square&logo=deno&labelColor=000&color=fff)](https://deno.land/x/telegraf_entity) [![Bun shield](https://img.shields.io/static/v1?label=Ready%20for&message=Bun&style=flat-square&logo=bun&labelColor=101115&color=fff)](https://npmjs.com/package/@telegraf/entity) [![NPM version](https://img.shields.io/npm/v/@telegraf/entity?color=e74625&style=flat-square)](https://npmjs.com/package/@telegraf/entity)
 
-Convert Telegram entities to HTML or Markdown.
+Convert Telegram entities to HTML or Markdown (and back).
 
 > ⚠️ Before you start using this module, consider using [`copyMessage`](https://core.telegram.org/bots/api#copymessage) instead.
 >
@@ -82,3 +82,25 @@ import { serialiseWith, escapers } from "@telegraf/entity";
 const serialise = serialiseWith(myHTMLSerialiser, escapers.HTML);
 serialise(ctx.message);
 ```
+
+## Parsing HTML and Markdown into entities
+
+We now have a fully Bot API-compliant parser for HTML and Markdown (MarkdownV2 not supported yet). This was ported over from tdlib, so it should parse exactly like Bot API does, and throw the same errors, but natively in JavaScript.
+
+This is quite an advanced usecase, and because it's just as strict as the official API, you will not be able to use this to leniently parse HTML or Markdown (for instance, to massage LLM output into entities for Telegram).
+
+Some usecases enabled by this:
+
+- For Telegram client-like applications, which want to parse exactly like Telegram. It's quite trivial to translate the resulting entities into TL types for use with MTProto, for instance.
+- The ability to statically check any piece of HTML or Markdown to ensure it's valid.
+- Fold existing HTML/Markdown into entities, to be used with Telegraf's [fmt helpers](https://github.com/feathers-studio/telegraf-docs/blob/master/guide/formatting.md#fmt) which construct entities directly.
+
+```TS
+import { parse_html, parse_markdown } from "@telegraf/entity";
+
+const parsed = parse_html(html); // { text: string, entities: MessageEntity[] }
+// or
+const parsed = parse_markdown(markdown); // { text: string, entities: MessageEntity[] }
+```
+
+Thanks to [codinary.org](https://codinary.org) for commissioning this feature.
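
Reviewer note: the "statically check" use case in the README addition can be sketched as a small wrapper that turns a thrown parse error into a value. This is only an illustration under stated assumptions: `check`, `CheckResult`, `ParseFn` and `stubParse` are hypothetical names that are not part of `@telegraf/entity`, the `MessageEntity` shape is reduced to three fields, and the stub stands in for `parse_html` so the snippet runs on its own.

```typescript
// Reduced shapes, assumed for illustration; the real types come from the library.
interface MessageEntity {
  type: string;
  offset: number;
  length: number;
}
interface Parsed {
  text: string;
  entities: MessageEntity[];
}
type ParseFn = (input: string) => Parsed;

// Hypothetical result type: exactly one of `parsed` or `error` is set.
interface CheckResult {
  ok: boolean;
  parsed?: Parsed;
  error?: string;
}

// Runs a strict parser and reports a thrown error as data instead of rethrowing.
function check(parse: ParseFn, input: string): CheckResult {
  try {
    return { ok: true, parsed: parse(input) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}

// Hypothetical stand-in for parse_html: it only understands a single
// <b>...</b> pair and throws on anything else that looks like a tag.
const stubParse: ParseFn = (input) => {
  const open = input.indexOf("<b>");
  if (open === -1) {
    if (input.indexOf("<") !== -1) throw new Error("unsupported tag");
    return { text: input, entities: [] };
  }
  const close = input.indexOf("</b>", open);
  if (close === -1) throw new Error("unclosed <b> tag");
  const bold = input.slice(open + 3, close);
  return {
    text: input.slice(0, open) + bold + input.slice(close + 4),
    entities: [{ type: "bold", offset: open, length: bold.length }],
  };
};

const good = check(stubParse, "hi <b>there</b>");
const bad = check(stubParse, "hi <b>there");
console.log(good.ok, bad.ok, bad.error);
```

With the real package, the call would presumably be `check(parse_html, input)`, relying on the README's statement that the parser throws the same errors as Bot API; the exact error messages are not shown in this commit.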

build_npm.ts

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,9 @@ await build({
   outDir: "./.npm",
   shims: {},
   test: false,
+  compilerOptions: {
+    lib: ["es2022"],
+  },
   mappings: {
     "https://deno.land/x/telegraf_types@v9.0.0/message.ts": {
       name: "@telegraf/types",

mod.ts

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 import * as serialisers from "./serialisers.ts";
 import * as escapers from "./escapers.ts";
 import type { Message, TextMessage, Tree, MessageEntity } from "./types.ts";
+import { Parser } from "./parser/index.ts";
+
+export const parse_html = Parser.parse_html;
+export const parse_markdown = Parser.parse_markdown;
 
 // https://github.com/tdlib/td/blob/721300bcb4d0f2114505712f4dc6350af1ce1a09/td/telegram/MessageEntity.cpp#L39
 const TYPE_PRIORITY: Record<MessageEntity["type"], number> = {
