A tiny, deterministic entity extractor for TypeScript.
Extract structured data, render rich highlights, and redact PII from free-form text — no ML, no network calls.
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Extract structured entities from a chat message
const entities = Duckling().extract(
"Hey! Meet me at Times Square tomorrow at 3pm. My email is alex@company.io",
);
// → [{ kind: "location", text: "Times Square", ... },
// { kind: "time", text: "tomorrow at 3pm", ... },
// { kind: "email", text: "alex@company.io", ... }]
// Redact PII in one line
Duckling(PIIParsers).redact(
"Contact alex@company.io, SSN 078-05-1120, or call +14155552671",
);
// → "Contact ██████████████████, SSN ███████████, or call ████████████"

ts-duckling uses parser combinator grammars (no ML models, no HTTP) to extract structured entities from text. It is inspired by Facebook's duckling, but built in pure TypeScript and runs anywhere: Deno, Node, or the browser.
- Deterministic — same input always produces the same output
- Typed — parser selection narrows the return type automatically
- Composable — bring your own parsers alongside the built-in ones
- Self-contained — no native dependencies, no runtime downloads, no network calls
- Runs everywhere — Deno, Node.js, and browsers (see the live playground)
import { Duckling } from "jsr:@claudiu-ceia/ts-duckling";

Or add to your import map:
deno add jsr:@claudiu-ceia/ts-duckling

npx jsr add @claudiu-ceia/ts-duckling

Call Duckling() with no arguments to use all 20 built-in parsers:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg =
"Hey! I'll be in Germany next Friday at 5pm. Shoot me a message at alex@company.io or visit https://example.com/invite";
for (const e of Duckling().extract(msg)) {
console.log(e.kind, e.text);
}
// location Germany
// time next Friday at 5pm
// email alex@company.io
// url https://example.com/invite

Each entity carries structured data:
// entities[0]
{
kind: "location",
value: { location: "Germany" },
start: 16,
end: 23,
text: "Germany"
}

Pass an array of parsers to narrow both what gets extracted and the return type:
import { Duckling, Email, Time, URL } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, URL.parser, Time.parser]).extract(
"Ping me at alex@company.io or https://meet.com — available tomorrow at 2pm",
);
// entities: (EmailEntity | URLEntity | TimeEntity)[]

Use .redact() to replace matched entity spans with a mask character:
import { Duckling, PIIParsers } from "@claudiu-ceia/ts-duckling";
// Redact all PII (email, phone, IP, SSN, credit card, UUID, API key, IBAN, MAC, JWT, crypto address, BIC)
Duckling(PIIParsers).redact(
"Patient email: john.doe@clinic.org, SSN 078-05-1120, phone +14155552671",
);
// → "Patient email: ██████████████████████, SSN ███████████, phone ████████████"
// Custom mask character
Duckling(PIIParsers).redact("Call +14155552671", { mask: "X" });
// → "Call XXXXXXXXXXXX"
// Redact only specific kinds
Duckling(PIIParsers).redact(
"Contact john.doe@clinic.org, SSN 078-05-1120",
{ kinds: ["ssn"] },
);
// → "Contact john.doe@clinic.org, SSN ███████████"

Use .render() to replace entity spans via a callback. This is handy for turning
plain-text messages into HTML with highlighted or linked entities:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg =
"Hey! Meet at Times Square tomorrow at 3pm, email me at alex@company.io or check https://example.com/rsvp";
const html = Duckling().render(msg, ({ entity, children }) => {
switch (entity.kind) {
case "url":
return `<a href="${children}">${children}</a>`;
case "email":
return `<a href="mailto:${children}">${children}</a>`;
default:
return `<mark data-kind="${entity.kind}">${children}</mark>`;
}
});
// → 'Hey! Meet at <mark data-kind="location">Times Square</mark>
// <mark data-kind="time">tomorrow at 3pm</mark>, email me at
// <a href="mailto:alex@company.io">alex@company.io</a> or check
// <a href="https://example.com/rsvp">https://example.com/rsvp</a>'

Nested entities (e.g. an SSN containing quantity sub-parts) are rendered inside-out: inner entities are transformed first, and the parent receives the result:
import { Duckling, Quantity, SSN } from "@claudiu-ceia/ts-duckling";
Duckling([Quantity.parser, SSN.parser]).render(
"SSN 123-45-6789",
({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`,
);
// → "SSN <ssn><quantity>123</quantity>-<quantity>45</quantity>-<quantity>6789</quantity></ssn>"

Return undefined to leave a span unchanged, which is useful for selective rendering:
import { Duckling } from "@claudiu-ceia/ts-duckling";
// Only make URLs clickable, leave everything else as plain text
Duckling().render(
"Visit https://example.com — event on next Friday at 5pm in Germany",
({ entity, children }) => {
if (entity.kind === "url") return `<a href="${children}">${children}</a>`;
return undefined;
},
);
// → 'Visit <a href="https://example.com">https://example.com</a> — event on next Friday at 5pm in Germany'

Use .renderMap() when you need an array of segments instead of a single
string. This suits React, Preact, Solid, or any framework that renders element
trees:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const msg = "Hey! I'm at Times Square, email me at alex@company.io";
const segments = Duckling().renderMap<JSX.Element>(
msg,
({ entity, children }) => (
<mark key={entity.start} data-kind={entity.kind}>
{children}
</mark>
),
);
// → ["Hey! I'm at ", <mark data-kind="location">Times Square</mark>,
// ", email me at ", <mark data-kind="email">alex@company.io</mark>]
// Drop it straight into a component
function HighlightedMessage({ text }: { text: string }) {
const segments = Duckling().renderMap<JSX.Element>(
text,
({ entity, children }) => {
switch (entity.kind) {
case "url":
return <a href={entity.text}>{children}</a>;
case "email":
return <a href={`mailto:${entity.text}`}>{children}</a>;
case "time":
return <time>{children}</time>;
default:
return <mark data-kind={entity.kind}>{children}</mark>;
}
},
);
return <p>{segments}</p>;
}

Like .render(), nested entities are handled automatically: child spans are
mapped first, and the parent callback receives the already-mapped children as
(string | R)[].
Experimental — these methods are stable enough to use but the API may change in a future minor release.
Every method has an async counterpart (extractAsync, renderAsync,
renderMapAsync) that periodically yields to the browser event loop during
scanning. This prevents long inputs from blocking the main thread — no Web
Worker needed:
import { Duckling } from "@claudiu-ceia/ts-duckling";
const controller = new AbortController();
const entities = await Duckling().extractAsync(longText, {
signal: controller.signal, // cancel early if needed
yieldEvery: 256, // yield every N scan positions (default 512)
});

Uses scheduler.yield() when available (Chrome 129+, Edge 129+, Firefox 142+), falling back to
setTimeout(0).
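
The yielding strategy described above can be sketched independently of the library. This is a minimal illustration, not ts-duckling's implementation: `yieldToEventLoop` and `countAtSigns` are hypothetical names, and the scan loop is a stand-in for a real scanner. It assumes the abort check happens right after each yield.

```typescript
// Yield control back to the event loop. Uses scheduler.yield() where the
// runtime provides it, otherwise falls back to a zero-delay timer.
function yieldToEventLoop(): Promise<void> {
  const sched = (globalThis as unknown as {
    scheduler?: { yield?: () => Promise<void> };
  }).scheduler;
  if (sched && typeof sched.yield === "function") return sched.yield();
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Hypothetical stand-in for a scanner: counts "@" characters and pauses
// every `yieldEvery` positions so other tasks can run in between.
async function countAtSigns(
  input: string,
  opts: { signal?: AbortSignal; yieldEvery?: number } = {},
): Promise<number> {
  const yieldEvery = opts.yieldEvery ?? 512;
  let count = 0;
  for (let i = 0; i < input.length; i++) {
    if (i > 0 && i % yieldEvery === 0) {
      await yieldToEventLoop();
      // Assumption: cancellation is checked after each pause.
      if (opts.signal?.aborted) throw new Error("scan aborted");
    }
    if (input[i] === "@") count++;
  }
  return count;
}
```

A smaller `yieldEvery` keeps the page more responsive at the cost of more scheduling overhead, which is the trade-off the option exposes.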
renderMapAsync is especially useful in UI frameworks — extract and map
entities to React elements without blocking paint:
const segments = await Duckling().renderMapAsync<JSX.Element>(
text,
({ entity, children }) => <mark data-kind={entity.kind}>{children}</mark>,
);
// → ["plain text", <mark>...</mark>, " more text"]

Define a parser that returns an Entity, then pass it to Duckling:
import { createLanguage, map, type Parser, regex } from "@claudiu-ceia/combine";
import { Duckling, ent, type Entity } from "@claudiu-ceia/ts-duckling";
type HashtagEntity = Entity<"hashtag", { tag: string }>;
type HashtagLanguage = {
Full: Parser<HashtagEntity>;
parser: Parser<HashtagEntity>;
};
const Hashtag = createLanguage<HashtagLanguage>({
Full: () =>
map(
regex(/#[A-Za-z0-9_]{2,64}/, "hashtag"),
(m, b, a) => ent({ tag: m.slice(1) }, "hashtag", b, a),
),
parser: (s) => s.Full,
});
const entities = Duckling([Hashtag.parser]).extract("hello #duckling");
// → [{ kind: "hashtag", value: { tag: "duckling" }, start: 6, end: 15, text: "#duckling" }]

Custom parsers compose freely with the built-in ones:
import { Email } from "@claudiu-ceia/ts-duckling";
const entities = Duckling([Email.parser, Hashtag.parser]).extract(
"Email alex@company.io with #feedback",
);
// entities: (EmailEntity | HashtagEntity)[]

| Entity | Kind | Example matches |
|---|---|---|
| Time | time | tomorrow at 3pm, next Friday, 2024-01-15T10:30:00Z |
| Range | range | 2020-2024, 20°C to 30°C, Monday to Friday |
| Temperature | temperature | 72°F, 20 celsius, -5°C |
| Quantity | quantity | 5 kg, 100 miles, 3,500.00 |
| Location | location | United States, Germany, Japan |
| URL | url | https://example.com/path?q=1, docs.example.org |
| Institution | institution | University of Oxford, New York City Hall |
| Language | language | English, Japanese, Portuguese |

Available via PIIParsers for targeted redaction with
Duckling(PIIParsers).redact(…).

| Entity | Kind | Example matches |
|---|---|---|
| Email | email | user@example.com, first.last@company.io |
| Phone | phone | +14155552671, +44 20 7123 4567, (415) 555-2671 |
| IP address | ip | 192.168.1.1, 2001:db8::1, ::1 |
| SSN | ssn | 123-45-6789 |
| Credit card | credit_card | 4111 1111 1111 1111, 5500-0000-0000-0004 |
| UUID | uuid | 550e8400-e29b-41d4-a716-446655440000 |
| API key | api_key | sk-proj-abc123…, ghp_abc123…, AKIA… |
| IBAN | iban | GB29NWBK60161331926819, DE89 3704 0044 0532 0130 00 |
| MAC address | mac_address | 00:1A:2B:3C:4D:5E, 001A.2B3C.4D5E |
| JWT | jwt | eyJhbGciOiJIUzI1NiIs… |
| Crypto | crypto_address | 0x742d35Cc…f2bD18, bc1qar0srrr…, 1BvBMSEY… |
| BIC/SWIFT | bic | DEUTDEFF, BOFAUS3NXXX, COBADEFF100 |

function Duckling(): {
extract;
extractAsync;
render;
renderAsync;
renderMap;
renderMapAsync;
redact;
};
function Duckling<T>(parsers: ParserTuple<T>): {
extract;
extractAsync;
render;
renderAsync;
renderMap;
renderMapAsync;
redact;
};

Creates an extractor/renderer/redactor. Without arguments, uses all 20 built-in
parsers and returns AnyEntity[]. When given an explicit parser array, the
return type narrows to the union of those entity types.
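
To illustrate how a parser array can narrow a return type, here is a small self-contained sketch. It does not reproduce ts-duckling's actual types: `MiniEntity`, `MiniParser`, `EntityOf`, and `extractWith` are hypothetical names, and the word-by-word matching is a deliberate simplification.

```typescript
// Hypothetical minimal shapes; ts-duckling's real types are richer.
type MiniEntity<K extends string, V> = { kind: K; value: V };
type MiniParser<E> = (word: string) => E | undefined;

type EmailEnt = MiniEntity<"email", { email: string }>;
type TagEnt = MiniEntity<"hashtag", { tag: string }>;

// Recover the entity type a parser produces.
type EntityOf<P> = P extends MiniParser<infer E> ? E : never;

// The element type of the result is the union of the entity types
// produced by the parsers that were passed in.
function extractWith<T extends readonly MiniParser<unknown>[]>(
  parsers: T,
  input: string,
): EntityOf<T[number]>[] {
  const found: unknown[] = [];
  for (const word of input.split(/\s+/)) {
    for (const parse of parsers) {
      const entity = parse(word);
      if (entity !== undefined) found.push(entity);
    }
  }
  return found as EntityOf<T[number]>[];
}

const emailParser: MiniParser<EmailEnt> = (w) =>
  /^[^@\s]+@[^@\s]+$/.test(w) ? { kind: "email", value: { email: w } } : undefined;
const tagParser: MiniParser<TagEnt> = (w) =>
  /^#\w+$/.test(w) ? { kind: "hashtag", value: { tag: w.slice(1) } } : undefined;

// narrowed: (EmailEnt | TagEnt)[]
const narrowed = extractWith(
  [emailParser, tagParser],
  "mail alex@company.io about #feedback",
);

// Checking `kind` narrows `value` to the matching shape.
const labels = narrowed.map((e) =>
  e.kind === "email" ? `email:${e.value.email}` : `tag:${e.value.tag}`
);
// labels: ["email:alex@company.io", "tag:feedback"]
```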
extract(text: string): Entity[]

Scans text and returns all matched entities, each with kind, value,
start, end, and text fields. Entities are returned in order of appearance.
render(text: string, fn: RenderFn<Entity>): string

Extracts entities, arranges them into a span tree (wider spans parent narrower
ones), and calls fn for each entity node. The callback receives the entity and
the already-rendered text of its children. Return a replacement string, or
undefined to leave the span as-is.
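
The span-tree idea can be sketched in a few lines. This is an illustrative reconstruction, not the library's code: `Ent`, `SpanNode`, `buildTree`, and `renderTree` are hypothetical names, spans are assumed to nest fully or not overlap at all, and an undefined callback result is assumed to keep the rendered children.

```typescript
// Hypothetical shapes for illustration; offsets are half-open [start, end).
type Ent = { kind: string; start: number; end: number };
type SpanNode = { entity: Ent; children: SpanNode[] };

// Build a forest in which each span is a child of the narrowest span containing it.
function buildTree(entities: Ent[]): SpanNode[] {
  // Sort by start, widest first, so parents are visited before their children.
  const sorted = [...entities].sort((a, b) => a.start - b.start || b.end - a.end);
  const roots: SpanNode[] = [];
  const stack: SpanNode[] = [];
  for (const entity of sorted) {
    const node: SpanNode = { entity, children: [] };
    // Drop spans that ended before this one starts.
    let top = stack[stack.length - 1];
    while (top !== undefined && top.entity.end <= entity.start) {
      stack.pop();
      top = stack[stack.length - 1];
    }
    (top ? top.children : roots).push(node);
    stack.push(node);
  }
  return roots;
}

// Render inside-out: children are rendered first, then passed to the callback.
function renderTree(
  input: string,
  nodes: SpanNode[],
  from: number,
  to: number,
  fn: (ctx: { entity: Ent; children: string }) => string | undefined,
): string {
  let out = "";
  let cursor = from;
  for (const node of nodes) {
    const { start, end } = node.entity;
    out += input.slice(cursor, start);
    const children = renderTree(input, node.children, start, end, fn);
    out += fn({ entity: node.entity, children }) ?? children;
    cursor = end;
  }
  return out + input.slice(cursor, to);
}

const ssnText = "SSN 123-45-6789";
const ssnTree = buildTree([
  { kind: "quantity", start: 4, end: 7 },
  { kind: "quantity", start: 8, end: 10 },
  { kind: "quantity", start: 11, end: 15 },
  { kind: "ssn", start: 4, end: 15 },
]);
const ssnHtml = renderTree(ssnText, ssnTree, 0, ssnText.length,
  ({ entity, children }) => `<${entity.kind}>${children}</${entity.kind}>`);
// ssnHtml: "SSN <ssn><quantity>123</quantity>-<quantity>45</quantity>-<quantity>6789</quantity></ssn>"
```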
renderMap<R>(text: string, fn: RenderMapFn<Entity, R>): (string | R)[]

Like .render(), but instead of producing a single string, returns an array of
segments: plain-text strings interleaved with values of type R produced by
your callback. This is the API you want for React/JSX — map entities to
elements, and the result is ready to drop into a component's children.
The callback receives { entity, children } where children is
(string | R)[] — nested entities are already mapped.
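
As a rough picture of the segment array, the sketch below maps flat, non-overlapping spans to a `(string | R)[]`. It is a simplification under stated assumptions, not the library's algorithm: `FlatSpan` and `mapSegments` are hypothetical names, nesting is ignored, and empty text segments are assumed to be skipped.

```typescript
// Hypothetical span shape; offsets are half-open [start, end).
type FlatSpan = { kind: string; start: number; end: number };

// Interleave plain-text pieces with values produced by the callback.
function mapSegments<R>(
  input: string,
  spans: FlatSpan[],
  fn: (ctx: { entity: FlatSpan; children: (string | R)[] }) => R,
): (string | R)[] {
  const out: (string | R)[] = [];
  let cursor = 0;
  for (const span of [...spans].sort((a, b) => a.start - b.start)) {
    // Plain text between the previous span and this one.
    if (span.start > cursor) out.push(input.slice(cursor, span.start));
    // With no nesting, the only child is the span's own text.
    out.push(fn({ entity: span, children: [input.slice(span.start, span.end)] }));
    cursor = span.end;
  }
  if (cursor < input.length) out.push(input.slice(cursor));
  return out;
}

const parts = mapSegments<{ tag: string }>(
  "Hi #duck!",
  [{ kind: "hashtag", start: 3, end: 8 }],
  ({ children }) => ({ tag: children.join("") }),
);
// parts: ["Hi ", { tag: "#duck" }, "!"]
```

In a JSX setting, `R` would be the element type and the callback would return an element instead of a plain object.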
redact(text: string, opts?: RedactOptions): string

Built on top of .render(). Extracts entities then replaces each matched span
with opts.mask (default "█"). When opts.kinds is set, only those entity
kinds are masked. Overlapping/nested spans are resolved via the span tree.
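
The masking step itself is simple to picture. The sketch below is library-independent and makes two assumptions: spans are flat and non-overlapping, and each covered character is replaced by one mask character. `Span` and `maskSpans` are hypothetical names, not ts-duckling exports.

```typescript
// Hypothetical span shape; offsets are half-open [start, end).
type Span = { kind: string; start: number; end: number };

// Replace each selected span with mask characters, leaving other text intact.
function maskSpans(
  input: string,
  spans: Span[],
  opts: { mask?: string; kinds?: string[] } = {},
): string {
  const mask = opts.mask ?? "█";
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const s of sorted) {
    // When `kinds` is set, spans of other kinds are left untouched.
    if (opts.kinds && !opts.kinds.includes(s.kind)) continue;
    out += input.slice(cursor, s.start) + mask.repeat(s.end - s.start);
    cursor = s.end;
  }
  return out + input.slice(cursor);
}

const maskedCall = maskSpans(
  "Call +14155552671",
  [{ kind: "phone", start: 5, end: 17 }],
  { mask: "X" },
);
// maskedCall: "Call XXXXXXXXXXXX"
```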
Experimental — this method is stable enough to use but may change in a future minor release.
extractAsync(text: string, opts?: AsyncScanOptions): Promise<Entity[]>

Async version of .extract() that yields to the event loop periodically.
Supports cancellation via AbortSignal.
Experimental — this method is stable enough to use but may change in a future minor release.
renderAsync(text: string, fn: RenderFn<Entity>, opts?: AsyncScanOptions): Promise<string>

Async version of .render().
Experimental — this method is stable enough to use but may change in a future minor release.
renderMapAsync<R>(text: string, fn: RenderMapFn<Entity, R>, opts?: AsyncScanOptions): Promise<(string | R)[]>

Async version of .renderMap().
interface AsyncScanOptions {
signal?: AbortSignal; // cancel the scan
yieldEvery?: number; // default: 512
}

const PIIParsers: [
EmailParser,
PhoneParser,
IPAddressParser,
SSNParser,
CreditCardParser,
UUIDParser,
ApiKeyParser,
IBANParser,
MACAddressParser,
JWTParser,
CryptoAddressParser,
BICParser,
];

Pre-built parser tuple for PII-sensitive entities. Use with
Duckling(PIIParsers) for a quick redaction pipeline.
interface RedactOptions<K extends string = string> {
mask?: string; // default: "█"
kinds?: K[]; // when omitted, all entities are redacted
}

type RenderFn<E> = (ctx: {
entity: E;
children: string;
}) => string | undefined;

Callback for .render(). Receives the entity and the already-rendered text of
its nested children. Return a replacement string, or undefined to leave the
span unchanged.
type RenderMapFn<E, R> = (ctx: {
entity: E;
children: (string | R)[];
}) => R;

Callback for .renderMap(). Receives the entity and its children as an array of
plain-text strings and already-mapped R values. Return a value of type R to
replace the span.
AnyEntity is the union of all 20 built-in entity types. It is the element type of the array returned by
Duckling().extract(...).
Union of the 12 PII entity types:
EmailEntity | PhoneEntity | IPAddressEntity | SSNEntity | CreditCardEntity | UUIDEntity | ApiKeyEntity | IBANEntity | MACAddressEntity | JWTEntity | CryptoAddressEntity | BICEntity.
ts-duckling uses grammar-based parsers, not ML. This means:
- Deterministic: same input → same output, every time
- Fast: no model loading, no network calls
- But imperfect: expect false positives/negatives for ambiguous inputs
For example:
// ts-duckling interprets 6/2022 as a date
Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
// → [{ kind: "time", text: "6/2022", ... }]

If you need high accuracy on messy, ambiguous real-world text, consider an ML-based solution. If you want predictable, fast extraction from structured or semi-structured text (messages, forms, logs), ts-duckling is a great fit.
Try ts-duckling in the browser: Live Playground
Paste or type any text and see entities extracted in real-time. You can also fetch content from a URL to test against real web pages.
MIT © Claudiu Ceia
