Utility for post-processing AI-generated text. It normalises output by
removing invisible characters (often used as watermarks or formatting
artifacts), folding exotic whitespace, converting "pretty" punctuation to
ASCII, and stripping inline citation placeholders such as
`(oaicite:12){index=12}`.

ai-text-sanitizer is a tiny (<6 kB) zero-dependency ES module for cleaning and normalising raw text generated by large language models before you render, store, or diff it.
The library removes invisible Unicode watermark characters, exotic whitespace, and ASCII control codes, converts fancy punctuation to plain ASCII, strips inline citation placeholders, and optionally collapses redundant spaces—all while returning per-rule change statistics so you can audit the process.
- Removes Unicode format and other zero-width characters that can act as invisible watermarks.
- Converts fancy punctuation (curly quotes, en/em dashes, ellipsis, bullets) to plain ASCII equivalents.
- Folds a wide range of Unicode space characters to a standard space.
- Collapses runs of multiple spaces and normalises line endings to LF.
- Eliminates citation placeholders emitted by some language models.
- Optionally preserves or removes emoji glue characters (ZWJ / variation selectors).
- Returns granular change statistics so you can audit the cleaning process.
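
To make the first few rules above concrete, here is a minimal standalone sketch of invisible-character removal and whitespace folding. It is not the library's own code: the helper names `stripInvisible` and `foldSpaces` and the exact regular expressions are assumptions chosen for illustration.

```javascript
// Hypothetical helpers for illustration; not taken from ai-text-sanitizer.

// Unicode "format" characters (general category Cf) include the zero-width
// space U+200B, the BOM U+FEFF, and bidi marks such as U+200F.
const INVISIBLE = /\p{Cf}/gu;

// Space separators (general category Zs) other than the plain ASCII space.
const EXOTIC_SPACE = /(?! )\p{Zs}/gu;

function stripInvisible(text) {
  return text.replace(INVISIBLE, '');
}

function foldSpaces(text) {
  return text
    .replace(/\r\n?/g, '\n')     // CRLF and lone CR become LF
    .replace(EXOTIC_SPACE, ' ')  // NBSP, em space, ideographic space, ...
    .replace(/ {2,}/g, ' ');     // collapse runs of ASCII spaces
}

const sample = '\uFEFFfoo\u200B\u00A0\u2003bar\r\nbaz';
console.log(JSON.stringify(foldSpaces(stripInvisible(sample))));
// prints "foo bar\nbaz"
```

Note that removing every Cf character also removes the zero-width joiner (U+200D), which is itself a format character, so a sketch this blunt would split joined emoji sequences.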
```sh
pnpm add ai-text-sanitizer
```

This project is published as an ES module and requires Node ≥ 18.

```js
import { sanitizeAiText } from 'ai-text-sanitizer';

const input = `\uFEFF“Hello\u200B world…” (oaicite:5){index=5}`;
const { cleaned, changes } = sanitizeAiText(input);

console.log(cleaned); // "Hello world..."
console.log(changes);
/* {
  removedInvisible: 2,
  removedCtrl: 0,
  removedCitations: 1,
  prettified: 3,
  collapsedSpaces: 0,
  total: 6
} */
```

ai-text-sanitizer ships with built-in `.d.ts` declarations. There is nothing extra to install; just import and enjoy full IntelliSense:

```ts
import { sanitizeAiText, type SanitizeResult } from 'ai-text-sanitizer';

const result: SanitizeResult = sanitizeAiText('مرحبا\u200Fالعالم');
console.log(result.cleaned);
```

`sanitizeAiText(text, options?) → { cleaned, changes }`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `string` | – | Input text to sanitise. |
| `options` | `object` (optional) | – | Behaviour flags (below). |
| `keepEmoji` | `boolean` | `true` | Keep ZWJ / variation selectors used by emoji. |
| `collapseSpaces` | `boolean` | `true` | Collapse contiguous ASCII spaces. |

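
As a rough illustration of what a `keepEmoji`-style flag could mean in practice, the sketch below strips format characters and variation selectors but spares the emoji "glue" when the flag is on. Only the option name `keepEmoji` comes from the table; the function `stripFormatChars` and its regular expressions are hypothetical and may differ from the library's internals.

```javascript
// Hypothetical sketch; only the option name keepEmoji comes from the README.
const ZWJ = '\u200D';                          // zero-width joiner
const VARIATION_SELECTOR = /[\uFE00-\uFE0F]/;  // VS1 through VS16

function stripFormatChars(text, { keepEmoji = true } = {}) {
  // Candidates: format characters (category Cf) plus variation selectors.
  return text.replace(/[\p{Cf}\uFE00-\uFE0F]/gu, (ch) => {
    const isEmojiGlue = ch === ZWJ || VARIATION_SELECTOR.test(ch);
    return keepEmoji && isEmojiGlue ? ch : '';
  });
}

// Man + ZWJ + woman + ZWJ + girl renders as a single family emoji.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

console.log(stripFormatChars(family + '\u200B') === family);               // true
console.log([...stripFormatChars(family, { keepEmoji: false })].length);   // 3
```

With the flag off, the joined sequence falls apart into three separate emoji, which is why preserving the glue is a sensible default for text that will be displayed.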
The returned changes object reports how many code points were altered for
each rule plus a total sum.
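
One plausible way to produce such per-rule counts is to run each rule through a replace callback that increments a counter. The sketch below shows that pattern for two rules only; `applyRule` and `sanitizeSketch` are hypothetical names, and the library's actual bookkeeping may work differently.

```javascript
// Hypothetical sketch of per-rule counting; not the library's implementation.

// Apply one regex rule and report how many matches it replaced.
function applyRule(text, pattern, replacement) {
  let count = 0;
  const out = text.replace(pattern, () => {
    count += 1;
    return replacement;
  });
  return { out, count };
}

function sanitizeSketch(text) {
  const changes = { removedInvisible: 0, removedCtrl: 0, total: 0 };

  // Rule 1: format characters (category Cf); the u flag makes each match
  // one whole code point.
  let step = applyRule(text, /\p{Cf}/gu, '');
  changes.removedInvisible = step.count;

  // Rule 2: ASCII control codes, leaving tab, LF, and CR alone.
  step = applyRule(step.out, /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
  changes.removedCtrl = step.count;

  changes.total = changes.removedInvisible + changes.removedCtrl;
  return { cleaned: step.out, changes };
}

console.log(sanitizeSketch('\uFEFFa\u200Bb\x07c'));
// { cleaned: 'abc', changes: { removedInvisible: 2, removedCtrl: 1, total: 3 } }
```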
```sh
pnpm install
pnpm test
```

Tests live in `__tests__/` and exercise typical real-world scenarios including HTML fragments, code snippets, emoji sequences, and BOM handling.

- The function operates on raw strings; it does not parse or sanitise HTML structure. HTML tags remain untouched but are treated as plain text.
- The mapping of fancy punctuation is intentionally conservative. If you need broader transliteration, customise the `PRETTIES` table in `aiTextSanitizer.js`.
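
For readers wondering what customising a punctuation table might involve, here is a minimal sketch under stated assumptions. The real `PRETTIES` table may have a different shape and different replacements; `PRETTIES_SKETCH`, `toAscii`, and every mapping below are illustrative guesses, not a copy of the library's data.

```javascript
// Assumed shape: a Map from one fancy character to its ASCII replacement.
// The real PRETTIES table in aiTextSanitizer.js may be structured differently.
const PRETTIES_SKETCH = new Map([
  ['\u2018', "'"], ['\u2019', "'"],  // single curly quotes
  ['\u201C', '"'], ['\u201D', '"'],  // double curly quotes
  ['\u2013', '-'], ['\u2014', '-'],  // en dash, em dash
  ['\u2026', '...'],                 // horizontal ellipsis
  ['\u2022', '*'],                   // bullet
]);

function toAscii(text, table = PRETTIES_SKETCH) {
  let out = '';
  for (const ch of text) {           // for...of iterates by code point
    out += table.has(ch) ? table.get(ch) : ch;
  }
  return out;
}

// Broader transliteration: extend a copy of the table, here with guillemets.
const extended = new Map([...PRETTIES_SKETCH, ['\u00AB', '"'], ['\u00BB', '"']]);

console.log(toAscii('\u201CHi\u201D \u2014 \u00ABok\u00BB\u2026', extended));
// prints "Hi" - "ok"...
console.log(toAscii('<b>it\u2019s</b>'));
// prints <b>it's</b>
```

The second call mirrors the note about HTML: the tags pass through as plain text while the punctuation inside them is still converted.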
Contributions, bug reports, and feature requests are very welcome — feel free to open an issue or submit a pull request. Please ensure the test suite passes (pnpm test) and follow conventional commit messages for ease of release automation.
This repository contains only the core library and test suite to keep the footprint minimal.