Small Markdown Parser

Experimental 'batteries included' client-side markdown parser & renderer written in pure TypeScript.

This same readme as a demo: md2.at

Background

There are already many excellent, battle-tested Markdown parsing and rendering libraries and utilities available in the JS/TS ecosystem. However, none of them fully suited my daily work in other contexts, where ease of use, lightness and privacy are essential requirements.

So I decided to create a tool that would let me visualize any Markdown in an accessible way with as little effort as possible. It also served as a nice reminder and a bit of a learning experience in working with the lower-level capabilities of modern JS/TS, and I think it can also serve as an example of how JIT-compiled JavaScript can take advantage of a contiguous memory layout for storing state. Although not optimized yet, it can already make quite a difference in performance and memory usage.

I have tried this with quite large .md files (100+ MB) that contained pretty much only code. Even though basic syntax highlighting is built in and those blocks are fairly heavy to render, the files performed surprisingly well, even on my phone.

As an example of using the parser/renderer, I created a small service, md2.at, which is just client-side TypeScript on free static hosting (render.com). This small "service" lets me append the URL of any publicly available .md file to the service's URL and get a shareable, embeddable visualization of that Markdown.

This example service is still in very early stages but it is going to stay

Aimed at making Markdown visualizations more accessible while maintaining efficiency and privacy.

Zero external dependencies in build

The implementation keeps external dependencies out of the hot path and focuses on predictable, byte-level processing. Both HTML and Canvas renderers are included so the same parse result can be examined in different output backends.

The project is still in an early phase; the public API and packaging will evolve before the planned publication later this year.

The intent of this repository is not to compete with broad Markdown frameworks but to provide accessible visualisation while keeping the ratio between performance and supported features reasonable.

Capabilities

Single-pass parsing implemented with byte spans rather than string slicing.
No regular expressions in the core parser; all matching is done with explicit scans.
Two renderers: an HTML renderer that emits escaped markup and a Canvas renderer for visual inspection.
Arena-style byte buffer to reduce allocations while building output.
Test coverage that includes golden tests, property-based fuzzing, and targeted benchmarks.
GitHub Flavored Markdown coverage for tables, task items, and strikethrough.
URL allowlisting and HTML escaping enabled by default.
ESM exports suitable for browser bundlers and server-side usage.
Optional dark/light UI presets with persisted preference and theme builder integration.

Supported Markdown Features

Headings (H1-H6): # Heading
Blockquotes: > Quote
Lists:
- Unordered: - Item or * Item or + Item
- Ordered: 1. Item
- Task Lists: - [ ] Unchecked or - [x] Checked
Horizontal Rules: --- or ***
Code Blocks: Fenced with ``` or ~~~
Inline Code: `code`
Emphasis: *italic* or _italic_
Strong: **bold** or __bold__
Strikethrough: ~~struck~~
Links: [text](url)
Images: ![alt](src)
Autolinks: Automatic linking of http://, https://, and www. URLs
Tables: | Header | Header |\n|--------|--------|\n| Cell | Cell |
Info Blocks: ::: info, ::: warning, ::: error, ::: success

Syntax Highlighting

The language specs for syntax highlighting are not complete and probably still contain many issues; I am already aware of a few and am working on fixing them.

To keep things lightweight, correct grammars for different languages will probably become an optional, plugin-based feature in the future.

So far built-in basic syntax highlighters cover the following languages:

JavaScript / TypeScript
Python
Java
C / C++
C#
Go
Rust
Swift
Kotlin
Scala
Dart
Ruby
PHP
Shell scripts (bash/sh/zsh)
PowerShell
Lua
Perl
Haskell
Elixir
Erlang
Clojure
R
SQL
JSON
YAML
TOML
INI / config files
Dockerfile
Make / Makefile
F#
HTML / XML / SVG

Additional languages can be registered at runtime with registerHighlightLanguage.

I have an experimental setup that uses precompiled language specs at runtime to reduce the overhead of compiling them, but this is not an optimal way to do things and might look bad, as the code contains a block of base64-encoded binary data that is consumed by the highlighter. This code is used to generate the precompiled.ts file.

Usage

HTML Rendering

Should work in both browser and SSR environments.

import { MDParser, u8 } from 'smdp';

const parser = new MDParser({
  // Security: disable raw HTML blocks by default
  allowRawHtml: false,
  // Custom URL allowlist (optional)
  urlAllowlist: (url) => url.startsWith('https://') || url.startsWith('mailto:'),
});

const markdown = '# Hello World\n\nThis is **bold** text with ~~strikethrough~~ and `code`.';
parser.parse(u8(markdown)).then(html => {
  console.log(html);
  // Output: <h1>Hello World</h1>\n<p>This is <strong>bold</strong> text with <del>strikethrough</del> and <code>code</code>.</p>\n
});
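
The `urlAllowlist` option above takes a predicate over the URL string. As a possible variation on the prefix check, the sketch below parses the URL first and compares its scheme. `makeSchemeAllowlist` is a hypothetical helper and not part of smdp; only the `(url) => boolean` shape is taken from the example above. Whether relative URLs ever reach the predicate is not stated in this README, so rejecting them here is an assumption.

```typescript
// Hypothetical helper: builds a predicate with the same (url) => boolean
// shape as the `urlAllowlist` option shown above. Not part of smdp.
function makeSchemeAllowlist(schemes: string[]): (url: string) => boolean {
  const allowed = new Set(schemes.map((s) => s.toLowerCase() + ':'));
  return (url: string): boolean => {
    try {
      // The standard URL parser lower-cases the scheme, so 'HTTPS://...'
      // and 'https://...' are treated the same.
      return allowed.has(new URL(url).protocol);
    } catch {
      // Relative or malformed input has no scheme; this sketch rejects it.
      return false;
    }
  };
}

const isAllowedUrl = makeSchemeAllowlist(['https', 'mailto']);
console.log(isAllowedUrl('https://example.com/a.md')); // true
console.log(isAllowedUrl('HTTPS://example.com'));       // true
console.log(isAllowedUrl('javascript:alert(1)'));       // false
```

The resulting function could be passed as `urlAllowlist` in the constructor options in place of the inline arrow function.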

Canvas Rendering

Works only in the browser; still a work in progress.

import { MDParser, u8 } from 'smdp';

const parser = new MDParser();
const canvas = document.createElement('canvas');
canvas.width = 800;

const markdown = `# Hello Canvas

This is **bold** text with ~~strikethrough~~.

- [ ] Task list item
- [x] Completed task

| Header 1 | Header 2 |
|----------|----------|
| Cell 1   | Cell 2   |

\`\`\`javascript
function hello() {
  console.log('world');
}
\`\`\``;

parser.renderToCanvas(u8(markdown), canvas);
document.body.appendChild(canvas);

SSR Usage (Node.js)

// In Node.js or SSR environments, only HTML parsing is available
import { MDParser, u8 } from 'smdp';

const parser = new MDParser();
const markdown = '# Server-Side Rendering\n\nWorks without DOM APIs.';
parser.parse(u8(markdown)).then(html => {
  console.log(html);
});

// Canvas rendering is not available in SSR environments
// parser.renderToCanvas(u8(markdown), canvas); // ❌ Not available

Book Mode (Multi-Part Markdown)

Use /book/<entry-url> to treat a markdown document as a book entry that links to other chapters.

github.com/.../blob/... chapter links are automatically converted to raw.githubusercontent.com/... for fetching.
Relative chapter links (for example ./chapter-2.md) are resolved against each chapter file URL.
Linked markdown chapters are discovered and prefetched in the background.

Example:

https://md2.at/book/https://github.com/owner/repo/blob/main/docs/README.md

When a chapter link is opened, the selected part is stored in ?part=<chapter-url> so deep links remain shareable.
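
The two URL rules described above (blob links rewritten to raw.githubusercontent.com, relative chapter links resolved against the chapter file URL) can be sketched with the standard `URL` class. `toRawGithubUrl` and `resolveChapterLink` are hypothetical names used for illustration; the service's own code may handle more cases.

```typescript
// Hypothetical helpers illustrating the URL handling described above.
// They are not smdp's or md2.at's own functions.

// Convert https://github.com/<owner>/<repo>/blob/<ref>/<path>
// to https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
function toRawGithubUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname !== 'github.com') return input;
  const parts = url.pathname.split('/').filter((p) => p.length > 0);
  // Expected shape: [owner, repo, 'blob', ref, ...path]
  if (parts.length < 5 || parts[2] !== 'blob') return input;
  const [owner, repo, , ...rest] = parts;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join('/')}`;
}

// Resolve a chapter link against the URL of the chapter that contains it.
function resolveChapterLink(href: string, chapterUrl: string): string {
  return new URL(href, chapterUrl).href;
}

const entryUrl = toRawGithubUrl('https://github.com/owner/repo/blob/main/docs/README.md');
console.log(entryUrl);
// https://raw.githubusercontent.com/owner/repo/main/docs/README.md
console.log(resolveChapterLink('./chapter-2.md', entryUrl));
// https://raw.githubusercontent.com/owner/repo/main/docs/chapter-2.md
```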

Syntax Highlighting

import { highlightCodeBlock } from 'smdp/highlight';

const code = 'function fibonacci(n) {\n  if (n <= 1) return n;\n  return fibonacci(n - 1) + fibonacci(n - 2);\n}';
const highlighted = highlightCodeBlock(new TextEncoder().encode(code), 'javascript');
console.log(new TextDecoder().decode(highlighted));

Theme Builder

import { createThemeBuilder } from 'smdp/theme';

const builder = createThemeBuilder()
  .withMeta({ colorScheme: 'light', fontFamily: '"IBM Plex Sans", system-ui, sans-serif' })
  .withTokens({
    bgBase: '#f5f6fa',
    textPrimary: '#1f2933',
    accent: '#2563eb',
    codeKw: '#7c3aed',
  });

// Option 1: apply directly to the current document
builder.apply(); // defaults to document.documentElement

// Option 2: inject scoped CSS (useful for SSR or style encapsulation)
const themeCss = builder.buildCss(':root');

The demo includes a palette button that opens a theme editor. The editor uses the same ThemeBuilder helper exposed through the public API and updates CSS variables in place.
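
To make the "updates CSS variables" idea concrete, here is a minimal sketch of turning a token map into CSS custom properties under a selector. `tokensToCss` is a hypothetical function, and the `--kebab-case` variable naming is an assumption; the names that `buildCss` actually emits are not documented in this README.

```typescript
// Hypothetical: turn a token map into CSS custom properties under a selector.
// The `--kebab-case` naming is an assumption, not smdp's documented scheme.
function tokensToCss(selector: string, tokens: Record<string, string>): string {
  const decls = Object.entries(tokens).map(([key, value]) => {
    // bgBase -> --bg-base, codeKw -> --code-kw
    const cssName = '--' + key.replace(/[A-Z]/g, (m) => '-' + m.toLowerCase());
    return `  ${cssName}: ${value};`;
  });
  return `${selector} {\n${decls.join('\n')}\n}`;
}

const sketchCss = tokensToCss(':root', { bgBase: '#f5f6fa', codeKw: '#7c3aed' });
console.log(sketchCss);
// :root {
//   --bg-base: #f5f6fa;
//   --code-kw: #7c3aed;
// }
```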

Principles

Privacy: there is no telemetry or analytics built into the code. Requests occur only when loading external Markdown that the user has asked to load from a trusted source.
Licensing: the entire codebase is released under the MIT License.
AI usage: we highly value carefully hand-crafted code while recognising that LLMs, applied with intent and review, can accelerate exploration without diluting quality.

Architecture

The parser is split into logical modules:

types.ts: TypeScript type definitions and interfaces
constants.ts: Pre-encoded HTML tags and styling constants
utils.ts: Byte-level utility functions for parsing
arena.ts: Memory-efficient HTML buffer with geometric growth
line-parser.ts: Line span generator for input splitting
inline-parser.ts: Inline token generator (emphasis, code, links, etc.)
block-parser.ts: Block-level structure parser (headings, lists, code blocks, etc.)
html-renderer.ts: HTML output renderer
canvas-renderer.ts: Canvas output renderer
index.ts: Main MDParser class and public API

Parser Pipeline

The core pipeline is built around byte ranges rather than strings. The process is:

Line segmentation: lineSpans walks the Uint8Array, recording start/end offsets for each line. No copies are made, and the raw array is never converted to strings at this stage.
Block parsing: blocks iterates through the line spans once, emitting events such as heading, listOpen, listItem, codeOpen, etc. Indentation, fences, and info blocks are resolved here. Since block parsing is single-pass, nested structures (lists-in-lists, blockquotes) are tracked via a small stack structure.
Inline parsing: For ranges that require inline formatting (links, emphasis, code spans), inlineTokens performs another byte-level pass within the line boundaries. It produces typed tokens (text, link, img, code, autolink, strike, ...). Multiple passes are avoided by piggybacking on the already segmented line spans.
Rendering: Both renderers consume the block/inlines event stream without reparsing. The HTML renderer writes directly into an arena-like buffer (see arena.ts), which grows geometrically to limit reallocations. The Canvas renderer replays the same stream into 2D drawing commands, relying on the same inline tokenization for highlighting and styling.
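
The line segmentation step can be illustrated with a small generator that records offsets without copying or decoding anything. This is a sketch under assumptions: `LineSpanSketch` and `lineSpansSketch` are illustrative names, and details such as the `\r\n` handling may differ from the real `lineSpans`.

```typescript
// Sketch of line segmentation over bytes. Illustrative names only;
// smdp's lineSpans may differ in details such as '\r' handling.
interface LineSpanSketch {
  s: number; // offset of the first byte of the line
  e: number; // offset one past the last byte, excluding the line ending
}

function* lineSpansSketch(bytes: Uint8Array): Generator<LineSpanSketch> {
  let s = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0a /* '\n' */) {
      // Treat "\r\n" as one line ending by trimming the '\r'.
      const e = i > s && bytes[i - 1] === 0x0d ? i - 1 : i;
      yield { s, e };
      s = i + 1;
    }
  }
  // Final line without a trailing newline.
  if (s < bytes.length) yield { s, e: bytes.length };
}

const docBytes = new TextEncoder().encode('# Title\r\n\nbody');
const spans = Array.from(lineSpansSketch(docBytes));
console.log(spans);
// [ { s: 0, e: 7 }, { s: 9, e: 9 }, { s: 10, e: 14 } ]
```

Each span is just two numbers into the original `Uint8Array`, so later stages can address a line without allocating a string for it.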

Important details:

Writer: The HTML renderer calls HtmlArena.writeEscaped and related methods that operate on byte slices, so writing out HTML stays allocation-friendly and avoids intermediate strings. Only at the end is Uint8Array converted back to a string (TextDecoder).
Syntax highlighting: The highlighting path is decoupled from the markdown parser. When a fenced code block is found, the captured byte ranges are passed to highlightCodeBlock. Highlighting uses a generative tokenizer compiled from language specs (or precompiled data), then writes markup via the same arena-like approach.
Canvas rendering: renderToCanvasFromBlocks shares the block event stream but renders into a canvas context. It keeps cached font measurements, performs line-wrapping per block, and triggers a rerender when images finish loading. Virtual scrolling is used when the rendered height exceeds twice the viewport.
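
As a rough illustration of the writer described above, the following sketch shows a growable byte buffer with doubling growth and byte-level HTML escaping. `ByteArenaSketch` is an illustrative name; the method names `writeBytes` and `writeEscaped` mirror those mentioned in this README, but their signatures here are assumptions and the real `HtmlArena` may differ.

```typescript
// Sketch of an arena-like output buffer. Illustrative only; smdp's
// HtmlArena may differ in API and details.
const utf8 = new TextEncoder();
const ESCAPES = new Map<number, Uint8Array>([
  [0x26, utf8.encode('&amp;')],  // &
  [0x3c, utf8.encode('&lt;')],   // <
  [0x3e, utf8.encode('&gt;')],   // >
  [0x22, utf8.encode('&quot;')], // "
]);

class ByteArenaSketch {
  private buf: Uint8Array;
  private len = 0;

  constructor(initialCapacity = 64) {
    this.buf = new Uint8Array(Math.max(1, initialCapacity));
  }

  // Grow geometrically (doubling) so repeated appends stay cheap overall.
  private ensure(extra: number): void {
    const needed = this.len + extra;
    if (needed <= this.buf.length) return;
    let cap = this.buf.length;
    while (cap < needed) cap *= 2;
    const next = new Uint8Array(cap);
    next.set(this.buf.subarray(0, this.len));
    this.buf = next;
  }

  writeBytes(src: Uint8Array): void {
    this.ensure(src.length);
    this.buf.set(src, this.len);
    this.len += src.length;
  }

  // Copy bytes [s, e) of src, replacing HTML-significant bytes with entities.
  writeEscaped(src: Uint8Array, s: number, e: number): void {
    for (let i = s; i < e; i++) {
      const c = src[i];
      const entity = ESCAPES.get(c);
      if (entity !== undefined) {
        this.writeBytes(entity);
      } else {
        this.ensure(1);
        this.buf[this.len++] = c;
      }
    }
  }

  // The only point where bytes become a string.
  toString(): string {
    return new TextDecoder().decode(this.buf.subarray(0, this.len));
  }
}

const outArena = new ByteArenaSketch(8); // small capacity to exercise growth
const rawText = utf8.encode('a < b & "c"');
outArena.writeBytes(utf8.encode('<p>'));
outArena.writeEscaped(rawText, 0, rawText.length);
outArena.writeBytes(utf8.encode('</p>'));
console.log(outArena.toString()); // <p>a &lt; b &amp; &quot;c&quot;</p>
```

Escaping at the byte level is safe for UTF-8 input because the bytes of multi-byte characters are all 0x80 or above and can never collide with `&`, `<`, `>` or `"`.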

// High-level structure: see src/parser/index.ts
export class MDParser {
  async parse(u8arr: Uint8Array) {
    return renderHTMLFromBlocks(u8arr, this.options);
  }

  renderToCanvas(u8arr: Uint8Array, canvas: HTMLCanvasElement) {
    renderToCanvasFromBlocks(u8arr, canvas, this.options);
  }
}

// renderHTMLFromBlocks (simplified) in src/parser/html-renderer.ts
for (const ev of blocks(u8)) {
  switch (ev.type) {
    case 'heading':
      arena.writeBytes(TAG.hPre[ev.level - 1]);
      renderInline(u8, ev.s, ev.e, arena, options);
      arena.writeBytes(TAG.hClose[ev.level - 1]);
      break;
    case 'codeOpen':
      codeBuffer = [];
      break;
    case 'codeText':
      codeBuffer.push({ s: ev.s, e: ev.e });
      break;
    case 'codeClose':
      const highlighted = await highlightCodeBlock(join(codeBuffer), codeLang);
      arena.writeBytes(highlighted);
      codeBuffer = null;
      break;
    // ...other block types (lists, blockquotes, tables, info blocks)
  }
}

// inlineTokens (see src/parser/inline-parser.ts) walks a byte slice and emits tokens
if (c === 0x5b /* '[' */) {
  const close = findBracket(u8, i + 1, e, 0x5d);
  if (close !== -1) {
    const hrefStart = close + 2; // '(' after ']'
    const hrefEnd = findBracket(u8, hrefStart, e, 0x29);
    tokens.push({ kind: 'link', textS: i + 1, textE: close, hrefS: hrefStart, hrefE: hrefEnd });
  }
}

// Canvas renderer consumes the same events (src/parser/canvas-renderer.ts)
for (const ev of blocks(u8)) {
  switch (ev.type) {
    case 'paraLine':
      renderInlineToCanvas(ev.s, ev.e, ctx, currentX, currentY);
      break;
    case 'img':
      const src = resolveUrlRelativeToBase(...);
      const cached = loadImage(src, rerender);
      drawImageOrPlaceholder(cached, ctx, currentX, currentY);
      break;
    // ...other block rendering
  }
}
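
The link branch in the excerpt above is simplified. A self-contained variant is sketched below that also checks for `(` directly after `]` and for a closing `)` before emitting a token. `findByte` and `scanLink` are hypothetical stand-ins; the real `findBracket` and `inlineTokens` may handle nesting and escapes differently.

```typescript
// Self-contained sketch of scanning one inline link, [text](href), in a
// byte range. Names are illustrative, not smdp's exports.
interface LinkTokenSketch {
  kind: 'link';
  textS: number;
  textE: number;
  hrefS: number;
  hrefE: number;
}

// Index of the first `target` byte in [from, to), or -1 if absent.
function findByte(u8: Uint8Array, from: number, to: number, target: number): number {
  for (let i = from; i < to; i++) if (u8[i] === target) return i;
  return -1;
}

// Try to read a link starting at `i`, which must point at '['.
function scanLink(u8: Uint8Array, i: number, e: number): LinkTokenSketch | null {
  if (u8[i] !== 0x5b /* '[' */) return null;
  const close = findByte(u8, i + 1, e, 0x5d /* ']' */);
  if (close === -1) return null;
  // Require '(' directly after ']' before treating this as a link.
  if (close + 1 >= e || u8[close + 1] !== 0x28 /* '(' */) return null;
  const hrefS = close + 2;
  const hrefE = findByte(u8, hrefS, e, 0x29 /* ')' */);
  if (hrefE === -1) return null; // unterminated: caller can emit plain text
  return { kind: 'link', textS: i + 1, textE: close, hrefS, hrefE };
}

const linkBytes = new TextEncoder().encode('see [docs](https://md2.at) now');
const tok = scanLink(linkBytes, 4, linkBytes.length);
const dec = new TextDecoder();
if (tok) {
  console.log(dec.decode(linkBytes.subarray(tok.textS, tok.textE))); // docs
  console.log(dec.decode(linkBytes.subarray(tok.hrefS, tok.hrefE))); // https://md2.at
}
```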

Current Strengths

Predictable performance: Byte-range processing and arena-like buffers keep allocations low, which shows up in the included micro-benchmarks (npm run test:bench).
Single-pass correctness: Blocks are identified without backtracking; inline parsing respects boundaries established by the block layer (for example, emphasis is never resolved inside code spans).
Separation of concerns: HTML and Canvas renderers consume the same block/inline events so new renderers (e.g., PDF or terminal) can be added without touching the parser core.
Themeable UI: The public theme builder feeds both the default UI and consumer customizations; the new light/dark presets are simply predefined token sets.

Areas for Improvement

Streaming input: Although the parser is single-pass, it still expects the full Uint8Array. Enabling incremental parsing (e.g., processing chunks from a stream) would reduce memory spikes for very large documents.
Error recovery: Inline parsing errs on the side of stopping at malformed constructs. Better error recovery could keep rendering intact even when Markdown is intentionally or accidentally broken.
Extensibility hooks: Callbacks for custom block/inline tokens could be surfaced. Today, extensions require forking the parser.
Canvas accessibility: The Canvas renderer focuses on presentation. To serve assistive technologies, a hybrid mode that emits both Canvas and hidden HTML (or ARIA descriptions) would close the accessibility gap.
More grammars: The highlighting pipeline accepts additional grammars, but coverage remains limited to the precompiled set. Expanding that library or providing an easier authoring path is on the roadmap.
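
For the streaming item above, one possible starting point is a line splitter that accepts chunks and carries an unfinished line over to the next chunk. This is a hypothetical sketch of the line layer only; `ChunkedLineSplitter` does not exist in smdp, and block state that spans lines (fences, lists) would need similar treatment.

```typescript
// Hypothetical sketch: splitting lines across chunk boundaries.
class ChunkedLineSplitter {
  private carry: Uint8Array = new Uint8Array(0);

  // Concatenate two byte ranges into a fresh array.
  private static concat(a: Uint8Array, b: Uint8Array): Uint8Array {
    const out = new Uint8Array(a.length + b.length);
    out.set(a);
    out.set(b, a.length);
    return out;
  }

  // Feed one chunk; returns the complete lines it finished (without '\n').
  // Lines that lie fully inside the chunk are views into it, not copies.
  push(chunk: Uint8Array): Uint8Array[] {
    const lines: Uint8Array[] = [];
    let start = 0;
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] !== 0x0a /* '\n' */) continue;
      const tail = chunk.subarray(start, i);
      if (this.carry.length > 0) {
        lines.push(ChunkedLineSplitter.concat(this.carry, tail));
        this.carry = new Uint8Array(0);
      } else {
        lines.push(tail);
      }
      start = i + 1;
    }
    // Keep the unfinished tail for the next chunk.
    if (start < chunk.length) {
      this.carry = ChunkedLineSplitter.concat(this.carry, chunk.subarray(start));
    }
    return lines;
  }

  // Call once at end of input to get a final line without trailing '\n'.
  flush(): Uint8Array | null {
    if (this.carry.length === 0) return null;
    const last = this.carry;
    this.carry = new Uint8Array(0);
    return last;
  }
}

const splitter = new ChunkedLineSplitter();
const te = new TextEncoder();
const td = new TextDecoder();
const collected: string[] = [];
for (const part of ['# Ti', 'tle\npara', 'graph\nlast']) {
  for (const line of splitter.push(te.encode(part))) collected.push(td.decode(line));
}
const finalLine = splitter.flush();
if (finalLine) collected.push(td.decode(finalLine));
console.log(collected); // [ '# Title', 'paragraph', 'last' ]
```

Splitting on the `0x0a` byte is safe even when a chunk boundary falls inside a multi-byte UTF-8 character, because that byte value never appears inside one.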

API Reference

`MDParser`

Main parser class.

`parse(u8arr: Uint8Array, overrides?: ParserOptions): Promise<string>`

Parses Markdown (as Uint8Array) and returns a Promise that resolves to an HTML string. Pass overrides.baseUrl to rewrite relative links and image sources against the fetched document's origin.

`renderToCanvas(u8arr: Uint8Array, canvas: HTMLCanvasElement, overrides?: ParserOptions): void`

Renders Markdown (as Uint8Array) to an HTML5 Canvas.

`u8(str: string): Uint8Array`

Utility function to convert a string to Uint8Array using UTF-8 encoding.
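
A helper with this behavior can be written with the standard `TextEncoder`, as in the sketch below. `toUtf8` is an illustrative name, and this is not necessarily how `u8` itself is implemented. The example also shows why byte offsets and string indices should not be mixed.

```typescript
// Equivalent behavior built on the standard TextEncoder.
// `toUtf8` is an illustrative name, not smdp's export.
const toUtf8 = (str: string): Uint8Array => new TextEncoder().encode(str);

const sample = 'h\u00e9llo \u20ac'; // "héllo €"
const sampleBytes = toUtf8(sample);
console.log(sample.length);      // 7 UTF-16 code units
console.log(sampleBytes.length); // 10 bytes: 'é' takes 2 and '€' takes 3
```

Because the parser works on byte spans, offsets such as `s` and `e` refer to positions in the `Uint8Array`, not to indices in the original string.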

Exported Types

InlineToken: Token types for inline parsing
BlockEvent: Event types for block parsing
LineSpan: Line position information
TextStyle: Styling information for canvas rendering
DrawResult: Canvas drawing result coordinates

Low-Level API

You can also use the individual parsers and renderers:

lineSpans(u8: Uint8Array): Generator yielding line spans
inlineTokens(u8: Uint8Array, s: number, e: number): Generator yielding inline tokens
blocks(u8: Uint8Array): Generator yielding block events
renderHTMLFromBlocks(u8: Uint8Array, options?: ParserOptions): Render blocks to HTML
renderToCanvasFromBlocks(u8: Uint8Array, canvas: HTMLCanvasElement, options?: ParserOptions): Render blocks to canvas

Development

TypeScript Configuration

The project uses modern TypeScript with strict type checking enabled:

ES2022 target
ESNext modules
Strict mode enabled
Bundler module resolution
Comprehensive linting rules

Testing

Test suites include golden comparisons, property-based checks, and micro-benchmarks:

# Run all tests
npm test

# Run specific test suites
npm run test:golden     # Golden tests for parser output
npm run test:property   # Property-based tests for parser invariants
npm run test:bench      # Performance benchmarks

# Watch mode for development
npm run test:watch

Build

This is designed to work with Vite or similar modern bundlers.

npm install
npm run dev
npm run build

License

MIT

MatiasHiltunen/smdp
