Markdown code block validation and rendering tool #345

@julianhyde

A tool that discovers Morel code fragments embedded in markdown files (as HTML comments in .smli format), executes them, validates their output, and generates styled HTML blocks following each comment. If all commands succeed but actual output differs from expected, the tool updates the markdown file in-place.

Motivation

Blog posts and documentation about Morel contain code examples that can become stale as the language evolves. Today there is no way to validate that examples in markdown are correct. Morel's .smli test infrastructure already solves this problem for test files; this feature extends it to markdown.

Similar tools

Several languages have tools that execute code blocks embedded in documentation and validate their output:

  • ocaml-mdx executes OCaml toplevel phrases in markdown code fences. Input lines are prefixed with #; output lines follow without prefix. Directives are placed in HTML comments above the block (e.g., <!-- $MDX env=e1 -->). On mismatch it generates a .corrected file for review. Used by Real World OCaml.
  • Scala mdoc uses modifiers on the code fence language tag (e.g., ```scala mdoc). It processes an input .md template and writes a new .md file with output inserted. Modifiers include mdoc:silent, mdoc:fail, mdoc:crash, and mdoc:compile-only.
  • Rust doctests compile and run code blocks in documentation. Attributes like ignore, no_run, compile_fail, and should_panic go after the triple backticks. There is no expected-output comparison; tests pass if the code compiles and doesn't panic.

This proposal differs from all three in that the source of truth (code and expected output) lives inside HTML comments rather than in visible code fences, and the tool generates rendered HTML rather than relying on a markdown processor's syntax highlighting.

Design

Embedding format

Morel code is embedded in HTML comments using .smli format: input lines are plain text, expected output lines are prefixed with >. The comment opens with <!-- morel and optional space-separated attributes.

<!-- morel
fun len [] = 0
  | len (_ :: tl) = 1 + len tl;
> val len = fn : 'a list -> int
-->

The tool generates a <div class="morel"> block immediately after the closing -->. The generated HTML contains the input in a <pre><code class="morel-input"> element and the output in a <pre><code class="morel-output"> element, with optional syntax highlighting via inline HTML tags (bold keywords, italic type variables, etc.).

<!-- morel
fun len [] = 0
  | len (_ :: tl) = 1 + len tl;
> val len = fn : 'a list -> int
-->
<div class="morel">
<pre><code class="morel-input"><b>fun</b> len [] = 0
  | len (_ :: tl) = 1 + len tl;</code></pre>
<pre><code class="morel-output">val len = fn : <i>'a</i> list -> int</code></pre>
</div>

A comment may contain multiple input/output pairs. Each pair produces its own input <pre> element and its own output <pre> element, all within the same <div>.

<!-- morel
fun len [] = 0
  | len (_ :: tl) = 1 + len tl;
> val len = fn : 'a list -> int
len [1, 2, 3];
> val it = 3 : int
-->
<div class="morel">
<pre><code class="morel-input"><b>fun</b> len [] = 0
  | len (_ :: tl) = 1 + len tl;</code></pre>
<pre><code class="morel-output">val len = fn : <i>'a</i> list -> int</code></pre>
<pre><code class="morel-input">len [1, 2, 3];</code></pre>
<pre><code class="morel-output">val it = 3 : int</code></pre>
</div>
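
As an illustration of the embedding format, here is a minimal Python sketch that splits the text inside a comment into input/output pairs. It is not the proposed implementation. The function name parse_smli is hypothetical, and the rule that a non-> line following output starts a new pair is an assumption inferred from the examples above.

```python
from typing import List, Tuple

def parse_smli(body: str) -> List[Tuple[str, str]]:
    """Split .smli text into (input, expected_output) pairs.

    Assumption: a line starting with '>' is expected output and any
    other line is input; input that follows output opens a new pair.
    """
    pairs: List[Tuple[List[str], List[str]]] = []
    for line in body.splitlines():
        if line.startswith(">"):
            if not pairs:
                pairs.append(([], []))
            # Drop the '>' marker and one optional following space.
            text = line[1:]
            if text.startswith(" "):
                text = text[1:]
            pairs[-1][1].append(text)
        elif not line.strip() and (not pairs or pairs[-1][1]):
            continue  # ignore blank lines between pairs
        else:
            if not pairs or pairs[-1][1]:
                pairs.append(([], []))
            pairs[-1][0].append(line)
    return [("\n".join(i), "\n".join(o)) for i, o in pairs]

body = """fun len [] = 0
  | len (_ :: tl) = 1 + len tl;
> val len = fn : 'a list -> int
len [1, 2, 3];
> val it = 3 : int"""
pairs = parse_smli(body)
```

Each returned pair corresponds to one input <pre> and one output <pre> in the generated block.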

Attributes

Attributes are space-separated tokens on the opening <!-- morel line. Flags are bare words; parameters use key=value syntax.

| Attribute | Description |
|-----------|-------------|
| env=NAME | Execute in a named environment. Environments are created fresh on first use and persist across blocks that share the same name. Blocks with no env attribute share a single default environment. |
| silent | Execute and validate output, but do not generate an HTML block. Useful for preambles that define helper functions or load data. |
| fail | Expect the code to produce an error. The tool reports a failure if the code succeeds. |
| skip | Do not execute the code. Still generate an HTML block from the comment content as-is. Useful for illustrative examples of incorrect code. |

Attributes are orthogonal and may be combined freely. Examples:

<!-- morel -->
<!-- morel silent -->
<!-- morel env=demo -->
<!-- morel silent env=demo -->
<!-- morel fail -->
<!-- morel fail env=errors -->
<!-- morel skip -->
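
A minimal Python sketch of how the opening line could be parsed into attributes. The names Attrs and parse_attrs are hypothetical, and rejecting unknown tokens is a choice of this sketch; the proposal does not say how unknown attributes are handled.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_FLAGS = {"silent", "fail", "skip"}

@dataclass
class Attrs:
    env: Optional[str] = None  # None stands for the default environment
    silent: bool = False
    fail: bool = False
    skip: bool = False

def parse_attrs(opening_line: str) -> Attrs:
    """Parse the attributes on a '<!-- morel ...' opening line."""
    # A one-line comment such as '<!-- morel skip -->' carries its own '-->'.
    tokens = opening_line.replace("-->", " ").split()
    if tokens[:2] != ["<!--", "morel"]:
        raise ValueError("not a morel comment: " + opening_line)
    attrs = Attrs()
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if sep and key == "env" and value:
            attrs.env = value
        elif not sep and token in KNOWN_FLAGS:
            setattr(attrs, token, True)
        else:
            # Assumption of this sketch: unknown attributes are an error.
            raise ValueError("unknown attribute: " + token)
    return attrs

demo = parse_attrs("<!-- morel silent env=demo -->")
```

Because flags and parameters are independent tokens, their order on the line does not matter.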

Environments

Blocks with no env attribute all share a single default environment. This is the common case; most documents need only one environment and never specify env at all.

Each environment maintains independent REPL state (bindings, type context). Blocks execute in document order within their environment. A named environment is created the first time a block references it and persists for the rest of the document. Blocks can interleave environments:

<!-- morel env=a
val x = 1;
> val x = 1 : int
-->
<div class="morel">...</div>

<!-- morel env=b
val x = 99;
> val x = 99 : int
-->
<div class="morel">...</div>

<!-- morel env=a
x + 1;
> val it = 2 : int
-->
<div class="morel">...</div>

The third block executes in environment a and sees x = 1.
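
The environment lookup can be sketched in Python as a map from name to session, created on first use. This is an illustration only: Session is a stand-in that merely records its inputs rather than running Morel, and both class names are hypothetical.

```python
from typing import Dict, List, Optional

class Session:
    """Stand-in for one Morel REPL session; it only records its inputs."""
    def __init__(self) -> None:
        self.history: List[str] = []

    def run(self, code: str) -> None:
        self.history.append(code)

class Environments:
    """Creates a session the first time a name is used, then reuses it."""
    def __init__(self) -> None:
        self._sessions: Dict[Optional[str], Session] = {}

    def get(self, name: Optional[str] = None) -> Session:
        # None is the key for the shared default environment.
        if name not in self._sessions:
            self._sessions[name] = Session()
        return self._sessions[name]

envs = Environments()
# The three interleaved blocks above, in document order.
blocks = [("a", "val x = 1;"), ("b", "val x = 99;"), ("a", "x + 1;")]
for env_name, code in blocks:
    envs.get(env_name).run(code)
```

After the loop, the session for a has seen both of its blocks and the session for b only its own, which is why x + 1 evaluates against x = 1.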

Preambles

Use silent to define helpers that subsequent blocks depend on without showing them to the reader:

<!-- morel silent
fun mustBeList (list: 'a list) = list;
> val mustBeList = fn : 'a list -> 'a list
-->

<!-- morel
from x in mustBeList [1, 2, 3] yield x * x;
> val it = [1,4,9] : int list
-->
<div class="morel">...</div>

The silent block executes in the default environment, so the following block sees mustBeList. Use silent env=NAME for a preamble scoped to a named environment.

Tool behavior

Invocation

./morel --md file.md          # execute, update expected output and HTML in-place
./morel --md-verify file.md   # verify only, report mismatches, do not modify

Multiple files may be specified. Exit code 0 means all blocks validated successfully.

Execution

  1. Scan the markdown file for <!-- morel ... --> comments.
  2. Parse each comment to extract attributes and .smli content (input lines and >-prefixed expected output lines).
  3. Execute blocks in document order, dispatching each to its environment. Skip blocks with the skip attribute. For fail blocks, assert that execution produces an error.
  4. Compare actual output to expected output for each block.
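
Steps 1 and 2 can be sketched in Python as a line-based scan. This is an illustration under two assumptions that hold for every example in this proposal: the opening <!-- morel starts a line, and a multi-line comment closes with --> on a line of its own. The names Block and scan_blocks are hypothetical.

```python
from typing import List, NamedTuple

class Block(NamedTuple):
    line: int          # 1-based line number of the opening '<!-- morel'
    attrs: List[str]   # raw attribute tokens from the opening line
    body: str          # .smli text between the opening line and '-->'

def scan_blocks(markdown: str) -> List[Block]:
    """Find '<!-- morel ... -->' comments in document order."""
    blocks: List[Block] = []
    lines = markdown.splitlines()
    i = 0
    while i < len(lines):
        tokens = lines[i].split()
        if tokens[:2] == ["<!--", "morel"]:
            start = i + 1
            if tokens[-1] == "-->":
                # One-line comment such as '<!-- morel skip -->': no body.
                blocks.append(Block(start, tokens[2:-1], ""))
            else:
                body: List[str] = []
                i += 1
                # Collect lines up to the closing '-->' (or end of file).
                while i < len(lines) and lines[i].strip() != "-->":
                    body.append(lines[i])
                    i += 1
                blocks.append(Block(start, tokens[2:], "\n".join(body)))
        i += 1
    return blocks

md = "\n".join([
    "# Title",
    "",
    "<!-- morel silent env=demo",
    "val x = 1;",
    "> val x = 1 : int",
    "-->",
    "",
    "<!-- morel skip -->",
])
found = scan_blocks(md)
```

Recording the line number of each block gives error reports something to point at.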

Update behavior

In --md mode, if all blocks execute successfully (no unexpected errors) but some blocks have output that differs from expected:

  1. Update the > lines inside the <!-- morel ... --> comment to match actual output.
  2. Regenerate the <div class="morel">...</div> block following the comment.
  3. Write the modified markdown file in-place.

If any block fails unexpectedly (execution error without fail attribute), report the error with the line number and do not modify the file.
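
The all-or-nothing rule can be sketched in Python as follows. This is a hedged illustration: Result, decide, and render_smli are hypothetical names, and the "> " prefix with a single space is inferred from the examples in this proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Result:
    line: int                     # line number of the block's opening comment
    expected: str
    actual: str
    error: Optional[str] = None   # set when execution failed unexpectedly

def decide(results: List[Result]) -> str:
    """Choose what --md mode does with the file."""
    if any(r.error is not None for r in results):
        return "abort"      # report errors and leave the file untouched
    if any(r.expected != r.actual for r in results):
        return "rewrite"    # update '>' lines and regenerate the HTML
    return "unchanged"

def render_smli(pairs: List[Tuple[str, str]]) -> str:
    """Rebuild comment text from (input, actual_output) pairs."""
    lines: List[str] = []
    for code, output in pairs:
        lines.extend(code.splitlines())
        lines.extend("> " + out if out else ">" for out in output.splitlines())
    return "\n".join(lines)

action = decide([Result(3, "val it = 3 : int", "val it = 4 : int")])
updated = render_smli([("len [1, 2, 3, 4];", "val it = 4 : int")])
```

Checking for errors before comparing output means a single unexpected failure prevents any write, even if other blocks merely have stale output.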

Verify behavior

In --md-verify mode, the tool reports mismatches as diffs but does not modify the file. This is suitable for CI.
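
One way to produce such a diff, sketched in Python with the standard difflib module. The function name mismatch_diff and the file:line label format are assumptions of this sketch; the proposal does not fix a diff format.

```python
import difflib

def mismatch_diff(expected: str, actual: str, label: str) -> str:
    """Return a unified diff of expected vs. actual output, or '' if equal."""
    diff = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile=label + " (expected)",
        tofile=label + " (actual)",
        lineterm="",
    )
    return "\n".join(diff)

report = mismatch_diff("val it = 3 : int", "val it = 4 : int", "post.md:12")
print(report)
```

A CI job can treat any non-empty report as a failure.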

HTML generation

The tool replaces the <div class="morel">...</div> block immediately following each non-silent comment. If no such block exists (first run), the tool inserts one. The block to replace extends from the <div class="morel"> opening tag through the corresponding </div> closing tag.

Syntax highlighting in the generated HTML is done with inline tags (<b> for keywords, <i> for type variables, <span> with classes for other categories). This avoids any dependency on external syntax highlighting libraries or custom markdown processors.
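
A minimal Python sketch of rendering and splicing, under stated assumptions: the keyword set is a small illustrative subset rather than Morel's full list, keywords are bolded only in input (as in the examples above), and the names highlight, render_div, and splice are hypothetical. Unlike the examples above, this sketch escapes > as &gt;, which renders the same in a browser.

```python
import html
import re
from typing import List, Tuple

# Assumption: an illustrative keyword subset, not Morel's full list.
KEYWORDS = re.compile(r"\b(fun|val|fn|let|in|end|if|then|else|from|yield)\b")
TYPE_VAR = re.compile(r"(?<!\w)'[A-Za-z]\w*")

def highlight(text: str, keywords: bool) -> str:
    """Escape text, italicize type variables, optionally bold keywords."""
    out = html.escape(text, quote=False)
    out = TYPE_VAR.sub(r"<i>\g<0></i>", out)
    if keywords:
        out = KEYWORDS.sub(r"<b>\1</b>", out)
    return out

def render_div(pairs: List[Tuple[str, str]]) -> str:
    """Render (input, output) pairs as one <div class="morel"> block."""
    parts = ['<div class="morel">']
    for code, output in pairs:
        parts.append('<pre><code class="morel-input">'
                     + highlight(code, True) + "</code></pre>")
        parts.append('<pre><code class="morel-output">'
                     + highlight(output, False) + "</code></pre>")
    parts.append("</div>")
    return "\n".join(parts)

# The generated block never nests a <div>, so the first </div> closes it.
EXISTING = re.compile(r'\n<div class="morel">.*?</div>', re.DOTALL)

def splice(after_comment: str, div: str) -> str:
    """Replace a div directly after the comment's '-->', or insert one."""
    m = EXISTING.match(after_comment)
    rest = after_comment[m.end():] if m else after_comment
    return "\n" + div + rest

div = render_div([("fun len [] = 0\n  | len (_ :: tl) = 1 + len tl;",
                   "val len = fn : 'a list -> int")])
first_run = splice("\n\nNext paragraph.", div)
second_run = splice(first_run, div)
```

Running splice twice yields the same text, so repeated runs of the tool on an unchanged document leave the file stable.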

Styling

Blog authors include a CSS block (or link to a stylesheet) to style the generated HTML. A minimal example:

div.morel { margin: 1em 0; }
div.morel pre { margin: 0; padding: 0.5em 1em; }
.morel-input { background: #f6f8fa; }
.morel-output { background: #f0f0f0; color: #555; border-left: 3px solid #ccc; }

The tool does not emit inline styles, so authors have full control over presentation.

Out of scope

  • Custom data sources: Blocks execute in a plain Morel environment without Calcite or foreign data. A future extension could allow a data=FILE attribute to load a data dictionary.
  • Incremental execution: The tool re-executes all blocks on every run. Caching is a potential future optimization.
  • Non-Morel code blocks: The tool only processes <!-- morel ... --> comments. Other code blocks in the markdown are left untouched.
