fix "capsule leak" when no empty line between text and code block #780

vezwork · 2025-08-05T19:12:29Z

Problem

Example

The content of a .qmd document that caused a capsule leak:

hello
```{{python}}
1+2
```

After opening this document in the visual editor:

hello 31b8e172-b470-440e-83d8-e6b185028602:dAB5AHAAZQA6AE8AUQBCAGoAQQBHAEkAQQBOAHcAQQA1AEEARwBVAEEATgBnAEIAagBBAEMAMABBAE8AQQBBADQAQQBEAGcAQQBaAEEAQQB0AEEARABRAEEAWQBRAEEAdwBBAEcAVQBBAEwAUQBBADUAQQBHAE0AQQBPAFEAQgBpAEEAQwAwAEEAWgBnAEIAbQBBAEQAWQBBAE4AdwBCAGoAQQBHAFEAQQBOAEEAQQB3AEEARABRAEEAWgBRAEEAMgBBAEQAQQBBAAoAcABvAHMAaQB0AGkAbwBuADoATgBnAEEAPQAKAHAAcgBlAGYAaQB4ADoACgBzAG8AdQByAGMAZQA6AFkAQQBCAGcAQQBHAEEAQQBlAHcAQgA3AEEASABBAEEAZQBRAEIAMABBAEcAZwBBAGIAdwBCAHUAQQBIADAAQQBmAFEAQQBLAEEARABFAEEASwB3AEEAeQBBAEEAbwBBAFkAQQBCAGcAQQBHAEEAQQAKAHMAdQBmAGYAaQB4ADoA:31b8e172-b470-440e-83d8-e6b185028602

The visual editor shows an entire raw capsule instead of the code block! The long string above is an example of what the rest of this description writes as <CAPSULE>.

Description

In the process of switching to the visual editor there are three relevant steps:

1. modifying the .qmd string by turning code blocks into capsules
2. passing the modified string to pandoc to get an AST
3. converting the AST to a prosemirror document

What are capsules? Capsules are base64 encoded strings with a uuid prefix that replace parts of a document so that they are not parsed based on their structure by pandoc (we are smuggling certain parts of the document through the pandoc parse). After pandoc parses the document into an AST we find the capsules in the AST and decode them into their original content. The user should never see capsules.
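To make the idea concrete, here is a minimal sketch of a capsule roundtrip. It is not the editor's actual encoding: the helper names `encodeCapsule` and `decodeCapsule` are hypothetical, the uuid is simply the one from the example above, and the real payload carries more fields than the raw source.

```typescript
// Hypothetical helpers for illustration; not the visual editor's implementation.
const CAPSULE_ID = "31b8e172-b470-440e-83d8-e6b185028602"; // uuid from the example above

function encodeCapsule(source: string): string {
  // btoa only accepts Latin-1 input, which is enough for this ASCII example.
  return `${CAPSULE_ID}:${btoa(source)}:${CAPSULE_ID}`;
}

function decodeCapsule(text: string): string | null {
  const pattern = new RegExp(`^${CAPSULE_ID}:([A-Za-z0-9+/=]*):${CAPSULE_ID}$`);
  const match = text.match(pattern);
  // Anything that is not a well-formed capsule is left alone (null).
  return match ? atob(match[1] ?? "") : null;
}

const fence = "`".repeat(3);
const chunk = `${fence}{{python}}\n1+2\n${fence}`;
const capsule = encodeCapsule(chunk);

// The capsule contains no spaces or markdown syntax, so pandoc sees one opaque word.
console.log(capsule.includes(" "));
console.log(decodeCapsule(capsule) === chunk);
```

Because the capsule is a single opaque word, pandoc treats it like any other word, which is exactly what makes the leak described below possible.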

Why does the bug happen? In the example, there is not an empty line between the text "hello" and the code block. In step 1, the code block is turned into a capsule, resulting in a string of this form:

hello
<CAPSULE>

Now in step 2, this string is passed to pandoc. Since there is no empty line between "hello" and the capsule, pandoc parses both lines as a single paragraph:

[ Para
    [ Str "hello"
    , SoftBreak
    , Str <CAPSULE>
    ]
]

Now in step 3, when looking for code block capsules in the AST, we only looked (via blockCapsuleParagraphTokenHandler(kEscapedRmdChunkBlockCapsuleType)) for paragraphs whose single child was a capsule string. As a result, this capsule was not detected. It was treated as a normal string in a paragraph and written directly into the prosemirror document.
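The miss can be reproduced on a toy AST. This is only a sketch: the `Inline` and `Para` shapes, the `CAPSULE:` marker, and `isLoneCapsuleParagraph` are hypothetical stand-ins for the pandoc tokens and the real handler.

```typescript
// Toy stand-ins for pandoc inlines and paragraphs (hypothetical shapes).
type Inline = { t: "Str"; text: string } | { t: "SoftBreak" };
type Para = { t: "Para"; content: Inline[] };

// Hypothetical marker; real capsules are uuid-delimited base64 strings.
const looksLikeCapsule = (text: string): boolean => text.startsWith("CAPSULE:");

// The old rule: only a paragraph whose sole child is a capsule Str counts.
function isLoneCapsuleParagraph(para: Para): boolean {
  const only = para.content[0];
  return (
    para.content.length === 1 &&
    only !== undefined &&
    only.t === "Str" &&
    looksLikeCapsule(only.text)
  );
}

// Code block separated from the text by an empty line: its own paragraph.
const separated: Para = { t: "Para", content: [{ t: "Str", text: "CAPSULE:abc" }] };

// Code block directly after "hello": the capsule shares a paragraph with the text.
const adjacent: Para = {
  t: "Para",
  content: [{ t: "Str", text: "hello" }, { t: "SoftBreak" }, { t: "Str", text: "CAPSULE:abc" }],
};

console.log(isLoneCapsuleParagraph(separated)); // true
console.log(isLoneCapsuleParagraph(adjacent)); // false: the capsule leaks through
```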

Fix

We now understand that a code block capsule can end up as a Str inside a paragraph that has other content, so we also look for that case with this new code:

handleToken:
  blockCapsuleHandlerOr(
    blockCapsuleParagraphTokenHandler(kEscapedRmdChunkBlockCapsuleType),
    blockCapsuleStrTokenHandler(kEscapedRmdChunkBlockCapsuleType)
  ),

This detects a block capsule either in a paragraph whose single child is a Str (as was previously checked), or in any Str that looks like a code block capsule. The detection logic is a bit redundant, but I think it is a slightly more "surgical" change: the old detection method is still used whenever it applies, so we change as little behaviour as we can.
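The "or" combination can be sketched as follows. The names `paragraphMatcher`, `strMatcher`, and `matcherOr` are hypothetical and only mirror the roles of the handlers named above; the token shapes and the `CAPSULE:` marker are simplified assumptions.

```typescript
// Simplified token shapes (assumptions, not the editor's real types).
type Inline = { t: "Str"; text: string } | { t: "SoftBreak" };
type Token = Inline | { t: "Para"; content: Inline[] };

// A matcher returns the capsule text when it recognises the token, else null.
type CapsuleMatcher = (tok: Token) => string | null;

const looksLikeCapsule = (text: string): boolean => text.startsWith("CAPSULE:");

// Old check: a paragraph whose single child is a capsule Str.
const paragraphMatcher: CapsuleMatcher = (tok) => {
  if (tok.t !== "Para" || tok.content.length !== 1) return null;
  const only = tok.content[0];
  return only !== undefined && only.t === "Str" && looksLikeCapsule(only.text)
    ? only.text
    : null;
};

// New check: any Str that looks like a capsule, wherever it sits.
const strMatcher: CapsuleMatcher = (tok) =>
  tok.t === "Str" && looksLikeCapsule(tok.text) ? tok.text : null;

// Try each matcher in order and keep the first hit.
const matcherOr =
  (...matchers: CapsuleMatcher[]): CapsuleMatcher =>
  (tok) => {
    for (const matcher of matchers) {
      const hit = matcher(tok);
      if (hit !== null) return hit;
    }
    return null;
  };

const detect = matcherOr(paragraphMatcher, strMatcher);

console.log(detect({ t: "Para", content: [{ t: "Str", text: "CAPSULE:abc" }] }));
console.log(detect({ t: "Str", text: "CAPSULE:abc" }));
console.log(detect({ t: "Str", text: "hello" }));
```

Keeping the paragraph matcher first means documents that already worked take the same path as before; the Str matcher only fires for the previously missed case.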

As a result of this new detection, we had to modify how code blocks are written into the prosemirror document. Because code block capsules can now sit inside paragraphs with other content, the prosemirror writer may be in the middle of writing a paragraph block when a code block capsule is detected. When that is the case (writer.isNodeOpen(schema.nodes.paragraph)), we now close the paragraph, write the code block, then open a new paragraph.
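A toy writer illustrates the close, write, reopen sequence. `ToyWriter` and its methods are hypothetical; the real prosemirror writer has a different API (only `writer.isNodeOpen(schema.nodes.paragraph)` is taken from this description).

```typescript
// Hypothetical, much-simplified document writer for illustration only.
type ParagraphBlock = { type: "paragraph"; text: string[] };
type Block = ParagraphBlock | { type: "code_block"; source: string };

class ToyWriter {
  readonly blocks: Block[] = [];
  private current: ParagraphBlock | null = null;

  isParagraphOpen(): boolean {
    return this.current !== null;
  }

  openParagraph(): void {
    this.current = { type: "paragraph", text: [] };
  }

  closeParagraph(): void {
    if (this.current !== null) {
      this.blocks.push(this.current);
      this.current = null;
    }
  }

  writeText(text: string): void {
    this.current?.text.push(text);
  }

  writeCodeBlock(source: string): void {
    // A code block cannot live inside a paragraph, so split the paragraph around it.
    const wasInParagraph = this.isParagraphOpen();
    if (wasInParagraph) this.closeParagraph();
    this.blocks.push({ type: "code_block", source });
    if (wasInParagraph) this.openParagraph();
  }
}

const writer = new ToyWriter();
writer.openParagraph();
writer.writeText("hello");
writer.writeCodeBlock("1+2"); // capsule found in the middle of the paragraph
writer.closeParagraph();

console.log(writer.blocks.map((block) => block.type));
```

In this toy version the reopened paragraph ends up empty when nothing follows the code block; how the real writer treats that trailing paragraph is not covered by this sketch.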

Method

@cscheid and I first attempted another fix, where we inserted newlines before and after code block capsules. Unfortunately, in the case of code blocks in code blocks, this introduced additional newlines into the document on every roundtrip.
I did a bunch of refactors, clean-ups, and logging in this area to get a handle on what happens when a document is converted for the visual editor. The refactors and clean-ups are captured in the PR "refactor prosemirror conversion for understandability" #779. Here is a picture of some logs that shows the start of the actions that the prosemirror writer performs during conversion (the prosemirror writer is invoked as the final step in the conversion process):

vezwork force-pushed the fix/no-newline-capsule-leak branch from 70af560 to 4b2a172 Compare August 6, 2025 14:11

vezwork marked this pull request as ready for review August 6, 2025 14:12

vezwork requested a review from cscheid August 6, 2025 14:12

vezwork changed the title ~~fix code block capsule leak when no empty line between text and code block~~ fix "capsule leak" when no empty line between text and code block Aug 6, 2025

fix code block capsule leak by inspecting str tokens

80503f8

vezwork force-pushed the fix/no-newline-capsule-leak branch from 4b2a172 to 80503f8 Compare August 12, 2025 14:18

add capsule-leak snapshot test

bed8a72

vezwork merged commit 9c8dee2 into main Aug 12, 2025
2 checks passed

vezwork mentioned this pull request Aug 15, 2025

Documentation Code Blocks with text on the line after turn into Base64 in the visual editor #718

Closed
