

@vezwork (Collaborator) commented Aug 5, 2025

Problem

Example

The content of a .qmd document that caused a capsule leak:

hello
```{{python}}
1+2
```

After opening this document in the visual editor:

hello 31b8e172-b470-440e-83d8-e6b185028602:dAB5AHAAZQA6AE8AUQBCAGoAQQBHAEkAQQBOAHcAQQA1AEEARwBVAEEATgBnAEIAagBBAEMAMABBAE8AQQBBADQAQQBEAGcAQQBaAEEAQQB0AEEARABRAEEAWQBRAEEAdwBBAEcAVQBBAEwAUQBBADUAQQBHAE0AQQBPAFEAQgBpAEEAQwAwAEEAWgBnAEIAbQBBAEQAWQBBAE4AdwBCAGoAQQBHAFEAQQBOAEEAQQB3AEEARABRAEEAWgBRAEEAMgBBAEQAQQBBAAoAcABvAHMAaQB0AGkAbwBuADoATgBnAEEAPQAKAHAAcgBlAGYAaQB4ADoACgBzAG8AdQByAGMAZQA6AFkAQQBCAGcAQQBHAEEAQQBlAHcAQgA3AEEASABBAEEAZQBRAEIAMABBAEcAZwBBAGIAdwBCAHUAQQBIADAAQQBmAFEAQQBLAEEARABFAEEASwB3AEEAeQBBAEEAbwBBAFkAQQBCAGcAQQBHAEEAQQAKAHMAdQBmAGYAaQB4ADoA:31b8e172-b470-440e-83d8-e6b185028602

Instead of the code block, we see an entire capsule! The string after "hello" is an example of what we call a <CAPSULE> below.

Description

Switching to the visual editor involves three relevant steps:

  1. modifying the .qmd string by turning code blocks into capsules
  2. passing the modified string to pandoc to get an AST
  3. converting the AST to a prosemirror document

What are capsules? Capsules are base64-encoded strings, wrapped in a UUID prefix and suffix, that replace parts of a document so that pandoc does not parse their structure (we are smuggling certain parts of the document through the pandoc parse). After pandoc parses the document into an AST, we find the capsules in the AST and decode them back into their original content. The user should never see capsules.
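
As a rough illustration of step 1 and the later decoding, here is a minimal sketch of wrapping fenced code chunks in capsules and recovering them. It is not the editor's actual implementation: the delimiter `CAPSULE_DELIM` and the helpers `toHex`, `fromHex`, `encapsulateCodeChunks`, and `decapsulate` are hypothetical, and hex encoding stands in for base64 to keep the sketch dependency-free. The property that matters is the same: the encoded payload contains no backticks or other markdown structure for pandoc to parse.

```typescript
// Hypothetical delimiter. Real capsules are wrapped in a UUID, as in the example above.
const CAPSULE_DELIM = "0f1e2d3c-capsule";

// Encode each UTF-16 code unit as 4 hex digits (a stand-in for base64).
function toHex(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += ("0000" + s.charCodeAt(i).toString(16)).slice(-4);
  }
  return out;
}

// Inverse of toHex.
function fromHex(hex: string): string {
  let out = "";
  for (let i = 0; i < hex.length; i += 4) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16));
  }
  return out;
}

// Step 1 (sketch): replace each fenced chunk such as ```{{python}} ... ``` with a capsule.
// Sketch only: assumes well-formed, non-nested chunks with a non-empty body.
// The surrounding newlines are left untouched, which is how "hello\n<CAPSULE>" arises.
function encapsulateCodeChunks(src: string): string {
  return src.replace(/^```\{[^\n]*\n[\s\S]*?\n```$/gm, (chunk) =>
    CAPSULE_DELIM + ":" + toHex(chunk) + ":" + CAPSULE_DELIM);
}

// After parsing (sketch): turn a capsule string back into its original source,
// or return null if the string is not a capsule.
function decapsulate(s: string): string | null {
  const prefix = CAPSULE_DELIM + ":";
  const suffix = ":" + CAPSULE_DELIM;
  if (s.length < prefix.length + suffix.length) return null;
  if (s.indexOf(prefix) !== 0 || s.slice(-suffix.length) !== suffix) return null;
  return fromHex(s.slice(prefix.length, s.length - suffix.length));
}

const capsuled = encapsulateCodeChunks("hello\n```{{python}}\n1+2\n```\n");
// ["hello", <capsule>, ""]: the capsule sits on its own line, directly under "hello"
console.log(capsuled.split("\n"));
```

Note that nothing in this step inserts a blank line between "hello" and the capsule; that gap is what the next steps trip over.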

Why does the bug happen? In the example, there is no empty line between the text "hello" and the code block. In step 1, the code block is turned into a capsule, resulting in a string of this form:

hello
<CAPSULE>

In step 2, this string is passed to pandoc. Since there is no empty line between "hello" and the capsule, pandoc parses both as a single paragraph:

[ Para
    [ Str "hello"
    , SoftBreak
    , Str <CAPSULE>
    ]
]
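
To see why the missing blank line matters, here is a toy model of the one pandoc rule involved: blank lines separate paragraphs, while a single newline inside a paragraph becomes a `SoftBreak`. It is not pandoc's parser; `toyParseParagraphs` is a hypothetical helper, and only the `{ t, c }` node shape mirrors pandoc's JSON AST.

```typescript
// Minimal subset of pandoc's JSON AST: each node has a tag "t" and, if needed, contents "c".
type Inline = { t: "Str"; c: string } | { t: "Space" } | { t: "SoftBreak" };
type Block = { t: "Para"; c: Inline[] };

// Toy model (not pandoc): blank lines separate paragraphs; a single newline
// inside a paragraph becomes SoftBreak; spaces between words become Space.
function toyParseParagraphs(src: string): Block[] {
  const blocks: Block[] = [];
  for (const chunk of src.split(/\n[ \t]*\n/)) {
    const lines = chunk.split("\n").filter((line) => line.trim() !== "");
    if (lines.length === 0) continue;
    const inlines: Inline[] = [];
    lines.forEach((line, i) => {
      if (i > 0) inlines.push({ t: "SoftBreak" });
      line.trim().split(/[ \t]+/).forEach((word, j) => {
        if (j > 0) inlines.push({ t: "Space" });
        inlines.push({ t: "Str", c: word });
      });
    });
    blocks.push({ t: "Para", c: inlines });
  }
  return blocks;
}

// "hello\n<CAPSULE>": one Para [Str, SoftBreak, Str], matching the AST above.
console.log(JSON.stringify(toyParseParagraphs("hello\nCAPSULE")));
// "hello\n\n<CAPSULE>": two Paras, so the capsule would sit alone in its own paragraph.
console.log(JSON.stringify(toyParseParagraphs("hello\n\nCAPSULE")));
```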

In step 3, when looking for code block capsules in the AST, we only looked (via blockCapsuleParagraphTokenHandler(kEscapedRmdChunkBlockCapsuleType)) for paragraphs whose sole content was a capsule string. As a result, this capsule was not detected: it was treated as an ordinary string in a paragraph and written directly into the prosemirror document.
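
The old check can be sketched as a predicate over a pandoc-style paragraph. This is a hypothetical simplification of what blockCapsuleParagraphTokenHandler tests, not its source: `looksLikeCapsule`, `isCapsuleOnlyParagraph`, and the `DELIM` marker are illustrative names. Applied to the paragraph above, it returns `false`, which is how the capsule slipped through.

```typescript
type Inline = { t: "Str"; c: string } | { t: "Space" } | { t: "SoftBreak" };
type Para = { t: "Para"; c: Inline[] };

// Hypothetical capsule marker; real capsules are wrapped in a UUID.
const DELIM = "0f1e2d3c-capsule";

function looksLikeCapsule(s: string): boolean {
  const prefix = DELIM + ":";
  const suffix = ":" + DELIM;
  return (
    s.length >= prefix.length + suffix.length &&
    s.indexOf(prefix) === 0 &&
    s.slice(-suffix.length) === suffix
  );
}

// Old rule (sketch): only a paragraph whose sole inline is a capsule Str counts.
function isCapsuleOnlyParagraph(p: Para): boolean {
  if (p.c.length !== 1) return false;
  const only = p.c[0];
  return only.t === "Str" && looksLikeCapsule(only.c);
}

const capsule = DELIM + ":abcd:" + DELIM;
// The paragraph pandoc produced for "hello\n<CAPSULE>":
const leaky: Para = {
  t: "Para",
  c: [{ t: "Str", c: "hello" }, { t: "SoftBreak" }, { t: "Str", c: capsule }],
};
console.log(isCapsuleOnlyParagraph(leaky)); // false: three inlines, so the capsule is missed
```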

Fix

Code block capsules can end up as a Str inside a paragraph with other content, so we now also look for that case with this new code:

handleToken:
  blockCapsuleHandlerOr(
    blockCapsuleParagraphTokenHandler(kEscapedRmdChunkBlockCapsuleType),
    blockCapsuleStrTokenHandler(kEscapedRmdChunkBlockCapsuleType)
  ),

This detects a block capsule either in a paragraph whose only content is a Str (as before) or in any Str that looks like a code block capsule. The detection logic is somewhat redundant, but I think it makes the change more "surgical": the old detection method is still used whenever it applies, so we change as little behaviour as possible.
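
One way to picture the combined handler is as an "or" over handlers that each return a match or nothing, tried in order. The sketch below is hypothetical: `handlerOr`, `paragraphCapsuleHandler`, and `strCapsuleHandler` only mimic the roles of blockCapsuleHandlerOr, blockCapsuleParagraphTokenHandler, and blockCapsuleStrTokenHandler. The real signatures, and how the editor walks the AST, are not shown in this PR description.

```typescript
type Inline = { t: "Str"; c: string } | { t: "Space" } | { t: "SoftBreak" };
type AstToken = { t: "Para"; c: Inline[] } | Inline;

// A handler returns the capsule text it recognised, or null if it does not apply.
type CapsuleHandler = (tok: AstToken) => string | null;

const DELIM = "0f1e2d3c-capsule"; // hypothetical marker

function isCapsule(s: string): boolean {
  return s.indexOf(DELIM + ":") === 0 && s.slice(-(DELIM.length + 1)) === ":" + DELIM;
}

// Old path: a paragraph whose only inline is a capsule Str.
const paragraphCapsuleHandler: CapsuleHandler = (tok) => {
  if (tok.t !== "Para" || tok.c.length !== 1) return null;
  const only = tok.c[0];
  return only.t === "Str" && isCapsule(only.c) ? only.c : null;
};

// New path: any Str that looks like a capsule, wherever it appears.
const strCapsuleHandler: CapsuleHandler = (tok) =>
  tok.t === "Str" && isCapsule(tok.c) ? tok.c : null;

// Try each handler in order and return the first match.
function handlerOr(...handlers: CapsuleHandler[]): CapsuleHandler {
  return (tok) => {
    for (const h of handlers) {
      const found = h(tok);
      if (found !== null) return found;
    }
    return null;
  };
}

const handleToken = handlerOr(paragraphCapsuleHandler, strCapsuleHandler);
```

Because the paragraph handler is listed first, a capsule-only paragraph is still handled by the old path; the Str handler only fires for capsules that pandoc merged into a larger paragraph, such as the `Str <CAPSULE>` in the AST above.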

Because of this new detection, we also had to change how code blocks are written into the prosemirror document. Since code block capsules can now appear inside paragraphs with other content, the prosemirror writer may be in the middle of writing a paragraph when a code block capsule is detected. When that is the case (writer.isNodeOpen(schema.nodes.paragraph)), we now close the paragraph, write the code block, and then open a new paragraph.
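
A minimal sketch of that writer logic, using a hypothetical toy writer rather than the real prosemirror writer: `ToyWriter`, `writeCodeBlockCapsule`, and their methods are illustrative names; only the idea of checking whether a paragraph is open before writing the code block comes from the fix. As an additional assumption, this toy writer drops a paragraph that is closed while still empty, so the paragraph reopened after a trailing code block leaves no empty node behind.

```typescript
type ToyNodeType = "paragraph" | "code_block";
interface ToyNode { type: ToyNodeType; text: string }

class ToyWriter {
  readonly nodes: ToyNode[] = [];
  private open: ToyNode | null = null;

  isNodeOpen(type: ToyNodeType): boolean {
    return this.open !== null && this.open.type === type;
  }
  openNode(type: ToyNodeType): void {
    this.open = { type: type, text: "" };
  }
  closeNode(): void {
    const node = this.open;
    // Assumption: an empty paragraph is discarded instead of emitted.
    if (node !== null && !(node.type === "paragraph" && node.text === "")) {
      this.nodes.push(node);
    }
    this.open = null;
  }
  writeText(text: string): void {
    if (this.open !== null) this.open.text += text;
  }
}

// Called when a code block capsule has been detected and decoded.
function writeCodeBlockCapsule(writer: ToyWriter, source: string): void {
  const inParagraph = writer.isNodeOpen("paragraph");
  if (inParagraph) writer.closeNode(); // end the text that preceded the code block
  writer.openNode("code_block");
  writer.writeText(source);
  writer.closeNode();
  if (inParagraph) writer.openNode("paragraph"); // resume the surrounding paragraph
}
```

Feeding it the example ("hello", then the capsule, then the end of the paragraph) yields a `paragraph` node containing "hello" followed by a `code_block` node, instead of one paragraph with the capsule text inside it.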

Method

  • @cscheid and I attempted another fix where we added newlines before and after code block capsules. Unfortunately, for code blocks nested inside code blocks, this introduced additional newlines into the document on every roundtrip.

  • I did a number of refactors and clean-ups, and added logging in this area, to understand what happens when a document is converted for the visual editor. The refactors and clean-ups are captured in PR #779, "refactor prosemirror conversion for understandability". Here is a picture of some logs showing the first actions the prosemirror writer performs during conversion (the prosemirror writer is invoked as the final step of the conversion process):

[Image: log of prosemirror writer actions during conversion]

@vezwork force-pushed the fix/no-newline-capsule-leak branch from 70af560 to 4b2a172 on August 6, 2025 14:11
@vezwork marked this pull request as ready for review August 6, 2025 14:12
@vezwork requested a review from cscheid August 6, 2025 14:12
@vezwork changed the title from "fix code block capsule leak when no empty line between text and code block" to "fix "capsule leak" when no empty line between text and code block" Aug 6, 2025
@vezwork force-pushed the fix/no-newline-capsule-leak branch from 4b2a172 to 80503f8 on August 12, 2025 14:18
@vezwork merged commit 9c8dee2 into main Aug 12, 2025
2 checks passed