fix "capsule leak" when no empty line between text and code block #780
Problem
Example
The content of a .qmd document that caused a capsule leak:
After opening this document in the visual editor:
We see an entire capsule instead of the code block! Note that this shows an example of a <CAPSULE>.
Description
In the process of switching to the visual editor there are three relevant steps:
What are capsules? Capsules are base64 encoded strings with a uuid prefix that replace parts of a document so that they are not parsed based on their structure by pandoc (we are smuggling certain parts of the document through the pandoc parse). After pandoc parses the document into an AST we find the capsules in the AST and decode them into their original content. The user should never see capsules.
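To make the capsule idea concrete, here is a minimal sketch of encoding and decoding, assuming a capsule is simply a base64 payload wrapped in a fixed uuid marker. The uuid value, the token layout, and the names encodeCapsule and decodeCapsules are hypothetical and are not taken from the editor's source.

```typescript
// Hypothetical sketch: the uuid, token layout, and function names below are
// illustrative assumptions, not the editor's actual implementation.
const kCapsuleUuid = "4f2c9a1e-7b3d-4c58-9e10-2a6d5b8c1f47";

// UTF-8 safe base64 helpers built on the standard btoa/atob globals.
function toBase64(text: string): string {
  let binary = "";
  new TextEncoder().encode(text).forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

function fromBase64(payload: string): string {
  const bytes = Uint8Array.from(atob(payload), (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Wrap a source fragment in an opaque, whitespace-free token.
function encodeCapsule(source: string): string {
  return `${kCapsuleUuid}${toBase64(source)}${kCapsuleUuid}`;
}

// Restore every capsule found in a string to its original source text.
function decodeCapsules(text: string): string {
  const pattern = new RegExp(
    `${kCapsuleUuid}([A-Za-z0-9+/=]+)${kCapsuleUuid}`,
    "g"
  );
  return text.replace(pattern, (_match, payload: string) => fromBase64(payload));
}
```

Because the encoded token contains no whitespace, a parser sees it as a single word, which is why it can land inside a paragraph next to other text when no empty line separates them.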
Why does the bug happen? In the example, there is not an empty line between the text "hello" and the code block. In step 1, the code block is turned into a capsule, resulting in a string of this form:
Now in step 2, this string is passed to pandoc, which, since there is no empty line between "hello" and the capsule, parses it as a single paragraph like
Now in step 3, unfortunately, when looking for code block capsules in the AST, we were looking (blockCapsuleParagraphTokenHandler(kEscapedRmdChunkBlockCapsuleType)) only for paragraphs with a single content that held a capsule string. As a result this capsule was not detected and was treated as a normal string in a paragraph, being written directly into the prosemirror document.
Fix
We now understand that code block capsules can end up as a Str in paragraphs with multiple content, so we now look for those cases with this new code:
This code detects a block capsule in the case of paragraphs with a single Str content (as was previously checked), or in the case of any Str that looks like a code block capsule. Note that this detection logic is a bit redundant, but I think it may be a slightly more "surgical" change because the old detection method is still used when it can be (we change as little behaviour as we can).
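As a rough illustration of the difference between the old and new checks, the sketch below uses simplified stand-ins for pandoc's Para, Str, and SoftBreak tokens. The type shapes, the CAPSULE: prefix, and the function names are assumptions made for the example; the actual handler in the editor is not reproduced here.

```typescript
// Hypothetical, simplified token shapes; the real pandoc AST and the
// editor's token types carry more information than this.
type Inline =
  | { t: "Str"; c: string }
  | { t: "Space" }
  | { t: "SoftBreak" };
type Para = { t: "Para"; c: Inline[] };

// Stand-in pattern for "looks like a code block capsule" (assumed format).
const kCapsulePattern = /^CAPSULE:[A-Za-z0-9+\/=]+$/;

function isCapsuleStr(tok: Inline): boolean {
  return tok.t === "Str" && kCapsulePattern.test(tok.c);
}

// Old behaviour: only a paragraph whose sole content is a capsule Str matches.
function oldDetect(para: Para): boolean {
  return para.c.length === 1 && isCapsuleStr(para.c[0]);
}

// New behaviour: keep the old check, and also accept any capsule-like Str
// that sits in a paragraph alongside other content.
function newDetect(para: Para): boolean {
  return oldDetect(para) || para.c.some(isCapsuleStr);
}

// "hello" directly followed by a capsule, with no empty line in between,
// is assumed here to arrive as one paragraph with three inlines.
const leaked: Para = {
  t: "Para",
  c: [
    { t: "Str", c: "hello" },
    { t: "SoftBreak" },
    { t: "Str", c: "CAPSULE:YGBge3J9" },
  ],
};
```

With these definitions, oldDetect(leaked) is false while newDetect(leaked) is true, which mirrors the case the fix targets. The first clause of newDetect is logically implied by the second; it is kept in the sketch only to echo the redundancy mentioned above.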
As a result of this new detection, we had to modify how code blocks are written into the prosemirror document. When a code block capsule is detected, the prosemirror writer may now be in a state where it is writing into a paragraph block, because code block capsules can now be inside paragraphs with other content. When a code block capsule is in a paragraph (writer.isNodeOpen(schema.nodes.paragraph)), we now close the paragraph, write the code block, then open a new paragraph.
Method
@cscheid and I attempted another fix where we prepended and appended newlines before code block capsules. Unfortunately, in the case of code blocks in code blocks, this was seen to introduce additional newlines into the document on every roundtrip.
I did a bunch of refactors/clean-ups/logs in this area to get a handle on what happens when a document is converted for the visual editor. The refactors and clean-ups are captured in the PR "refactor prosemirror conversion for understandability" (#779). Here is a picture of some logs showing the start of the actions that the prosemirror writer performs during conversion (the prosemirror writer is invoked as the final step in the conversion process):