Migrate markdown-gfm to remark and rehype #18645

Merged: pomek merged 29 commits into master from internal/4006-migrate-markdown-gfm-plugin-to-unifiedjs-ecosystem on Jun 26, 2025

filipsobol (Member) commented Jun 6, 2025

Suggested merge commit message (convention)

Other (markdown-gfm): Migrate to remark / rehype packages. Closes #18684.

MINOR BREAKING CHANGE (markdown-gfm): Migrate from marked and turndown to remark and rehype.

MINOR BREAKING CHANGE (markdown-gfm): Enable autolinking in Markdown (works only when loading Markdown content into the editor).


Things to prioritize when testing this PR

The following areas should be prioritized when testing this PR, as they previously had (or have now) custom handling:

  • Lists – both those loaded into the editor and those created within the editor (especially to-do lists).
  • Links – regular links and plain text links like www.example.com that are converted into links when loaded into the editor.
  • Code blocks
  • HTML embedded within the Markdown
  • The keepHtml function (which appears to lack documentation aside from the API reference)
  • Loading Markdown generated by the older version of the integration into the new one

A few thoughts on the migration

The remark and rehype combo works more predictably – likely because they come from the same ecosystem – compared to marked and turndown, which are developed independently. This is also reflected in the tests, which needed to be updated to pass.

One benefit of the older setup was its smaller size. Migrating from marked and turndown to remark and rehype increases the build size by 40 KiB gzipped (140.51 KiB uncompressed).

The unifiedjs ecosystem (which includes remark and rehype) is also much more modular, while turndown and marked don't have any external dependencies. This increases the total number of dependencies from 10 to 97. The number of direct dependencies in markdown's package.json increased from 5 to 14. However, the following packages are foundational and are already pulled in as dependencies of the larger ones:

  • hast
  • hast-util-from-dom
  • hast-util-to-html
  • hast-util-to-mdast
  • unist-util-visit

Challenges

There were two major challenges when implementing this change.

Autolinking

The first was autolinking. The original implementation had autolinking (automatically turning text links like http://example.com into <http://example.com>) disabled. While this wasn't compliant with the GFM spec, there may have been reasons for it. However, we didn’t treat text links like regular text either, because we explicitly disabled escaping of them. This meant they were neither a link nor plain text.

As a result, Markdown returned by the editor could include unescaped text links, which would later be turned into links by other Markdown parsers. We decided to enable autolinking so text links are turned into regular links when loaded into the editor. Text links typed directly into the editor will not be turned into links, but will instead be escaped like regular text, providing a better WYSIWYG experience.

// Markdown => HTML
[www.example.com](www.example.com)              => <a href="www.example.com">www.example.com</a>
www.example.com                                 => <a href="www.example.com">www.example.com</a>

// HTML => Markdown
<a href="www.example.com">www.example.com</a>   => [www.example.com](www.example.com)
www.example.com                                 => www\.example.com
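
To make the last HTML => Markdown line more concrete, here is a minimal sketch of what escaping a plain-text link could look like. The `escapeTextLink` helper is hypothetical and is not the code used in this PR or in remark; the regular expressions are a simplified assumption about which characters need a backslash so that other Markdown parsers do not autolink the text.

```typescript
// Hypothetical helper (an assumption for illustration, not the PR's implementation):
// escapes the characters that would let a Markdown parser autolink plain text.
function escapeTextLink( text: string ): string {
	return text
		// "www." followed by a domain character: escape the dot after "www".
		.replace( /\b(www)\.(?=[\w-])/gi, '$1\\.' )
		// "http://" or "https://": escape the colon after the scheme.
		.replace( /\b(https?):(?=\/\/)/gi, '$1\\:' );
}

// Matches the example above: only the dot after "www" is escaped.
console.log( escapeTextLink( 'www.example.com' ) ); // www\.example.com
```

Text that does not look like a link passes through this sketch unchanged, which is the behavior one would want for ordinary prose typed into the editor.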

Bundle size

Another issue we anticipated was bundle size. Typically, Markdown and HTML parsing happens on the server, so most packages are built with that environment in mind. However, unlike the browser, server environments don't have built-in DOM parsers and instead use packages like parse5. While parse5 is excellent, it's completely unnecessary in a browser environment, which already has a built-in DOM parser.

The rehype-dom-parse and rehype-dom-stringify packages are DOM-based alternatives to rehype-parse and rehype-stringify, and they resolved most of the related issues. One issue they didn’t solve was parsing HTML inside Markdown. Unfortunately, there doesn’t appear to be a first-party solution for this, which necessitates writing a small custom plugin.

You can find more information here: https://github.com/orgs/rehypejs/discussions/202. While this discussion hadn’t received an answer at the time of writing, it provides useful context for the issue, so I’m leaving the link for “future us”.
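
For readers unfamiliar with the problem, the idea behind such a custom plugin can be sketched roughly as follows. This is not the plugin from this PR: the `TreeNode` shape, the `expandRawHtml` function, and the injected `HtmlFragmentParser` are all hypothetical names, and the sketch assumes that embedded HTML arrives in the tree as nodes of type `raw` carrying the HTML string in `value`. In a browser, the injected parser could be backed by the built-in DOM parser instead of parse5.

```typescript
// Hypothetical, simplified node shape loosely modelled on unified syntax trees.
interface TreeNode {
	type: string;
	value?: string;
	children?: Array<TreeNode>;
}

// The HTML parser is injected, so a DOM-based one can be used in the browser.
type HtmlFragmentParser = ( html: string ) => Array<TreeNode>;

// Returns a copy of the tree in which every `raw` node is replaced
// by the nodes produced by the injected parser. The input is not mutated.
function expandRawHtml( node: TreeNode, parse: HtmlFragmentParser ): Array<TreeNode> {
	if ( node.type === 'raw' && typeof node.value === 'string' ) {
		return parse( node.value );
	}

	if ( !node.children ) {
		return [ node ];
	}

	const children: Array<TreeNode> = [];

	for ( const child of node.children ) {
		for ( const expanded of expandRawHtml( child, parse ) ) {
			children.push( expanded );
		}
	}

	return [ { type: node.type, value: node.value, children } ];
}

// Usage with a stub parser that stands in for a real DOM-based one.
const stubParser: HtmlFragmentParser = html => [ { type: 'element', value: html } ];

const tree: TreeNode = {
	type: 'root',
	children: [
		{ type: 'text', value: 'Hello ' },
		{ type: 'raw', value: '<b>world</b>' }
	]
};

const expandedTree = expandRawHtml( tree, stubParser )[ 0 ];
```

The sketch handles each `raw` node in isolation, so HTML whose opening and closing tags end up in separate nodes would need extra handling that is not shown here.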

filipsobol marked this pull request as ready for review June 12, 2025 13:10
filipsobol (Member, Author) commented Jun 12, 2025

The PR is still WIP; I just need CI to run the tests.

EDIT: PR is ready for review.

filipsobol changed the title from "[PoC] Migrate markdown-gfm to remark / rehype packages." to "[PoC] Migrate markdown-gfm to remark and rehype" Jun 16, 2025
filipsobol changed the title from "[PoC] Migrate markdown-gfm to remark and rehype" to "Migrate markdown-gfm to remark and rehype" Jun 16, 2025

  // Check if view has correct data.
- expect( html ).to.equal( viewString );
+ expect( JSON.stringify( html ) ).to.equal( JSON.stringify( viewString ) );
Member Author

This improves the readability when the output is different from the expected value.
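
A small sketch of why this helps, under the assumption that the mismatches in question involve whitespace or other control characters. The sample strings and the `visible` helper are made up for illustration and do not come from the test suite.

```typescript
// Hypothetical sample data: the actual output contains newlines and a tab
// that are hard to notice when the raw strings are printed in a test report.
const expectedHtml = '<ul><li>foo</li></ul>';
const actualHtml = '<ul>\n\t<li>foo</li>\n</ul>';

// JSON.stringify() renders control characters as escape sequences,
// so the difference shows up on a single line.
function visible( value: string ): string {
	return JSON.stringify( value );
}

console.log( visible( expectedHtml ) ); // "<ul><li>foo</li></ul>"
console.log( visible( actualHtml ) ); // "<ul>\n\t<li>foo</li>\n</ul>"
```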


// When converting back it will be normalized and spaces
// at the beginning of inline code will be removed.
'regular text and `inline code`'
Member Author

I don't see a reason why this should be “normalized” like this. It seems invalid.

pszczesniak (Contributor) left a comment

I assume that LICENSE.md file must be updated also.

filipsobol (Member, Author) commented

> I assume that LICENSE.md file must be updated also.

Done ✔️

filipsobol requested a review from pszczesniak June 17, 2025 12:07
arkflpc previously approved these changes Jun 17, 2025
pomek requested a review from psmyrek June 17, 2025 16:39
filipsobol (Member, Author) commented Jun 24, 2025

> New lines in tables are missing from the output

Good catch. This didn't work in the old implementation either, so I don't think it's a blocker. We will discuss whether to fix this now or as a follow-up.

filipsobol (Member, Author) commented

PR is ready for technical re-review.

charlttsie (Contributor) commented

@filipsobol We've finished testing the changes with @juliaflejterska and apart from reported issues the changes look good to us 👌

psmyrek previously approved these changes Jun 25, 2025
pszczesniak previously approved these changes Jun 25, 2025
pomek previously approved these changes Jun 26, 2025
filipsobol dismissed stale reviews from pomek, pszczesniak, and psmyrek via d3de5e2 June 26, 2025 07:45
psmyrek previously approved these changes Jun 26, 2025
pomek merged commit e4bed9d into master Jun 26, 2025
11 checks passed
pomek deleted the internal/4006-migrate-markdown-gfm-plugin-to-unifiedjs-ecosystem branch June 26, 2025 08:48

Development

Successfully merging this pull request may close these issues.

Migrate markdown-gfm package to remark and rehype

6 participants