
Normalize legacy content syntax at the source #3763

Open
dannon wants to merge 57 commits into galaxyproject:main from dannon:content-migration

Member

@dannon dannon commented Feb 26, 2026

Summary

The Astro preprocessor currently applies ~10 syntax transforms to every content file on every build — converting Gridsome tags, kramdown attributes, Font Awesome icons, etc. This means the build is doing work that should have been done once, and contributors can't see the "real" content in the source files.

This PR commits those transforms to the content files themselves and removes the corresponding build-time code. The preprocessor shrinks by ~220 lines, and a new CI lint (npm run content:lint) prevents legacy syntax from being reintroduced.

What changed in content (~780 files)

All changes are mechanical, applied via normalize-content.mjs:

  • Gridsome/Vue removal: <g-link>/<g-image> → standard HTML, stripped Vue imports and :binding syntax
  • Kramdown → HTML: link attributes {: .class} converted to class="...", block-level annotations ({:.table.table-striped}) removed (no standard markdown equivalent — see discussion in comments)
  • MDX compatibility: void elements self-closed (<br> → <br />), bare numeric attributes quoted, autolinks converted, component names PascalCased
  • Font Awesome → Lucide: all FA <i> tags, icon links, and kramdown FA patterns converted to <Icon name="..." /> MDX components (~40 files). New Icon.astro component uses lucide-static — renders at build time, no client JS
  • Frontmatter cleanup: removed layout: fields, normalized scalar tags/subsites/authors to arrays, replaced deprecated authors with contributions in example files
  • MDX opt-in simplified: added components: true to ~190 files that use MDX components, removed it from ~15 files that had the flag but no components

What changed in the build pipeline

  • Removed from preprocessor: hasProblematicHtml(), convertHtmlToJsx(), needsVueProcessing() auto-detection, SAFE_COMPONENTS list, convertFontAwesomeToLucide() and its FA→Lucide mapping table. MDX decision is now just frontmatter.components === true.
  • Added: normalize-content.mjs with --check mode for CI, content:lint npm script, CI workflow job, and make content-lint target
  • Net: ~220 fewer lines in the preprocessing pipeline
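
A minimal sketch of how a `--check`-style dry run could work, assuming each transform is a pure string-to-string function. The names `checkFiles` and `selfCloseBr` and the `{ path, content }` shape are illustrative assumptions, not taken from normalize-content.mjs.

```javascript
// Hypothetical dry-run check: report the files a transform would change,
// without writing anything back to disk.
function checkFiles(files, transform) {
  const offenders = files
    .filter((file) => transform(file.content) !== file.content)
    .map((file) => file.path);
  // A non-zero exit code is what lets a CI job fail on legacy syntax.
  return { offenders, exitCode: offenders.length > 0 ? 1 : 0 };
}

// Example transform (assumption): self-close bare <br> tags.
const selfCloseBr = (text) => text.replace(/<br>/g, "<br />");

const checkResult = checkFiles(
  [
    { path: "content/a.md", content: "line one<br>line two" },
    { path: "content/b.md", content: "already clean<br />" },
  ],
  selfCloseBr
);
console.log(checkResult.offenders, checkResult.exitCode);
```

Because the check only compares transformed output with the current content, the same transform functions could serve both the one-shot migration and the lint.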

MDX is now explicit opt-in

Previously the preprocessor scanned content for known component tags and checked for "problematic HTML" that would break MDX parsing. Files with components but problematic HTML got silently downgraded to .md with no warning. Now authors add components: true to frontmatter, and content:lint catches mismatches. No auto-detection, no build-time content rewriting.
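
As a rough illustration of the opt-in rule described above, the routing decision reduces to one strict comparison. `outputExtension` is a hypothetical helper name; only the `frontmatter.components === true` test comes from the PR description.

```javascript
// Hypothetical helper: choose the output extension from parsed frontmatter.
// Only a strict boolean `true` opts a file in to MDX.
function outputExtension(frontmatter) {
  const optedIn = frontmatter != null && frontmatter.components === true;
  return optedIn ? ".mdx" : ".md";
}

console.log(outputExtension({ title: "Page", components: true })); // .mdx
console.log(outputExtension({ title: "Page" })); // .md
console.log(outputExtension({ components: "true" })); // .md (a string, not a boolean)
```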

cache-dependency-path: astro/package-lock.json

- name: Lint content
  run: cd astro && npm ci && npm run content:lint
Member

Create a make target? I think it is important that contributors can run all our testing/linting stuff very easily.

email: name@example.com
url: https://example.com
-authors: Jose and me
+authors: [Jose and me]
Member

This is deprecated syntax. It should fail to validate against your YAML schema.

Member

@bgruening bgruening Feb 26, 2026

Suggested change
-authors: [Jose and me]
+contributions:
+  authorship:
+    - rmassei
+    - bgruening
+  funding:
+    - eurosciencegateway

For more information about Laniakea, please subscribe to our <a href="http://srv00.area.ba.cnr.it/mailman/listinfo/galaxy.elixir-ita.cnr.it">mailing list</a>.

-[Get Access](https://laniakea-elixir-it.github.io/laniakea_at_recas){: .btn .btn-primary .trim-p}
+<a href="https://laniakea-elixir-it.github.io/laniakea_at_recas" class="btn btn-primary trim-p">Get Access</a>
Member

Should we have a better, reusable class name here?

Member Author

These are the existing Bootstrap class names — btn btn-primary is used across the site already. The kramdown syntax `{: .btn .btn-primary .trim-p}` was just a different way to spell the same `class="btn btn-primary trim-p"`. No new classes introduced, just a mechanical kramdown→HTML conversion.
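
A minimal sketch of that mechanical conversion, assuming only the class shorthand (`.name`) needs handling. The function name and regex are illustrative and are not the code in normalize-content.mjs, which per the commit messages also covers attributes such as `target`.

```javascript
// Illustrative only: convert [text](url){: .a .b} into
// <a href="url" class="a b">text</a>. Handles class shorthand only.
const KRAMDOWN_LINK = /\[([^\]]+)\]\(([^)\s]+)\)\{:\s*((?:\.[\w-]+\s*)+)\}/g;

function kramdownLinkToHtml(markdown) {
  return markdown.replace(KRAMDOWN_LINK, (_match, text, url, classList) => {
    const classes = classList
      .trim()
      .split(/\s+/)
      .map((name) => name.slice(1)) // drop the leading dot
      .join(" ");
    return `<a href="${url}" class="${classes}">${text}</a>`;
  });
}

console.log(
  kramdownLinkToHtml(
    "[Get Access](https://laniakea-elixir-it.github.io/laniakea_at_recas){: .btn .btn-primary .trim-p}"
  )
);
```

The output keeps the same three class names, which is why no new CSS is needed for the converted links.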

Member

Do we want to have a class that renders a button and make that a documented best-practice?

Member Author

Exactly — the direction is to provide proper MDX components (Button, Card, Alert, etc.) that content authors use instead of raw Bootstrap HTML. We already have 13 MDX components for embeds and icons; extending that to layout/UI components is the natural next step.

#3602 tracks this — it proposes a hybrid approach:

  1. Keep bootstrap-compat.css for existing content (already done)
  2. Build an MDX component library for common patterns (Button, Card, CardDeck, Alert, LeadText)
  3. New content uses components; old content migrated gradually
  4. Eventually deprecate bootstrap-compat.css

So for the kramdown {:.btn .btn-primary} case specifically: going forward authors would write <Button href="...">Get Access</Button> with components: true in frontmatter. The existing Bootstrap classes in old content keep working via the compat layer.

email: name@example.com
url: https://example.com
-authors: Marzipan
+authors: [Marzipan]
Member

Same here; replace with

contributions:
  authorship:
    - rmassei
    - bgruening
  funding:
    - eurosciencegateway

or something similar

Member

@bgruening bgruening left a comment

Can we remove commit ef1126a?

b76a37c also looks suspicious.

@dannon dannon marked this pull request as draft February 26, 2026 14:18
Member Author

dannon commented Feb 27, 2026

Re: the review comment about ef1126a and b76a37c:

  • ef1126a ("Escape bare less-than before digits") — already reverted in 7d1f4fc. Net effect is zero.
  • b76a37c ("Clean remaining legacy syntax from frontmatter-less content files") — this commit does two things:
    1. Removes dead Gridsome import/linkbox artifacts from three GCC2013 pages (non-functional code that was never rendered)
    2. Converts kramdown FA patterns to inline Lucide SVGs in four subsite footer files (elixir-it, erasmusmc, eu, genouest) — addressed in detail in the inline comment replies. These insert files are inlined at preprocess time so <Icon> components aren't available; inline SVGs are the correct approach.

@dannon dannon marked this pull request as ready for review February 27, 2026 22:34
@bgruening
Member

I think a Prettier run and a rerun over the most recently merged items are needed.

@@ -43,7 +43,7 @@ Then click "Propose new file" at the bottom. Then click the big, green "Create p

For users familiar with git and the command line, you can clone the repository locally. You should probably first fork the repository: on the [repo page](https://github.com/galaxyproject/galaxy-hub), click the "Fork" button at the upper right of the page. Then make sure it shows your username as the owner (or whoever you want the owner to be), and the rest of the options you can leave as their defaults. Then click "Create fork". Then it'll take you to the page for your fork. Now you can clone it by clicking the big green "Code" button at the upper right. You'll probably want the "SSH" option, so copy the address (which should look like `git@github.com:[your-username]/galaxy-hub.git` but with your username instead of `[your-username]`) and paste it into a git clone command:
```sh
-$ git clone --recursive 'git@github.com:[your-username]/galaxy-hub.git'
+$ git clone 'git@github.com:[your-username]/galaxy-hub.git'
Member

Suggested change
-$ git clone 'git@github.com:[your-username]/galaxy-hub.git'
+$ git clone https://github.com/galaxyproject/galaxy-hub.git

Member

@bgruening bgruening left a comment

Feel free to merge. I have most, if not all, issues covered in #3775

Thanks @dannon, great stuff!

```sh
-$ yarn develop
+$ cd astro
Member

Let's use the Makefile targets here?

@@ -3,17 +3,17 @@ title: Galaxy France
subtitle: The homepage of the French Galaxy community
---

<!-- Color palette: https://www.color-hex.com/color-palette/9983 -->
<!-- Projects/Communities/Citation/Team cards -->
{/* Color palette: https://www.color-hex.com/color-palette/9983 */}
Member

There is no components: true in the frontmatter, so why this MDX syntax?

---

-<div class="right"><i class="fa fa-wrench fa-5x text-danger" aria-hidden="true"></i></div>
+<div class="right"><Icon name="wrench" /></div>
Member

Why are we doing it this way? The old way seemed better. Could we at least define a class called wrench, as in <i class="wrench">? Even copying the 🔧 directly in seems better to me in those cases than forcing people to use components: true with all its quirks.

</tr>
</tbody>
</table>
| Date | Topic/Event | Venue/Location | Contact |
Member

Why did we convert the HTML table to a Markdown table here?

{% twitter https://twitter.com/denbiOffice/status/1405894946809384963 %}
{% twitter https://twitter.com/denbiOffice/status/1405900003462324238 %}
{% twitter https://twitter.com/denbiOffice/status/1405904529753886730 %}
<div class="col-md-6">
Member

The Twitter links are missing, which I personally don't mind if we are no longer linking out to that site.
I guess you wanted to use <Twitter here.

CONTRIBUTING.md Outdated
Common fields:

There are general tags that can be used on any page, and there are also specific sets of tags that can be used on specific sets of pages.
| Field | Used in | Description |
Member

Those are only valid for events, not news.

CONTRIBUTING.md Outdated
For previewing changes beyond what GitHub's preview offers:

```bash
git clone git@github.com:your-username/galaxy-hub.git
Member

Suggested change
-git clone git@github.com:your-username/galaxy-hub.git
+git clone https://github.com/galaxyproject/galaxy-hub.git

CONTRIBUTING.md Outdated

```bash
git clone git@github.com:your-username/galaxy-hub.git
cd galaxy-hub/astro
Member

Suggested change
-cd galaxy-hub/astro
+cd galaxy-hub

CONTRIBUTING.md Outdated
Comment on lines +98 to +99
npm install
npm run dev
Member

Use the Makefile instead.

1. Structuring Content, Files, and URLs
   1. [Markdown](content/hub/contributing/markdown/index.md) - How to format paragraphs and images
   2. [File Organization and URLs](content/hub/contributing/file-organization/index.md) - How to name files and structure URL paths
   3. [Static and Dynamic Pages](content/hub/contributing/file-organization/index.md#static-pages) - How to display list of pages, custom layouts, and categorized pages
Member

Those links are useful, aren't they? We should bring them back.

dannon added 15 commits March 2, 2026 12:07
Standalone script that applies source-level transformations to
/content/ markdown files, one transform type at a time. Designed
for incremental migration of Gridsome/Jekyll syntax to standard
HTML/markdown. Supports --check for CI dry-run mode.

The layout field was a Gridsome artifact that's no longer used. Astro
determines layouts via content collections and page templates instead.
16 files across news, use, and careers content.

Converts bare scalar authors values to single-element arrays so they're
consistent with the rest of the content. Only 2 example files affected.
Also fixes a bug in the normalize script regex that was matching across
newlines into block sequences.
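
To illustrate the kind of regex bug described here, the sketch below keeps the match on a single line so that a key followed by a block sequence is not touched. The pattern and the `scalarToArray` name are assumptions for illustration; the actual regex in the normalize script is not shown in this PR page.

```javascript
// Sketch: turn `authors: Name` into `authors: [Name]` in a frontmatter string.
// [ \t]* and [^\n]* cannot cross a line break, so `authors:` followed by a
// block sequence on the next lines is left alone. Using \s* after the colon
// would match across the newline and swallow the first list item.
const SCALAR_FIELD = /^(authors|tags|subsites):[ \t]*([^\s\[][^\n]*?)[ \t]*$/gm;

function scalarToArray(frontmatter) {
  return frontmatter.replace(SCALAR_FIELD, (_m, key, value) => `${key}: [${value}]`);
}

console.log(scalarToArray("authors: Marzipan")); // authors: [Marzipan]
console.log(scalarToArray("authors:\n  - rmassei\n  - bgruening")); // unchanged
```

Note that this simple version wraps the value verbatim, so a scalar containing a comma would become a two-element array.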

Removes Vue import statements and converts :prop="value" bindings to
standard prop="value" attributes. Mostly GCC2013 content that had
Gridsome-era Vue component usage. 25 files.

Replaces <g-image> with <img> tags. Only 5 files still had Gridsome
component syntax in their markdown.

Converts [text](url){:target="_blank"} and similar Kramdown inline
attribute syntax to proper <a> tags with the attributes directly on
the element. Also removes block-level Kramdown class annotations.
62 files, mostly 2018-2022 era content.

Converts <br> to <br /> and <hr> to <hr /> for MDX compatibility.
MDX requires XHTML-style self-closing void elements. 116 files,
403 replacements. Code fences are left untouched.
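
A hedged sketch of such a void-element transform, limited to `<br>` and `<hr>` as in this commit. The function name is hypothetical, and the fence tracking is deliberately naive (it toggles on any line starting with three backticks or tildes and ignores fence length).

```javascript
// Sketch: self-close <br> and <hr> outside fenced code blocks.
function selfCloseVoidElements(markdown) {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      // \x60 is a backtick; written this way to avoid a literal fence here.
      if (/^\s*(\x60{3}|~{3})/.test(line)) {
        inFence = !inFence; // toggle on every fence delimiter line
        return line;
      }
      if (inFence) return line;
      // Match <br> or <hr ...> unless the tag already ends in "/>".
      return line.replace(
        /<(br|hr)(\s[^<>]*?)?\s*(?<!\/)>/gi,
        (_m, name, attrs = "") => `<${name}${attrs.trimEnd()} />`
      );
    })
    .join("\n");
}

console.log(selfCloseVoidElements('one<br>two <hr class="x">')); // one<br />two <hr class="x" />
```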

Converts patterns like <10Go, <0.05, <1.0km to &lt; entities so MDX
doesn't misinterpret them as JSX opening tags. 7 files affected.
Code fences are left untouched.

Replaces <https://...> autolinks with [url](url) format. MDX
misinterprets angle-bracket URLs as JSX opening tags. 15 files.
Code fences are left untouched.
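
For illustration, the autolink rewrite can be expressed as a single replacement. This is a sketch under simplifying assumptions: `convertAutolinks` is a hypothetical name, code-fence skipping is omitted, and URLs containing a closing parenthesis would need extra escaping.

```javascript
// Sketch: rewrite <https://...> autolinks as [url](url) so MDX does not
// try to parse the angle brackets as a JSX tag.
function convertAutolinks(markdown) {
  return markdown.replace(/<(https?:\/\/[^\s<>]+)>/g, "[$1]($1)");
}

console.log(convertAutolinks("See <https://example.org/docs> for details."));
// See [https://example.org/docs](https://example.org/docs) for details.
```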

Remove non-functional Gridsome import/link-box artifacts from three
GCC2013 pages, convert kramdown button syntax to HTML anchor tags in
eu/common/training.md, and replace kramdown Font Awesome patterns with
inline Lucide SVGs in four subsite footer files.

splitFile() was returning null for files lacking --- delimiters, causing
--all --check to silently skip them. Now treats the entire file as body
when no frontmatter is found, so the linter catches legacy patterns in
all markdown files. Also applies the normalizer to 11 newly-visible
files that had stale void elements, autolinks, or kramdown syntax.
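
The fallback described for splitFile() might look roughly like the sketch below. This is a reconstruction of the described behaviour only; the `{ frontmatter, body }` return shape and the regex are assumptions, not the script's actual code.

```javascript
// Hypothetical reconstruction: split a markdown file into frontmatter and body.
function splitFile(source) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(source);
  if (!match) {
    // No frontmatter block: treat the entire file as body so the linter
    // still sees it, rather than returning null and skipping the file.
    return { frontmatter: null, body: source };
  }
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}

console.log(splitFile("---\ntitle: A\n---\nBody<br>\n"));
console.log(splitFile("Footer text<hr>"));
```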

With all legacy syntax now cleaned at the source level, these build-time
transforms are dead code: convertKramdownAttributes, stripVueArtifacts,
convertGridsomeSyntax, plus the autolink, void-element, and Vue-binding
sub-transforms inside convertVueToJsx. Removes the functions, their call
sites, exports, and corresponding test cases.

Adds a content:lint script that runs normalize-content.mjs --all --check,
which exits non-zero if any markdown file in content/ contains legacy
syntax (kramdown attributes, Gridsome tags, Vue artifacts, non-self-closed
void elements, bare < before digits, or markdown autolinks). Runs as a
parallel CI job so regressions are caught before merge.

The source-level normalization only handles <br> and <hr>, but MDX files
need all void elements (<img>, <input>, etc.) self-closed for JSX parsing.
Build failure confirmed this is still live code.

…nsforms

fixVoidElements now covers all 13 HTML void elements (was just br/hr),
matching the same set that convertVueToJsx handles at build time.
New fixUnquotedAttributes transform quotes bare numeric attribute values
like rowspan=3 → rowspan="3". Both wired into --all for CI linting.
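
As an illustration of the attribute-quoting idea, the sketch below only rewrites bare integer values inside tags. The name `quoteNumericAttributes` is made up here and is not the PR's fixUnquotedAttributes implementation; a value such as `y=3` appearing inside an already quoted attribute would be wrongly rewritten by this simple version.

```javascript
// Sketch: quote bare numeric attribute values, e.g. rowspan=3 -> rowspan="3".
function quoteNumericAttributes(html) {
  // Restrict the rewrite to the inside of tags so prose like "x=3" is untouched.
  return html.replace(/<[a-zA-Z][^<>]*>/g, (tag) =>
    tag.replace(/(\s[\w-]+)=(\d+)(?=[\s/>])/g, '$1="$2"')
  );
}

console.log(quoteNumericAttributes('<td rowspan=3 colspan="2">x=3</td>'));
// <td rowspan="3" colspan="2">x=3</td>
```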
dannon added 29 commits March 2, 2026 14:13
Wrapped bare text in card divs with p tags, converted markdown lists
and links inside Bootstrap card structures to HTML equivalents.

Removed div wrappers around floating images, moved float classes
directly onto img tags to eliminate div-with-markdown pattern.

Converted markdown links, bold, and reference-style links inside div
blocks to HTML equivalents for MDX compatibility.

Converted markdown links and images to HTML, replaced small tags with
em in pipe tables for MDX compatibility.

Convert <small> to <em> in pipe table cells, convert markdown links
to HTML <a> tags inside blockquote/div elements, and escape a bare
bracket that MDX would misparse as a component tag.

Convert <small> to <em> in pipe table cells and convert markdown
links to HTML <a> tags inside div elements.

Convert <small> to <em> in pipe table cells and convert markdown
links to HTML <a> tags inside blockquote/div elements.

Convert <small> to <em> in pipe table cells and convert markdown
links to HTML <a> tags inside div elements.

Converts Font Awesome <i class="fa fa-xxx"> tags, <a class="fa fa-xxx">
icon links, and kramdown [](){: .fa .fa-xxx} patterns to <Icon name="xxx" />
MDX components. Also converts frontmatter icon: fields from FA format
to Lucide names. Covers 69 FA→Lucide icon name mappings.

Applied --convert-fa-to-icon transform to 39 content files, replacing
all <i class="fa ..."> tags, <a class="fa ..."> icon links, and kramdown
FA patterns with <Icon name="xxx" /> MDX components. Also converted
FA icon references in home/education.yml to Lucide names.
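
A small sketch of the `<i class="fa ...">` case, assuming a lookup table keyed by the Font Awesome name. The two-entry map stands in for the 69-entry table: only the `wrench` mapping is confirmed by the diff in this PR, and the `envelope` entry is an assumption added for illustration.

```javascript
// Illustrative mapping only; the real table has 69 entries.
const FA_TO_LUCIDE = new Map([
  ["wrench", "wrench"], // confirmed by the reviewed diff
  ["envelope", "mail"], // assumption
]);

function faTagToIcon(html) {
  return html.replace(/<i\s+class="fa\s+([^"]*)"[^>]*>\s*<\/i>/g, (tag, classes) => {
    // Skip modifiers such as fa-5x by taking the first class with a mapping.
    const lucideName = classes
      .split(/\s+/)
      .filter((cls) => cls.startsWith("fa-"))
      .map((cls) => FA_TO_LUCIDE.get(cls.slice(3)))
      .find((name) => name !== undefined);
    // Unknown icons are left as-is so they can be reviewed by hand.
    return lucideName ? `<Icon name="${lucideName}" />` : tag;
  });
}

console.log(
  faTagToIcon('<div class="right"><i class="fa fa-wrench fa-5x text-danger" aria-hidden="true"></i></div>')
);
// <div class="right"><Icon name="wrench" /></div>
```

As the example shows, size and colour classes such as `fa-5x` and `text-danger` are dropped in this sketch, which is the same loss visible in the reviewed diff.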

Strip {: .small} kramdown attributes and flatten nested <li>/<ul>
structures that MDX can't parse.

Remove dead Liquid template tags ({% assign %}, {% include %}) that
break the acorn parser when these files get routed to MDX after the
FA→Icon conversion.

abbreviations-tag: put multi-line <code> block on single line with
JSX newline expressions to avoid MDX parse errors.
wids: remove dead {% twitter %} Liquid tags and markdown="1" attributes.
freiburg-galaxy-team-gcc: remove markdown="1" attributes, convert all
markdown inside div blocks to inline HTML, fix Liquid link syntax.

Convert markdown syntax inside HTML block elements to inline HTML so
MDX can parse these files now that they contain <Icon> components:
- chip: markdown list in alert div → <ul><li>
- collections: markdown list and --- hr in alert div → inline HTML
- g101: **bold** inside divs → <strong>
- nt-rnaseq: markdown lists in accordion panels → <ol>/<li>, backticks → <code>
- pt-gfam: flatten nested <li><ul> structures, escape * as &ast; in <code>

All FA icons in content have been converted to <Icon> MDX components
at the source level, so the build-time convertFontAwesomeToLucide()
transform is no longer needed. Removes ~150 lines of inline SVG path
data, the FA→Lucide mapping table, and 12 associated tests.

Remove source from the void elements list in the normalizer. While
it is technically an HTML void element (for media tags), our content
uses <source> extensively as a container element in Galaxy tool XML
documentation. The two actual HTML media <source> elements already
have proper self-closing syntax and aren't affected.

Makes it easy for contributors to run content linting locally with
`make content-lint`, as requested in PR review.

The bare less-than escaping (<0.05 → &lt;0.05) was overly aggressive
and not needed since these files aren't MDX. Removes the transform
function, CLI flag, and pipeline invocation.

These files have component tags and will be processed as MDX after the
upcoming switch to explicit opt-in. MDX can't parse HTML comments, so
converting them to JSX syntax ({/* */}) at the source level. The old
preprocessor did this at build time via convertHtmlToJsx() on every run,
but fixing it in source is cleaner and eliminates that build-time transform.

Also empties the notices.md insert since its entire content was commented
out (old maintenance banners from 2023-2024).

Added --sync-components-flag transform to normalize-content.mjs. It's a
bidirectional sync: it adds components: true to files whose body contains
known component tags (Icon, VegaEmbed, Twitter, etc.) but lack the flag,
and removes the flag from files that have it but don't use any components.
Uses the existing outsideCodeFences() helper so component names inside
code blocks are ignored.

189 files gained the flag, 11 had it removed. This is a one-shot migration
but also works as a lint check going forward via npm run content:lint.
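
One way to picture the bidirectional sync is as a function that reports which action a file needs. Everything below is a sketch: `componentsFlagAction` and `stripCodeFences` are hypothetical names (the latter a stand-in for the real outsideCodeFences() helper), and the component list contains only the three names quoted in the commit message.

```javascript
// Names taken from the commit message; the real list is longer.
const KNOWN_COMPONENTS = ["Icon", "VegaEmbed", "Twitter"];

// Hypothetical stand-in for outsideCodeFences(): drop fenced code blocks.
function stripCodeFences(body) {
  let inFence = false;
  const kept = [];
  for (const line of body.split("\n")) {
    // \x60 is a backtick; written this way to avoid a literal fence here.
    if (/^\s*(\x60{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (!inFence) kept.push(line);
  }
  return kept.join("\n");
}

// Returns "add", "remove", or "ok" for a file's components flag.
function componentsFlagAction(frontmatter, body) {
  const tagPattern = new RegExp(`<(?:${KNOWN_COMPONENTS.join("|")})\\b`);
  const usesComponents = tagPattern.test(stripCodeFences(body));
  const hasFlag = frontmatter.components === true;
  if (usesComponents && !hasFlag) return "add";
  if (!usesComponents && hasFlag) return "remove";
  return "ok";
}

console.log(componentsFlagAction({}, '<Icon name="wrench" />')); // add
console.log(componentsFlagAction({ components: true }, "plain text")); // remove
```

In check mode, any result other than "ok" would count as a lint failure. This sketch does not ignore component names inside inline code spans.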

The preprocessor had ~150 lines of heuristic logic to decide whether
a content file becomes .md or .mdx: hasProblematicHtml() with 8
regex checks, needsVueProcessing() with a SAFE_COMPONENTS list for
auto-detection, and convertHtmlToJsx() for build-time comment/bracket
escaping. Now it's just: if components: true is in frontmatter, it's
MDX. Otherwise plain markdown.

Removes hasProblematicHtml, convertHtmlToJsx, SAFE_COMPONENTS, and
the build-time hasComponents frontmatter flag. The previous two
commits ensured all content files have the correct frontmatter flag
and source-level JSX comments, so the build-time transforms are no
longer needed.

Rewrites CONTRIBUTING.md, README.md, astro/README.md, and the Hub's
contributing and markdown tips pages to reflect the current Astro-based
site. Documents the components: true frontmatter convention, available
MDX components, MDX compatibility notes, and updated dev/build commands.
Removes references to Gridsome, metalsmith, yarn, and legacy helper
classes.

The MDX decision is already handled inline (frontmatter.components === true),
so the wrapper function, its export, and its tests were unused. Also removes
a leftover --escape-lt-digits reference from the normalize-content usage text.

…main2 comments

Use Makefile targets (make dev, make build) instead of raw npm commands in
README, CONTRIBUTING, and the contributing content page. Switch git clone
examples from SSH to HTTPS for lower barrier to entry. Revert JSX comments
in ifb/main2.md back to HTML comments since the file doesn't use components.
Also split event-specific frontmatter fields into their own table and use
more realistic example directory names in CONTRIBUTING.

Same bug as ifb/main2.md: JSX comment syntax in a file without
components: true would render as visible text. Also restore the
EU notices.md alert templates that the Freiburg team uses for
posting maintenance banners.
@dannon dannon force-pushed the content-migration branch from 67ecc95 to c55e525 Compare March 2, 2026 19:34

2 participants