Normalize legacy content syntax at the source #3763
dannon wants to merge 57 commits into galaxyproject:main
| cache-dependency-path: astro/package-lock.json | ||
|
|
||
| - name: Lint content | ||
| run: cd astro && npm ci && npm run content:lint |
Create a make target? I think it is important that contributors can run all our testing/linting stuff very easily.
| email: name@example.com | ||
| url: https://example.com | ||
| authors: Jose and me | ||
| authors: [Jose and me] |
this is deprecated syntax. It should fail to validate against your YAML schema.
| email: name@example.com | ||
| url: https://example.com | ||
| authors: Jose and me | ||
| authors: [Jose and me] |
| authors: [Jose and me] | |
| contributions: | |
| authorship: | |
| - rmassei | |
| - bgruening | |
| funding: | |
| - eurosciencegateway |
content/admin/tools/data-managers/data-manager-xml-syntax/index.md
Outdated
content/admin/tools/data-managers/data-manager-xml-syntax/index.md
Outdated
| For more information about Laniakea, please subscribe to our <a href="http://srv00.area.ba.cnr.it/mailman/listinfo/galaxy.elixir-ita.cnr.it">mailing list</a>. | ||
|
|
||
| [Get Access](https://laniakea-elixir-it.github.io/laniakea_at_recas){: .btn .btn-primary .trim-p} | ||
| <a href="https://laniakea-elixir-it.github.io/laniakea_at_recas" class="btn btn-primary trim-p">Get Access</a> |
should we have a better, reusable class name here?
These are the existing Bootstrap class names — btn btn-primary is used across the site already. The kramdown syntax `{: .btn .btn-primary .trim-p}` was just a different way to spell the same `class="btn btn-primary trim-p"`. No new classes introduced, just a mechanical kramdown→HTML conversion.
Do we want to have a class that renders a button and make that a documented best-practice?
Exactly — the direction is to provide proper MDX components (Button, Card, Alert, etc.) that content authors use instead of raw Bootstrap HTML. We already have 13 MDX components for embeds and icons; extending that to layout/UI components is the natural next step.
#3602 tracks this — it proposes a hybrid approach:
- Keep `bootstrap-compat.css` for existing content (already done)
- Build an MDX component library for common patterns (Button, Card, CardDeck, Alert, LeadText)
- New content uses components; old content migrated gradually
- Eventually deprecate bootstrap-compat.css
So for the kramdown `{:.btn .btn-primary}` case specifically: going forward authors would write `<Button href="...">Get Access</Button>` with `components: true` in frontmatter. The existing Bootstrap classes in old content keep working via the compat layer.
content/0examples/non-vue/index.md
Outdated
| email: name@example.com | ||
| url: https://example.com | ||
| authors: Marzipan | ||
| authors: [Marzipan] |
same here, replace with
    contributions:
      authorship:
        - rmassei
        - bgruening
      funding:
        - eurosciencegateway
or something similar
content/events/2025-05-08-galaxy-data-snacks-series-presentation/index.md
Re: the review comment about ef1126a and b76a37c:
A prettier run and a rerun over the last merged to items is needed I think.
b118323 to 86e6aad
content/hub/contributing/index.md
Outdated
| @@ -43,7 +43,7 @@ Then click "Propose new file" at the bottom. Then click the big, green "Create p | |||
|
|
|||
| For users familiar with git and the command line, you can clone the repository locally. You should probably first fork the repository: on the [repo page](https://github.com/galaxyproject/galaxy-hub), click the "Fork" button at the upper right of the page. Then make sure it shows your username as the owner (or whoever you want the owner to be), and the rest of the options you can leave as their defaults. Then click "Create fork". Then it'll take you to the page for your fork. Now you can clone it by clicking the big green "Code" button at the upper right. You'll probably want the "SSH" option, so copy the address (which should look like `git@github.com:[your-username]/galaxy-hub.git` but with your username instead of `[your-username]`) and paste it into a git clone command: | |||
| ```sh | |||
| $ git clone --recursive 'git@github.com:[your-username]/galaxy-hub.git' | |||
| $ git clone 'git@github.com:[your-username]/galaxy-hub.git' | |||
| $ git clone 'git@github.com:[your-username]/galaxy-hub.git' | |
| $ git clone https://github.com/galaxyproject/galaxy-hub.git |
content/hub/contributing/index.md
Outdated
| ```sh | ||
| $ yarn develop | ||
| $ cd astro |
Let's use the Makefile targets here?
content/ifb/main2.md
Outdated
| @@ -3,17 +3,17 @@ title: Galaxy France | |||
| subtitle: The homepage of the French Galaxy community | |||
| --- | |||
|
|
|||
| <!-- Color palette: https://www.color-hex.com/color-palette/9983 --> | |||
| <!-- Projects/Communities/Citation/Team cards --> | |||
| {/* Color palette: https://www.color-hex.com/color-palette/9983 */} | |||
There is no `components: true` in the frontmatter, so why this MDX syntax?
| --- | ||
|
|
||
| <div class="right"><i class="fa fa-wrench fa-5x text-danger" aria-hidden="true"></i></div> | ||
| <div class="right"><Icon name="wrench" /></div> |
Why are we doing it this way? The old way seemed better. Or could we at least define a class called wrench, so that `<i class="wrench">` is a possibility? Even copying the 🔧 directly in seems better to me in those cases than forcing people to use `components: true` with all its quirks.
| </tr> | ||
| </tbody> | ||
| </table> | ||
| | Date | Topic/Event | Venue/Location | Contact | |
Why did we convert the HTML to a markdown table here?
| {% twitter https://twitter.com/denbiOffice/status/1405894946809384963 %} | ||
| {% twitter https://twitter.com/denbiOffice/status/1405900003462324238 %} | ||
| {% twitter https://twitter.com/denbiOffice/status/1405904529753886730 %} | ||
| <div class="col-md-6"> |
twitter links are missing, which I personally don't mind if we are not linking out to this site anymore.
I guess you wanted to use <Twitter here
CONTRIBUTING.md
Outdated
| Common fields: | ||
|
|
||
| There are general tags that can be used on any page, and there are also specific sets of tags that can be used on specific sets of pages. | ||
| | Field | Used in | Description | |
those are only true for events, not news.
CONTRIBUTING.md
Outdated
| For previewing changes beyond what GitHub's preview offers: | ||
|
|
||
| ```bash | ||
| git clone git@github.com:your-username/galaxy-hub.git |
| git clone git@github.com:your-username/galaxy-hub.git | |
| git clone https://github.com/galaxyproject/galaxy-hub.git |
CONTRIBUTING.md
Outdated
|
|
||
| ```bash | ||
| git clone git@github.com:your-username/galaxy-hub.git | ||
| cd galaxy-hub/astro |
| cd galaxy-hub/astro | |
| cd galaxy-hub |
CONTRIBUTING.md
Outdated
| npm install | ||
| npm run dev |
| 1. Structuring Content, Files, and URLs | ||
| 1. [Markdown](content/hub/contributing/markdown/index.md) - How to format paragraphs and images | ||
| 2. [File Organization and URLs](content/hub/contributing/file-organization/index.md) - How to name files and structure URL paths | ||
| 3. [Static and Dynamic Pages](content/hub/contributing/file-organization/index.md#static-pages) - How to display list of pages, custom layouts, and categorized pages |
Those links are useful, aren't they? We should bring them back.
Standalone script that applies source-level transformations to /content/ markdown files, one transform type at a time. Designed for incremental migration of Gridsome/Jekyll syntax to standard HTML/markdown. Supports --check for CI dry-run mode.
The layout field was a Gridsome artifact that's no longer used. Astro determines layouts via content collections and page templates instead. 16 files across news, use, and careers content.
Converts bare scalar authors values to single-element arrays so they're consistent with the rest of the content. Only 2 example files affected. Also fixes a bug in the normalize script regex that was matching across newlines into block sequences.
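A minimal sketch of how a scalar-to-array rewrite can avoid matching across newlines into a block sequence. The function name `normalizeScalarAuthors` and the regex are assumptions for illustration, not the code in `normalize-content.mjs`.

```javascript
// Hypothetical helper: wrap a bare scalar `authors:` value in a flow sequence.
// The pattern uses [ \t] rather than \s, so it can never consume a newline and
// run into a block sequence ("- name" items) on the following lines.
// Assumption: the scalar contains no commas or brackets that would need quoting.
function normalizeScalarAuthors(frontmatter) {
  return frontmatter.replace(
    /^authors:[ \t]+(?!\[)(\S[^\n]*?)[ \t]*$/m,
    (_match, value) => `authors: [${value}]`
  );
}
```

Values that are already arrays, and `authors:` keys followed by a block sequence, are left unchanged.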
Removes Vue import statements and converts :prop="value" bindings to standard prop="value" attributes. Mostly GCC2013 content that had Gridsome-era Vue component usage. 25 files.
Replaces <g-image> with <img> tags. Only 5 files still had Gridsome component syntax in their markdown.
Converts [text](url){:target="_blank"} and similar Kramdown inline
attribute syntax to proper <a> tags with the attributes directly on
the element. Also removes block-level Kramdown class annotations.
62 files, mostly 2018-2022 era content.
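The inline conversion described above might look roughly like this. It is a sketch under assumptions: `convertKramdownLinkAttrs` is a hypothetical name, and attribute values containing spaces are not handled.

```javascript
// Hypothetical sketch: turn `[text](url){: .a .b key="v"}` into an <a> tag.
// Tokens starting with "." become classes; everything else is copied through
// as-is. Limitation: attribute values that contain spaces would be split.
function convertKramdownLinkAttrs(markdown) {
  const pattern = /\[([^\]]+)\]\(([^)\s]+)\)\{:\s*([^}]*?)\s*\}/g;
  return markdown.replace(pattern, (_match, text, url, attrs) => {
    const classes = [];
    const others = [];
    for (const token of attrs.split(/\s+/).filter(Boolean)) {
      if (token.startsWith(".")) classes.push(token.slice(1));
      else others.push(token);
    }
    const classAttr = classes.length ? ` class="${classes.join(" ")}"` : "";
    const otherAttrs = others.length ? ` ${others.join(" ")}` : "";
    return `<a href="${url}"${classAttr}${otherAttrs}>${text}</a>`;
  });
}
```

Applied to the `Get Access` button line discussed in the review, this produces the same anchor tag that appears in the diff.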
Converts <br> to <br /> and <hr> to <hr /> for MDX compatibility. MDX requires XHTML-style self-closing void elements. 116 files, 403 replacements. Code fences are left untouched.
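As an illustration of "code fences are left untouched", here is one way to apply a transform only outside fenced blocks. The PR mentions an `outsideCodeFences()` helper elsewhere, but its real signature is not shown; the version below and `selfCloseBrHr` are assumptions.

```javascript
// Build the fence marker programmatically so this example can itself live
// inside a fenced block.
const FENCE = "`".repeat(3);

// Hypothetical helper: run `fn` only on text outside fenced code blocks.
// Splitting on a capturing group keeps the fenced blocks at odd indexes.
function outsideCodeFences(text, fn) {
  const fenced = new RegExp(`(^${FENCE}[^\\n]*\\n[\\s\\S]*?^${FENCE}[ \\t]*$)`, "m");
  return text
    .split(fenced)
    .map((part, i) => (i % 2 === 1 ? part : fn(part)))
    .join("");
}

// Self-close <br> and <hr>, keeping any attributes. Already self-closed tags
// come out unchanged, so the transform is safe to run repeatedly.
function selfCloseBrHr(text) {
  return outsideCodeFences(text, (chunk) =>
    chunk.replace(/<(br|hr)\b([^<>]*)>/gi, (_match, tag, rest) => {
      const attrs = rest.replace(/\/\s*$/, "").trim();
      return attrs ? `<${tag} ${attrs} />` : `<${tag} />`;
    })
  );
}
```

Making the transform idempotent matters for a `--check` mode: a file that has already been normalized must come back byte-for-byte identical.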
Converts patterns like `<10Go`, `<0.05`, `<1.0km` to `&lt;` entities so MDX doesn't misinterpret them as JSX opening tags. 7 files affected. Code fences are left untouched.
Replaces <https://...> autolinks with [url](url) format. MDX misinterprets angle-bracket URLs as JSX opening tags. 15 files. Code fences are left untouched.
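A minimal sketch of the autolink rewrite, assuming a hypothetical function name `convertAutolinks`. Code-fence skipping is omitted here for brevity.

```javascript
// Hypothetical sketch: rewrite `<https://...>` autolinks as `[url](url)` so
// MDX does not try to parse the angle brackets as a JSX tag.
function convertAutolinks(markdown) {
  return markdown.replace(/<(https?:\/\/[^\s<>]+)>/g, "[$1]($1)");
}
```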
Remove non-functional Gridsome import/link-box artifacts from three GCC2013 pages, convert kramdown button syntax to HTML anchor tags in eu/common/training.md, and replace kramdown Font Awesome patterns with inline Lucide SVGs in four subsite footer files.
splitFile() was returning null for files lacking --- delimiters, causing --all --check to silently skip them. Now treats the entire file as body when no frontmatter is found, so the linter catches legacy patterns in all markdown files. Also applies the normalizer to 11 newly-visible files that had stale void elements, autolinks, or kramdown syntax.
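The fix can be pictured as follows. `splitFile` is the name used in the commit message, but the return shape and regex here are assumptions, not the script's actual code.

```javascript
// Sketch of the described behavior: if no `---` frontmatter block is found,
// treat the whole file as body instead of returning null, so the linter still
// sees the content.
// Assumption: an empty frontmatter block ("---" directly followed by "---")
// is not handled by this simplified pattern.
function splitFile(source) {
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/);
  if (!match) {
    return { frontmatter: null, body: source };
  }
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}
```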
With all legacy syntax now cleaned at the source level, these build-time transforms are dead code: convertKramdownAttributes, stripVueArtifacts, convertGridsomeSyntax, plus the autolink, void-element, and Vue-binding sub-transforms inside convertVueToJsx. Removes the functions, their call sites, exports, and corresponding test cases.
Adds a content:lint script that runs normalize-content.mjs --all --check, which exits non-zero if any markdown file in content/ contains legacy syntax (kramdown attributes, Gridsome tags, Vue artifacts, non-self-closed void elements, bare < before digits, or markdown autolinks). Runs as a parallel CI job so regressions are caught before merge.
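A dry-run check of this kind can be reduced to "would the transform change anything?". The sketch below works on in-memory files so it stays self-contained; `checkFiles` and its argument shape are hypothetical, not the script's interface.

```javascript
// Hypothetical sketch of a --check style dry run: a file is "dirty" when
// running the transform would change it. Nothing is written; the caller
// decides what to do with the exit code.
function checkFiles(files, transform) {
  const dirty = files
    .filter(({ content }) => transform(content) !== content)
    .map(({ path }) => path);
  return { dirty, exitCode: dirty.length > 0 ? 1 : 0 };
}
```

A CLI wrapper would read the markdown files from disk, print the `dirty` paths, and set `process.exitCode` to the returned value so that CI fails on regressions.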
The source-level normalization only handles <br> and <hr>, but MDX files need all void elements (<img>, <input>, etc.) self-closed for JSX parsing. Build failure confirmed this is still live code.
…nsforms fixVoidElements now covers all 13 HTML void elements (was just br/hr), matching the same set that convertVueToJsx handles at build time. New fixUnquotedAttributes transform quotes bare numeric attribute values like rowspan=3 → rowspan="3". Both wired into --all for CI linting.
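The attribute-quoting step might be sketched like this. `quoteNumericAttributes` is a hypothetical name, and the approach is an assumption rather than the `fixUnquotedAttributes` implementation.

```javascript
// Hypothetical sketch: quote bare numeric attribute values such as rowspan=3.
// The outer replace restricts the rewrite to opening tags, so text like "x=3"
// in prose is not touched.
// Limitations (assumed acceptable for a sketch): a ">" inside an attribute
// value, or text such as ` n=3 ` inside a quoted value, would confuse it.
function quoteNumericAttributes(html) {
  return html.replace(/<[a-zA-Z][^<>]*>/g, (tag) =>
    tag.replace(/(\s[\w:-]+)=(\d+)(?=[\s/>])/g, '$1="$2"')
  );
}
```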
Wrapped bare text in card divs with p tags, converted markdown lists and links inside Bootstrap card structures to HTML equivalents.
Removed div wrappers around floating images, moved float classes directly onto img tags to eliminate div-with-markdown pattern.
Converted markdown links, bold, and reference-style links inside div blocks to HTML equivalents for MDX compatibility.
Converted markdown links and images to HTML, replaced small tags with em in pipe tables for MDX compatibility.
Convert <small> to <em> in pipe table cells, convert markdown links to HTML <a> tags inside blockquote/div elements, and escape a bare bracket that MDX would misparse as a component tag.
Convert <small> to <em> in pipe table cells and convert markdown links to HTML <a> tags inside div elements.
Convert <small> to <em> in pipe table cells and convert markdown links to HTML <a> tags inside blockquote/div elements.
Convert <small> to <em> in pipe table cells and convert markdown links to HTML <a> tags inside div elements.
Converts Font Awesome <i class="fa fa-xxx"> tags, <a class="fa fa-xxx">
icon links, and kramdown [](){: .fa .fa-xxx} patterns to <Icon name="xxx" />
MDX components. Also converts frontmatter icon: fields from FA format
to Lucide names. Covers 69 FA→Lucide icon name mappings.
Applied --convert-fa-to-icon transform to 39 content files, replacing all <i class="fa ..."> tags, <a class="fa ..."> icon links, and kramdown FA patterns with <Icon name="xxx" /> MDX components. Also converted FA icon references in home/education.yml to Lucide names.
Strip {: .small} kramdown attributes and flatten nested <li>/<ul>
structures that MDX can't parse.
Remove dead Liquid template tags ({% assign %}, {% include %}) that
break the acorn parser when these files get routed to MDX after the
FA→Icon conversion.
abbreviations-tag: put multi-line <code> block on single line with
JSX newline expressions to avoid MDX parse errors.
wids: remove dead {% twitter %} Liquid tags and markdown="1" attributes.
freiburg-galaxy-team-gcc: remove markdown="1" attributes, convert all
markdown inside div blocks to inline HTML, fix Liquid link syntax.
Convert markdown syntax inside HTML block elements to inline HTML so MDX can parse these files now that they contain <Icon> components:
- chip: markdown list in alert div → <ul><li>
- collections: markdown list and --- hr in alert div → inline HTML
- g101: **bold** inside divs → <strong>
- nt-rnaseq: markdown lists in accordion panels → <ol>/<li>, backticks → <code>
- pt-gfam: flatten nested <li><ul> structures, escape * as an HTML entity in <code>
All FA icons in content have been converted to <Icon> MDX components at the source level, so the build-time convertFontAwesomeToLucide() transform is no longer needed. Removes ~150 lines of inline SVG path data, the FA→Lucide mapping table, and 12 associated tests.
Remove source from the void elements list in the normalizer — while it is technically an HTML void element (for media tags), our content uses <source> extensively as a container element in Galaxy tool XML documentation. The two actual HTML media <source> elements already have proper self-closing syntax and aren't affected.
Makes it easy for contributors to run content linting locally with `make content-lint`, as requested in PR review.
This reverts commit ef1126a.
The bare less-than escaping (`<0.05` → `&lt;0.05`) was overly aggressive and not needed since these files aren't MDX. Removes the transform function, CLI flag, and pipeline invocation.
These files have component tags and will be processed as MDX after the
upcoming switch to explicit opt-in. MDX can't parse HTML comments, so
converting them to JSX syntax ({/* */}) at the source level. The old
preprocessor did this at build time via convertHtmlToJsx() on every run,
but fixing it in source is cleaner and eliminates that build-time transform.
Also empties the notices.md insert since its entire content was commented
out (old maintenance banners from 2023-2024).
Added --sync-components-flag transform to normalize-content.mjs. It's a bidirectional sync — adds components: true to files whose body contains known component tags (Icon, VegaEmbed, Twitter, etc.) but lack the flag, and removes the flag from files that have it but don't use any components. Uses the existing outsideCodeFences() helper so component names inside code blocks are ignored. 189 files gained the flag, 11 had it removed. This is a one-shot migration but also works as a lint check going forward via npm run content:lint.
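The bidirectional sync could be sketched as below. The component list is only the partial one named in the commit message, and `usesComponents` and `syncComponentsFlag` are hypothetical names; the real transform also skips code fences, which this sketch omits.

```javascript
// Partial list taken from the commit message; the real list is longer.
const KNOWN_COMPONENTS = ["Icon", "VegaEmbed", "Twitter"];
const FLAG_LINE = /^components:[ \t]*true[ \t]*$/;

// True when the body contains an opening tag for a known component.
function usesComponents(body) {
  const tag = new RegExp(`<(?:${KNOWN_COMPONENTS.join("|")})\\b`);
  return tag.test(body);
}

// Hypothetical sync: add the flag when components are used but the flag is
// missing, and remove it when the flag is set but no components are used.
function syncComponentsFlag(frontmatter, body) {
  const lines = frontmatter.split("\n");
  const hasFlag = lines.some((line) => FLAG_LINE.test(line));
  const needsFlag = usesComponents(body);
  if (needsFlag && !hasFlag) return [...lines, "components: true"].join("\n");
  if (!needsFlag && hasFlag) return lines.filter((line) => !FLAG_LINE.test(line)).join("\n");
  return frontmatter;
}
```

Because the function returns its input unchanged when flag and body already agree, the same code can serve as a lint check: any difference between input and output is a mismatch.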
The preprocessor had ~150 lines of heuristic logic to decide whether a content file becomes .md or .mdx: hasProblematicHtml() with 8 regex checks, needsVueProcessing() with a SAFE_COMPONENTS list for auto-detection, and convertHtmlToJsx() for build-time comment/bracket escaping. Now it's just: if components: true is in frontmatter, it's MDX. Otherwise plain markdown. Removes hasProblematicHtml, convertHtmlToJsx, SAFE_COMPONENTS, and the build-time hasComponents frontmatter flag. The previous two commits ensured all content files have the correct frontmatter flag and source-level JSX comments, so the build-time transforms are no longer needed.
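In code, the simplified decision amounts to a single strict comparison. The function name `outputExtension` is an assumption; only the `frontmatter.components === true` check comes from the PR.

```javascript
// Sketch of the explicit opt-in: only a literal boolean `true` selects MDX.
// A string "true" or a missing key falls through to plain markdown.
function outputExtension(frontmatter) {
  return frontmatter.components === true ? ".mdx" : ".md";
}
```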
Rewrites CONTRIBUTING.md, README.md, astro/README.md, and the Hub's contributing and markdown tips pages to reflect the current Astro-based site. Documents the components: true frontmatter convention, available MDX components, MDX compatibility notes, and updated dev/build commands. Removes references to Gridsome, metalsmith, yarn, and legacy helper classes.
The MDX decision is already handled inline (frontmatter.components === true), so the wrapper function, its export, and its tests were unused. Also removes a leftover --escape-lt-digits reference from the normalize-content usage text.
…main2 comments Use Makefile targets (make dev, make build) instead of raw npm commands in README, CONTRIBUTING, and the contributing content page. Switch git clone examples from SSH to HTTPS for lower barrier to entry. Revert JSX comments in ifb/main2.md back to HTML comments since the file doesn't use components. Also split event-specific frontmatter fields into their own table and use more realistic example directory names in CONTRIBUTING.
Same bug as ifb/main2.md — JSX comment syntax in a file without components: true would render as visible text. Also restore the EU notices.md alert templates that the Freiburg team uses for posting maintenance banners.
67ecc95 to c55e525
Summary
The Astro preprocessor currently applies ~10 syntax transforms to every content file on every build: converting Gridsome tags, kramdown attributes, Font Awesome icons, etc. This means the build is doing work that should have been done once, and contributors can't see the "real" content in the source files.
This PR commits those transforms to the content files themselves and removes the corresponding build-time code. The preprocessor shrinks by ~220 lines, and a new CI lint (`npm run content:lint`) prevents legacy syntax from being reintroduced.
What changed in content (~780 files)
All changes are mechanical, applied via `normalize-content.mjs`:
- `<g-link>`/`<g-image>` → standard HTML, stripped Vue imports and `:binding` syntax
- `{: .class}` converted to `class="..."`, block-level annotations (`{:.table.table-striped}`) removed (no standard markdown equivalent; see discussion in comments)
- Void elements self-closed (`<br>` → `<br />`), bare numeric attributes quoted, autolinks converted, component names PascalCased
- `<i>` tags, icon links, and kramdown FA patterns converted to `<Icon name="..." />` MDX components (~40 files). New `Icon.astro` component uses `lucide-static`: renders at build time, no client JS
- Removed `layout:` fields, normalized scalar tags/subsites/authors to arrays, replaced deprecated `authors` with `contributions` in example files
- Added `components: true` to ~190 files that use MDX components, removed it from ~15 files that had the flag but no components
What changed in the build pipeline
- Removed `hasProblematicHtml()`, `convertHtmlToJsx()`, `needsVueProcessing()` auto-detection, `SAFE_COMPONENTS` list, `convertFontAwesomeToLucide()` and its FA→Lucide mapping table. MDX decision is now just `frontmatter.components === true`.
- Added `normalize-content.mjs` with `--check` mode for CI, `content:lint` npm script, CI workflow job, and `make content-lint` target
MDX is now explicit opt-in
Previously the preprocessor scanned content for known component tags and checked for "problematic HTML" that would break MDX parsing. Files with components but problematic HTML got silently downgraded to `.md` with no warning. Now authors add `components: true` to frontmatter, and `content:lint` catches mismatches. No auto-detection, no build-time content rewriting.