Normalize content directory names by dannon · Pull Request #3815 · galaxyproject/galaxy-hub

dannon · 2026-03-07T14:30:28Z

Summary

Split out from #3804 — this is the content directory normalization work, separated from the subsite insert fix.

Drop letter↔digit split rules from slug normalization and rename camelCase/underscore content directories to kebab-case
Add CI lint check to catch non-normalized content directory names going forward

bgruening · 2026-03-09T21:25:01Z

I'm concerned that we will break links here or elsewhere. Can we dump the old sitemap somewhere and preserve it? Then we could use https://github.com/galaxyproject/galaxy-hub/blob/main/scripts/compare_sitemaps.py or something similar to test that the new site, with all the changes here, still serves all the old links, i.e. that the old sitemap is a subset of the new one. Does that make sense to you?

dannon · 2026-03-10T10:42:33Z

These should all be covered — the slug normalization system detects when a directory rename changes the URL and stores the original path as naturalSlug in frontmatter, which triggers redirect page generation at the old URLs. Those redirects also get baked as S3 object metadata (#3802) for native 301s, so old links keep working.
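To make the redirect mechanism described above concrete, here is a minimal Python sketch. It is not the hub's build code: the page records, the `slug` field name, and the meta-refresh HTML shape are assumptions for illustration. Only `naturalSlug` comes from the comment above.

```python
from html import escape


def redirect_pages(pages):
    """Return {old_path: redirect_html} for every page whose URL changed.

    Each page is a dict with a hypothetical "slug" key (the current URL path)
    and an optional "naturalSlug" key (the pre-rename path, as described in
    the comment above).
    """
    out = {}
    for page in pages:
        old = page.get("naturalSlug")
        new = page["slug"]
        if not old or old == new:
            continue  # URL did not change, so no redirect stub is needed
        target = escape(new, quote=True)
        out[old] = (
            "<!doctype html>\n"
            f'<meta http-equiv="refresh" content="0; url={target}">\n'
            f'<link rel="canonical" href="{target}">\n'
            f'<a href="{target}">This page has moved</a>\n'
        )
    return out


pages = [
    {"slug": "/events/my-event/", "naturalSlug": "/events/myEvent/"},
    {"slug": "/news/"},  # never renamed, so no naturalSlug
]
stubs = redirect_pages(pages)
```

In this sketch only the renamed page yields a stub, keyed by its old path. A real deployment would additionally attach the same target as object metadata for native 301s, as mentioned above.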

Can definitely do a sitemap comparison too if you want the extra confidence though.

bgruening · 2026-03-10T17:10:52Z

> Can definitely do a sitemap comparison too if you want the extra confidence though.

I would feel better with that. It would also help to dump the sitemap of the old website somewhere before we take it offline, so that we have some trace when we need to touch all of this for the next migration.

dannon · 2026-03-11T18:58:34Z

Pre-normalization sitemap snapshot

Built from main at 3954094 (pre-merge): https://gist.github.com/dannon/0bfc270430b48ba313e7083bf0a9573c

5396 URLs total.

Sitemap comparison (main → this branch)

After rebasing onto main and rebuilding, the post-normalization build also produces 5396 URLs. 780 URL paths changed between the two builds:

~130 from actual directory renames (camelCase/underscore → kebab-case) — these are covered by redirects in redirects.yaml and generated redirect HTML pages in public/
~650 from dropping the letter↔digit split rule in the slug normalizer (e.g., gcc-2012 → gcc2012) — these never existed on a deployed site, so no redirects needed

Every directory rename that was deployed has a corresponding redirect entry.
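The check discussed in this thread (every old URL is either still present or covered by a redirect) can be sketched as follows. This is an illustrative Python sketch, not the repository's scripts/compare_sitemaps.py; the function names and the in-memory redirect mapping are assumptions, and only a single redirect hop is followed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def sitemap_paths(xml_text):
    """Extract the set of URL paths from the <loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return {
        urlparse(el.text.strip()).path
        for el in root.iter()
        # Tags are namespaced, e.g. "{http://www.sitemaps.org/...}loc"
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    }


def unserved_paths(old_paths, new_paths, redirects):
    """Return old paths that are neither in the new sitemap nor redirected
    (one hop) to a path that is in the new sitemap."""
    missing = set()
    for path in old_paths:
        target = redirects.get(path, path)
        if target not in new_paths:
            missing.add(path)
    return missing
```

An empty result from `unserved_paths` corresponds to the condition asked for earlier in the thread: the old sitemap is a subset of the new one once redirects are taken into account.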

…se/underscore dirs only

Removes the letter↔digit boundary rules from normalizeSlugSegment. They were splitting too many meaningful identifiers (gcc2026, orf3a, ga4gh, nsp2, etc.) into bad URL segments. The camelCase and underscore→hyphen rules are kept. Adds mi-rna→mirna and ma-gs→mags slug overrides for the two bioinformatics terms that the camelCase rule still splits badly.

135 content directories renamed via git mv (parents before children)
135 redirect entries added to redirects.yaml covering all old paths
129 collision cases skipped (both old and new name already exist separately)
CloudFront function and test suite updated to match the simplified algorithm
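As an illustration of the simplified rules, here is a rough Python sketch. The real normalizeSlugSegment is JavaScript and may differ in detail; the regexes and the substring-based override handling below are assumptions chosen to reproduce the examples named in this commit message.

```python
import re

# The two overrides named in the commit message. Applying them by plain
# substring replacement is an assumption made for this sketch.
SLUG_OVERRIDES = {"mi-rna": "mirna", "ma-gs": "mags"}


def normalize_slug_segment(segment: str) -> str:
    """Kebab-case one path segment without splitting at letter/digit boundaries."""
    s = segment.replace("_", "-")                      # underscore -> hyphen
    s = re.sub(r"([a-z])([A-Z])", r"\1-\2", s)         # camelCase -> camel-Case
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1-\2", s)   # XMLParser -> XML-Parser
    s = re.sub(r"-{2,}", "-", s).strip("-").lower()    # tidy hyphens, lowercase
    for bad, good in SLUG_OVERRIDES.items():
        s = s.replace(bad, good)
    return s


assert normalize_slug_segment("GCC2026") == "gcc2026"   # no letter/digit split
assert normalize_slug_segment("myEvent_page") == "my-event-page"
assert normalize_slug_segment("miRNA") == "mirna"       # override applied
assert normalize_slug_segment("MAGs") == "mags"         # override applied
```

Without the overrides, the camelCase rules in this sketch would turn miRNA into mi-rna and MAGs into ma-gs, which is the bad split the commit message refers to.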

New script check-dir-names.mjs walks content/ and flags any directory whose name doesn't match its normalizeSlugSegment() form. Wired into npm run content:lint so CI catches newly added non-normalized dirs. content/.slug-bypass lists the 129 known collision paths that are acknowledged exceptions (both old and new-cased dirs exist on disk). Contributors can add their own bypass entries to suppress the check, which makes the exception explicit and reviewable.
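The shape of such a lint check can be sketched in Python as below. This is not check-dir-names.mjs itself: the function name is hypothetical, the normalizer is passed in as a parameter, and the bypass list is assumed to hold one content-relative path per entry (the actual .slug-bypass format is not shown in this thread).

```python
import os


def find_non_normalized_dirs(root, normalize, bypass=()):
    """Return relative paths of directories whose name differs from
    normalize(name), skipping any path listed in `bypass`."""
    bypass_set = {p.strip("/") for p in bypass}
    offenders = []
    for dirpath, dirnames, _files in os.walk(root):
        dirnames.sort()  # deterministic traversal and report order
        for name in dirnames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in bypass_set:
                continue  # acknowledged exception
            if normalize(name) != name:
                offenders.append(rel)
    return offenders
```

A CI wrapper would typically print the offenders and exit non-zero when the list is not empty, so that a newly added non-normalized directory fails the lint step.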

Generates a lightweight slug-lookup file (404-lookup.json) at build time that maps skeleton keys (alphanumeric-only, lowercased paths) to canonical URLs and titles. The 404 page fetches this and tries to match the current URL, handling differences in casing, hyphens, underscores, and camelCase. Also pre-populates the search link with keywords extracted from the URL.
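The skeleton-key matching described above can be sketched in Python as follows. The actual 404 page runs in the browser against 404-lookup.json; the function names, the entry shape, and the first-entry-wins collision handling here are assumptions for illustration, as is stripping slashes along with other non-alphanumeric characters.

```python
import re


def skeleton_key(path):
    """Lowercase the path and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", path.lower())


def build_lookup(pages):
    """Map skeleton key -> {"url", "title"}; the first page wins on collisions."""
    lookup = {}
    for url, title in pages:
        lookup.setdefault(skeleton_key(url), {"url": url, "title": title})
    return lookup


def find_match(lookup, requested_path):
    """Return the canonical entry for a mistyped path, or None."""
    return lookup.get(skeleton_key(requested_path))


def search_keywords(path):
    """Split a URL path into lowercase keywords for the search link."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", path)  # split camelCase
    return re.findall(r"[a-z0-9]+", spaced.lower())
```

Because the key ignores case, hyphens, underscores, and camelCase boundaries, a request such as /Events/GCC-2012 and the canonical /events/gcc2012/ reduce to the same key in this sketch.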

bgruening

@dannon please merge when it's green. I will hold off on merging other stuff until then.

dannon · 2026-03-12T13:43:05Z

@bgruening Will do, I'll check out what just broke -- was definitely green earlier! :)

dannon force-pushed the feature/normalize-content-dirs branch from ac10455 to 8719cb7 on March 9, 2026 19:26

dannon added 4 commits March 11, 2026 16:17

fix formatting and update test expectations for letter-digit split removal

7fb66a8

dannon force-pushed the feature/normalize-content-dirs branch from b8a739f to 7235ed1 on March 11, 2026 20:17

dannon added 4 commits March 11, 2026 16:25

fix prettier formatting in process-image-paths test

3df7d95

normalize MVD-workshop directory name from recently merged PR

e7560df

Merge remote-tracking branch 'upstream/main' into feature/normalize-content-dirs

a083d13

404 page: auto-redirect when a matching page is found instead of showing a suggestion link

cbe5a92

bgruening approved these changes Mar 12, 2026

dannon added 3 commits March 12, 2026 09:57

add redirects for letter-digit-split URLs that were live on main (gcc-2026, egd-2025) and update test URLs to match

10b1505

document directory naming rules and slug normalization in CONTRIBUTING.md

9f0f179

Formatting

5d4ab56

dannon enabled auto-merge March 12, 2026 15:04

dannon merged commit e273f58 into galaxyproject:main Mar 12, 2026
6 checks passed

This was referenced Mar 19, 2026

Import GTN News #3841 (Closed)

Fix GTN import: normalize slug to match renamed content dirs #3861 (Merged)

dannon merged 11 commits into galaxyproject:main from dannon:feature/normalize-content-dirs
