Normalize HTML anchor names for citations #95

Merged
goerz merged 3 commits into master from 86-anchor-escape on May 26, 2025

goerz (Member) commented May 25, 2025

The HTML anchor name is now derived from the citation key via a normalization to ASCII alphanumeric characters and `_` and `-`.

Closes #86

goerz added the bug and enhancement labels May 25, 2025
goerz force-pushed the 86-anchor-escape branch 3 times, most recently from 2dd5a74 to 2ca1666, May 25, 2025 04:01
goerz force-pushed the 86-anchor-escape branch from 2ca1666 to bb4cfb2, May 25, 2025 04:04
goerz (Member, Author) commented May 25, 2025

@kellertuer Any comments? Otherwise, I'll merge this and release it as 1.4.0 in the near future.

kellertuer (Contributor) commented

This already looks very good! Thanks for working on this!

The only thing I am not so sure about is what happens if two keys normalise to the same normalised key. Is this “caught” and avoided? Sure, this is a rare edge case, but what if we have
`ABC:2` and `ABC:2å` or so, such that they both normalise to `ABC2`? Are these still distinguishable?

If not, I am not so sure how to avoid that, and maybe Bijections already does this?
The simplest would maybe be:
If the normalised key already exists, try to add an integer k=1,2,3,4,... until the key does not exist. So maybe at the point of adding this, it is also just a few lines to add.

Again, thanks for working on this!

goerz (Member, Author) commented May 25, 2025

> what happens if two keys normalise to the same normalised key, is this “caught” and avoided?

Yes, that gets caught and results in an error:

msg = "Cannot generate HTML anchor for citation key \"AbsilMahonySepulchre.2008\": normalizes to ambiguous \"AbsilMahonySepulchre2008\" conflicting with citation key \"AbsilMahonySepulchre:2008\""

> If the normalised key already exists, try to add an integer k=1,2,3,4,... until the key does not exist

Actually, good point! I can turn that error into a warning and add suffixes. No good reason to confront people with recoverable errors! Thanks for the suggestion!
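
The suffix idea could look roughly like the following. This is only a minimal Python sketch for illustration, not the package's actual (Julia) code: the names `simple_normalize` and `assign_anchors`, the `-k` suffix format, and the wording of the warning are all assumptions, and the keys passed in are assumed to be distinct.

```python
import re
import warnings


def simple_normalize(key):
    """Hypothetical normalizer: keep only ASCII letters, digits, "_" and "-"."""
    return re.sub(r"[^A-Za-z0-9_-]", "", key)


def assign_anchors(keys, normalize=simple_normalize):
    """Map each (distinct) citation key to a unique anchor name.

    On a collision, append "-1", "-2", ... (an assumed suffix format)
    until the anchor is unused, and warn instead of raising an error.
    """
    anchors = {}  # citation key -> anchor
    used = {}     # anchor -> citation key that owns it
    for key in keys:
        base = normalize(key)
        anchor = base
        k = 0
        while anchor in used:
            k += 1
            anchor = f"{base}-{k}"
        if anchor != base:
            warnings.warn(
                f"citation key {key!r} normalizes to ambiguous {base!r}, "
                f"conflicting with {used[base]!r}; using {anchor!r} instead"
            )
        used[anchor] = key
        anchors[key] = anchor
    return anchors


result = assign_anchors(["AbsilMahonySepulchre:2008", "AbsilMahonySepulchre.2008"])
print(result["AbsilMahonySepulchre:2008"])  # AbsilMahonySepulchre2008
print(result["AbsilMahonySepulchre.2008"])  # AbsilMahonySepulchre2008-1
```

One consequence of this design is that the anchor a key receives depends on the order in which keys are processed, so anchors are only stable if that order is deterministic.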

> ABC:2 and ABC:2å or so that they both normalise to ABC2

Actually, these normalize to `ABC2` and `ABC2a`, respectively: stripping of diacritics (NFKD normalization) is part of the conversion. But the point still stands.
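
To illustrate why `ABC:2å` ends up as `ABC2a`, here is a small Python sketch of the normalization as described in this thread. It is an approximation under stated assumptions, not the package's Julia implementation, and the name `normalize_anchor` is hypothetical. NFKD decomposes `å` into `a` followed by a combining ring, and the subsequent filter keeps only ASCII alphanumerics, `_`, and `-`.

```python
import string
import unicodedata

# Characters allowed in an anchor, per the PR description:
# ASCII alphanumerics plus "_" and "-".
ALLOWED = set(string.ascii_letters + string.digits + "_-")


def normalize_anchor(key):
    """Hypothetical helper: NFKD-decompose, then drop disallowed characters."""
    decomposed = unicodedata.normalize("NFKD", key)
    # Combining marks (such as the ring from "å") and punctuation
    # (such as ":" and ".") are not in ALLOWED, so they are dropped.
    return "".join(ch for ch in decomposed if ch in ALLOWED)


print(normalize_anchor("ABC:2"))                      # ABC2
print(normalize_anchor("ABC:2å"))                     # ABC2a
print(normalize_anchor("AbsilMahonySepulchre:2008"))  # AbsilMahonySepulchre2008
print(normalize_anchor("AbsilMahonySepulchre.2008"))  # AbsilMahonySepulchre2008
```

The last two lines reproduce the collision from the error message quoted above: `:` and `.` are both dropped, so the two keys map to the same anchor.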

I'm having second thoughts about stripping `.` too (I use it myself, mostly for arXiv preprints). Yes, it creates an issue with CSS selectors (it would have to be escaped), but how many people are going to be writing a CSS selector for a particular bibliography item in the documentation? It's probably better not to change anchors too much, which could be unnecessarily disruptive for existing users. Unfortunately, `:` in particular is incompatible with Documenter.DOM at the moment. (Edit: `.` is also fundamentally incompatible with Documenter.DOM.)

I'm also considering whether I should replace disallowed punctuation (like `:`) with `_`, instead of just dropping it. That might result in more readable anchors. Not that the anchors are extremely user-facing, but still.
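
For comparison, the replace-with-`_` variant under consideration might look like this in Python. Again, this is a hedged sketch and not what the PR necessarily ended up doing; the function name `normalize_with_underscore` is made up for illustration, and the choice to still drop combining marks silently is an assumption.

```python
import re
import unicodedata


def normalize_with_underscore(key):
    """Hypothetical variant: strip diacritics, then replace (rather than
    drop) every remaining disallowed character with "_"."""
    decomposed = unicodedata.normalize("NFKD", key)
    # Drop combining marks so that "å" becomes a plain "a" instead of "a_".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that is not an ASCII letter, digit, "_" or "-" becomes "_".
    return re.sub(r"[^A-Za-z0-9_-]", "_", stripped)


print(normalize_with_underscore("AbsilMahonySepulchre:2008"))  # AbsilMahonySepulchre_2008
print(normalize_with_underscore("ABC:2å"))                     # ABC_2a
```

Note that this does not remove the need for collision handling: `AbsilMahonySepulchre:2008` and `AbsilMahonySepulchre.2008` would still both map to `AbsilMahonySepulchre_2008`.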

goerz force-pushed the 86-anchor-escape branch from ab71621 to d3e77d0, May 25, 2025 16:36
goerz merged commit 52aa72b into master May 26, 2025 (8 checks passed)
goerz deleted the 86-anchor-escape branch May 26, 2025 19:49
goerz (Member, Author) commented May 27, 2025

I'm holding off on putting out a release until the potential reorganization of Bijections.jl is done; see JuliaCollections/Bijections.jl#19.


Development

Successfully merging this pull request may close these issues.

Escape special characters in keys when using as anchor

2 participants