Commit bb4cfb2

Normalize HTML anchor names for citations
The HTML anchor name is now derived from the citation key via a normalization to ASCII alphanumeric characters and `_` and `-`.
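
The normalization described above can be sketched in a few lines of Julia. This is a minimal illustration, not the package's code: the function name `sketch_anchor_name` is hypothetical, and the `cit-` prefix for names that do not start with a letter is taken from the diff below.

```julia
using Unicode

# Hypothetical helper (an illustration, not the function added in this commit):
# keep only ASCII letters, digits, `_`, and `-` after decomposing diacritics.
function sketch_anchor_name(key::AbstractString)
    decomposed = Unicode.normalize(key, :NFKD)  # split letters from combining marks
    keep(c) = isascii(c) && (isletter(c) || isdigit(c) || c == '_' || c == '-')
    name = filter(keep, decomposed)
    # HTML4 ids must start with a letter, so prefix anything that does not.
    return startswith(name, r"[A-Za-z]") ? name : "cit-" * name
end

sketch_anchor_name("AbsilMahonySepulchre:2008")  # colon is dropped
sketch_anchor_name("2008_AbsilMahonySepulchre")  # gets the "cit-" prefix
```

Dropping characters, rather than escaping them, keeps anchors readable, at the price that two distinct keys can collapse to the same name; the commit handles that case separately.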
1 parent 63cdf8e commit bb4cfb2

14 files changed

+1178
-19
lines changed

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ jobs:
             Pkg.PackageSpec(name="Documenter", version="1.0.0"),
             Pkg.PackageSpec(name="MarkdownAST", version="0.1.2"),
             Pkg.PackageSpec(name="OrderedCollections", version="1.6.0"),
+            Pkg.PackageSpec(name="Bijections", version="0.1.4"),
           ])
           Pkg.precompile()
           Pkg.status()

NEWS.md

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased][]
 
+### Added
+
+* The `CitationBibliography` plugin object now has an internal field `anchor_keys` that is a bijective mapping of citation keys to HTML anchor names. The anchor names are normalized versions of the citation keys that are restricted to ASCII alphanumerics, dashes (`-`) and underscores (`_`). This provides [compatibility with HTML4](https://www.w3.org/TR/html4/types.html#type-id) and additionally [avoids issues with CSS selectors](https://stackoverflow.com/a/79022). It also works around restrictions of the `Documenter.DOM` framework that is used internally to render HTML content. [[#95][]]
+
+
+### Fixed
+
+* Citation keys that contain special characters (like colons) no longer produce broken links. This is achieved by normalizing HTML anchor names to contain only alphanumeric ASCII characters, dashes, and underscores. [[#86][], [#95][]]
+
 
 ## [Version 1.3.7][1.3.7] - 2025-03-29
 
@@ -198,8 +207,10 @@ There were several bugs and limitations in version `1.2.x` for which some existi
 [1.2.0]: https://github.com/JuliaDocs/DocumenterCitations.jl/compare/v1.1.0...v1.2.0
 [1.1.0]: https://github.com/JuliaDocs/DocumenterCitations.jl/compare/v1.0.0...v1.1.0
 [1.0.0]: https://github.com/JuliaDocs/DocumenterCitations.jl/compare/v0.2.12...v1.0.0
+[#95]: https://github.com/JuliaDocs/DocumenterCitations.jl/pull/95
 [#89]: https://github.com/JuliaDocs/DocumenterCitations.jl/pull/89
 [#87]: https://github.com/JuliaDocs/DocumenterCitations.jl/pull/87
+[#86]: https://github.com/JuliaDocs/DocumenterCitations.jl/issues/86
 [#83]: https://github.com/JuliaDocs/DocumenterCitations.jl/pull/83
 [#80]: https://github.com/JuliaDocs/DocumenterCitations.jl/issues/80
 [#79]: https://github.com/JuliaDocs/DocumenterCitations.jl/pull/79

Project.toml

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ version = "1.3.7+dev"
 [deps]
 AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
 Bibliography = "f1be7e48-bf82-45af-a471-ae754a193061"
+Bijections = "e2ed5e7c-b2de-5872-ae92-c73ca462fb04"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
@@ -17,6 +18,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
 [compat]
 AbstractTrees = "0.4"
 Bibliography = "0.2.15, 0.3"
+Bijections = "0.1.4"
 Dates = "1"
 Documenter = "1"
 Logging = "1"

src/DocumenterCitations.jl

Lines changed: 12 additions & 1 deletion
@@ -9,6 +9,7 @@ using Documenter.Writers.HTMLWriter
 import MarkdownAST
 import AbstractTrees
 
+using Bijections: Bijections
 using Logging
 using Markdown
 using Bibliography: Bibliography, xyear, xlink, xtitle
@@ -48,6 +49,11 @@ should not be considered part of the stable API.
 * `anchor_map`: an [`AnchorMap`](https://documenter.juliadocs.org/stable/lib/internals/anchors/#Documenter.AnchorMap)
   object that keeps track of the link anchors for references in bibliography
   blocks
+* `anchor_keys`: a [bijective map](https://github.com/scheinerman/Bijections.jl?tab=readme-ov-file#bijections)
+  of citation keys to HTML anchor names. Whenever possible, an anchor name is
+  identical to the citation key, but anchor names are restricted to consist
+  only of ASCII letters, digits, and the symbols `-`, `_`. Thus, citation keys
+  are normalized to meet that restriction.
 """
 struct CitationBibliography <: Documenter.Plugin
 
@@ -72,6 +78,9 @@ struct CitationBibliography <: Documenter.Plugin
     # canonical bibliography blocks
     anchor_map::Documenter.AnchorMap
 
+    # Map citation key => anchor name
+    anchor_keys::Bijections.Bijection{String,String}
+
 end
 
 function CitationBibliography(bibfile::AbstractString=""; style=nothing)
@@ -117,13 +126,15 @@ function CitationBibliography(bibfile::AbstractString=""; style=nothing)
     citations = OrderedDict{String,Int64}()
     page_citations = Dict{String,Set{String}}()
     anchor_map = Documenter.AnchorMap()
+    anchor_keys = Bijections.Bijection{String,String}()
     return CitationBibliography(
         bibfile,
         style,
         entries,
         citations,
         page_citations,
-        anchor_map
+        anchor_map,
+        anchor_keys
     )
 end
 

src/expand_bibliography.jl

Lines changed: 66 additions & 5 deletions
@@ -4,7 +4,7 @@ _ALLOW_PRE_13_FALLBACK = true
 
 Runs after [`CollectCitations`](@ref) but before [`ExpandCitations`](@ref).
 
-Each bibliography is rendered into HTML as a a [definition
+Each bibliography is rendered into HTML as a [definition
 list](https://www.w3schools.com/tags/tag_dl.asp), a [bullet
 list](https://www.w3schools.com/tags/tag_ul.asp), or an
 [enumeration](https://www.w3schools.com/tags/tag_ol.asp) depending on
@@ -362,18 +362,24 @@ function expand_bibliography(node::MarkdownAST.Node, meta, page, doc)
     end
     for (key, entry) in entries_to_show
         if fields[:Canonical]
-            anchor_key = key
+            try
+                anchor_key = get_anchor_key(key, bib.anchor_keys)
+            catch exception
+                @error "Cannot generate anchor for $(repr(key)) on page $(warn_loc)" exception
+                push!(doc.internal.errors, :bibliography_block)
+                continue  # skip entry
+            end
             # Add anchor that citations can link to from anywhere in the docs.
-            if Documenter.anchor_exists(anchors, key)
+            if Documenter.anchor_exists(anchors, anchor_key)
                 # Skip entries that already have a canonical bib entry
                 # elsewhere. This is expected behavior, not an error/warning,
                 # allowing to split the canonical bibliography in multiple
                 # parts.
                 @debug "Skipping key=$(key) (existing anchor)"
                 continue
             else
-                @debug "Defining anchor for key=$(key)"
-                Documenter.anchor_add!(anchors, entry, key, page.build)
+                @debug "Defining anchor $(repr(anchor_key)) for key=$(repr(key))"
+                Documenter.anchor_add!(anchors, entry, anchor_key, page.build)
             end
         else
             anchor_key = nothing
@@ -406,6 +412,61 @@ function expand_bibliography(node::MarkdownAST.Node, meta, page, doc)
 end
 
 
+# Generate a suitably normalized (restricted ASCII) HTML anchor name from a
+# citation key.
+#
+# The [HTML4 standard requires](https://www.w3.org/TR/html4/types.html#type-id)
+# that anchor names must begin with a letter ([A-Za-z]) and may be followed by
+# any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"),
+# colons (":"), and periods ("."). Dots and colons are further problematic for
+# compatibility with CSS selectors, see https://stackoverflow.com/a/79022.
+# Even more importantly, these characters are not supported by the
+# `Documenter.DOM` framework that we use to generate HTML: it will silently
+# drop anything after a colon or period.
+function get_anchor_key(citation_key::String, cache::Bijections.Bijection{String,String})
+    if haskey(cache, citation_key)
+        anchor_key = cache[citation_key]
+    else
+        anchor_key = normalize_anchor(citation_key)  # => [A-Za-z0-9_-]
+        if !startswith(anchor_key, r"[A-Za-z]")
+            # Anchors must start with a letter. Instead of rejecting "invalid"
+            # anchors, we just prepend something arbitrary.
+            anchor_key = "cit-" * anchor_key
+        end
+        try
+            # The Bijection type takes care of all the work of checking for
+            # duplicates here.
+            cache[citation_key] = anchor_key
+        catch
+            msg = "Cannot generate HTML anchor for citation key $(repr(citation_key)): normalizes to ambiguous $(repr(anchor_key)) conflicting with citation key $(repr(cache(anchor_key)))"
+            error(msg)
+        end
+        @debug "Generated anchor key $(repr(anchor_key)) for citation key $(repr(citation_key))"
+    end
+    return anchor_key
+end
+
+
+# Transform an arbitrary string `s` into a normalized string containing only
+# ASCII letters, numbers, and the symbols `_` and `-`, i.e., matching the regex
+# `r"^[A-Za-z0-9_-]+$"`. Letters with diacritics are normalized into their
+# ASCII equivalents, and all other characters are dropped.
+function normalize_anchor(s::AbstractString)
+    s_norm = Unicode.normalize(s, :NFKD)  # decompose diacritics
+    chars = Char[]
+    for c in s_norm
+        if ('A' <= c <= 'Z') ||
+           ('a' <= c <= 'z') ||
+           ('0' <= c <= '9') ||
+           c == '_' ||
+           c == '-'
+            push!(chars, c)
+        end
+    end
+    return String(chars)
+end
+
+
 # Deal with `@__FILE__` in `Pages`, convert it to the name of the current file.
 function _resolve__FILE__(Pages, page)
     __FILE__ = let ex = Meta.parse("_ = @__FILE__", 1; raise=false)[1]

src/expand_citations.jl

Lines changed: 7 additions & 6 deletions
@@ -133,18 +133,19 @@ function expand_citation(
         @assert cit isa DirectCitationLink
         # E.g., "[Semi-AD paper](@cite GoerzQ2022)"
         key = cit.key
-        anchor = Documenter.anchor(anchors, key)
-        if isnothing(anchor)
-            link_text = ast_linktext(cit.node)
-            @error "expand_citation$rec: No destination for key=$(repr(key)) → unlinked text $(repr(link_text))"
-            return Documenter.mdparse(link_text; mode=:span)
-        else
+        if haskey(bib.anchor_keys, key)
+            anchor_key = bib.anchor_keys[key]
+            anchor = Documenter.anchor(anchors, anchor_key)
             expanded_node = MarkdownAST.copy_tree(node)
             path = relpath(anchor.file, dirname(page.build))
             expanded_node.element.destination =
                 string(path, Documenter.anchor_fragment(anchor))
             @debug "expand_citation$rec: $cit → link to $(expanded_node.element.destination)"
             return expanded_node
+        else
+            link_text = ast_linktext(cit.node)
+            @error "expand_citation$rec: No destination for key=$(repr(key)) → unlinked text $(repr(link_text))"
+            return Documenter.mdparse(link_text; mode=:span)
         end
     end
 end

test/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 [deps]
 AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
 Bibliography = "f1be7e48-bf82-45af-a471-ae754a193061"
+Bijections = "e2ed5e7c-b2de-5872-ae92-c73ca462fb04"
 Coverage = "a2441757-f6aa-5fb2-8edb-039e3f45d037"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"

test/runtests.jl

Lines changed: 5 additions & 0 deletions
@@ -56,6 +56,11 @@ using DocumenterCitations
         include("test_keys_with_underscores.jl")
     end
 
+    println("\n* anchor_keys (test_anchor_keys.jl):")
+    @time @safetestset "anchor_keys" begin
+        include("test_anchor_keys.jl")
+    end
+
     println("\n* integration test (test_integration.jl):")
     @time @safetestset "integration" begin
         include("test_integration.jl")

test/test_anchor_keys.jl

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+using DocumenterCitations
+using DocumenterCitations: get_anchor_key
+using Test
+using Bijections
+using TestingUtilities: @Test  # much better at comparing strings
+using IOCapture: IOCapture
+
+include("run_makedocs.jl")
+
+@testset "anchor key ambiguity" begin
+
+    cache = Bijections.Bijection{String,String}()
+
+    anchor_key = get_anchor_key("AbsilMahonySepulchre:2008", cache)
+    @test anchor_key == "AbsilMahonySepulchre2008"
+
+    # cache hit
+    @test cache("AbsilMahonySepulchre2008") == "AbsilMahonySepulchre:2008"
+    anchor_key = get_anchor_key("AbsilMahonySepulchre:2008", cache)
+    @test anchor_key == "AbsilMahonySepulchre2008"
+
+    anchor_key = get_anchor_key("2008_AbsilMahonySepulchre", cache)
+    @test anchor_key == "cit-2008_AbsilMahonySepulchre"
+
+    c = IOCapture.capture(rethrow=Union{}) do
+        get_anchor_key("AbsilMahonySepulchre.2008", cache)
+    end
+    @test c.value isa ErrorException
+    msg = "Cannot generate HTML anchor for citation key \"AbsilMahonySepulchre.2008\": normalizes to ambiguous \"AbsilMahonySepulchre2008\" conflicting with citation key \"AbsilMahonySepulchre:2008\""
+    @Test c.value.msg == msg
+
+
+end
+
+
+@testset "keys with symbols" begin
+
+    # https://github.com/JuliaDocs/DocumenterCitations.jl/issues/86
+
+    bib = CitationBibliography(
+        joinpath(@__DIR__, "test_anchor_keys", "src", "refs.bib"),
+        style=:numeric
+    )
+
+    run_makedocs(
+        joinpath(@__DIR__, "test_anchor_keys");
+        sitename="Test",
+        plugins=[bib],
+        pages=["Home" => "index.md", "References" => "references.md",],
+        warnonly=true,
+        check_success=true
+    ) do dir, result, success, backtrace, output
+
+        @test success
+
+        @test bib.anchor_keys["Chirikjian:2012"] == "Chirikjian2012"
+
+        @test contains(output, "Error: Cannot generate anchor for \"Chirikjian2012\"")
+        @test contains(output, "normalizes to ambiguous \"Chirikjian2012\"")
+
+        #! format: off
+        index_html = read(joinpath(dir, "build", "index.html"), String)
+        @Test contains(index_html, "<a href=\"references/#Chirikjian2012\">")
+
+        references_html = read(joinpath(dir, "build", "references", "index.html"), String)
+        @Test contains(references_html, "<div id=\"Chirikjian2012\">")
+        #! format: on
+
+    end
+
+end

test/test_anchor_keys/src/index.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Testing citation keys with special symbols
+
+You can read more about the theory of Lie groups for example in [Chirikjian:2012](@cite).
+
+Note that ambiguous citation keys, as for Ref. [Chirikjian2012](@cite), lead to errors.
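
The ambiguity exercised by this test page, two citation keys that normalize to the same anchor name, can be reproduced in isolation. The sketch below is an assumption-laden stand-in: it uses two plain `Dict`s instead of `Bijections.Bijection`, and the names `SketchBijection` and `register!` are hypothetical and not part of the package or of Bijections.jl.

```julia
# Hypothetical stand-in for a bijective map: two Dicts kept in sync so that
# both directions of the mapping stay one-to-one.
struct SketchBijection
    forward::Dict{String,String}  # citation key => anchor name
    inverse::Dict{String,String}  # anchor name  => citation key
end

SketchBijection() = SketchBijection(Dict{String,String}(), Dict{String,String}())

# Register `anchor` for `key`; throw if a different key already owns that anchor.
function register!(b::SketchBijection, key::String, anchor::String)
    owner = get(b.inverse, anchor, nothing)
    if owner !== nothing && owner != key
        error("anchor $(repr(anchor)) for key $(repr(key)) conflicts with key $(repr(owner))")
    end
    b.forward[key] = anchor
    b.inverse[anchor] = key
    return anchor
end

b = SketchBijection()
register!(b, "Chirikjian:2012", "Chirikjian2012")  # first key claims the anchor

# A second key that normalizes to the same anchor is rejected.
conflict = try
    register!(b, "Chirikjian2012", "Chirikjian2012")
    false
catch err
    err isa ErrorException
end
```

Keeping the inverse direction is what makes the conflict detectable, and it also lets the error message name the key that already owns the anchor.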
