Commit b630cdf

fix: strip Scroll to Text Fragment anchors in link checks (#86)
## Summary

- Adds remap rules to strip [Scroll to Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) anchors (`#:~:text=...`) from GitHub blob URLs before lychee tries to verify them
- Applied to both repo-specific (`build_remap_args`) and global (`build_global_github_args`) remaps

## Context

Scroll to Text Fragment anchors are a browser-only feature: they highlight text on the page but don't correspond to any HTML element ID. Lychee can't verify them in static HTML.

Without this fix, the "other fragments → raw.githubusercontent.com" remap rule catches these URLs first and redirects them to raw content, where fragment validation fails because `#:~:text=extendedAgent` isn't a valid anchor. The consuming repo's `lychee.toml` already has a remap to strip this specific fragment, but it never fires because CLI `--remap` args from flint match first (first-match-wins).

Observed in [open-telemetry/opentelemetry-java-instrumentation CI](https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/22560996279/job/65347333881). The link works locally because `build_remap_args` skips when on the base branch.

## Test plan

- [ ] Verify CI passes on this PR
- [ ] After merging, confirm [open-telemetry/opentelemetry-java-instrumentation link-check](https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/workflows/reusable-link-check.yml) passes (and the repo can remove the workaround remap from its `lychee.toml`)

---------

Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
1 parent 5df593a commit b630cdf
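
The shadowing described under Context can be reproduced with a small first-match-wins loop. This is an illustrative sketch, not flint's or lychee's code: `first_match` and the labels `raw` and `strip` are hypothetical, and because bash's `=~` uses POSIX ERE (which has no lazy quantifiers), `[^#]*` stands in for the `(.*?)` used in the real patterns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: takes a URL followed by (pattern, label) pairs and
# prints the label of the first pattern that matches, or "no-match".
first_match() {
  local url="$1"
  shift
  while (( $# >= 2 )); do
    if [[ "$url" =~ $1 ]]; then
      echo "$2"
      return 0
    fi
    shift 2
  done
  echo "no-match"
}

# Simplified stand-ins for the text-fragment rule and the generic fragment rule.
text_fragment='^https://github\.com/[^/]+/[^/]+/blob/[^/]+/[^#]*#:~:text=.*$'
any_fragment='^https://github\.com/[^/]+/[^/]+/blob/[^/]+/[^#]*#.*$'

link='https://github.com/grafana/flint/blob/main/tasks/lint/links.sh#:~:text=build_remap_args'

# Generic rule listed first: it shadows the text-fragment rule.
first_match "$link" "$any_fragment" raw "$text_fragment" strip   # prints "raw"

# Text-fragment rule listed first: the fragment is stripped instead.
first_match "$link" "$text_fragment" strip "$any_fragment" raw   # prints "strip"
```

Both patterns match the same URL, so only the order decides the outcome, which is why the new rule has to be emitted ahead of the generic fragment rule.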

3 files changed: +57 -20 lines changed

README.md

Lines changed: 20 additions & 9 deletions
@@ -212,20 +212,23 @@ to the base branch resolve against the PR branch instead. This
 ensures that links like `/blob/main/README.md` don't break when
 the file was added or moved in the PR.

-For `/blob/` URLs, three ordered remap rules are applied
+For `/blob/` URLs, four ordered remap rules are applied
 (lychee uses first-match-wins):

-1. **Line-number anchors** (`#L123`): GitHub renders these with
-JavaScript, so lychee can never verify the fragment. The anchor
+1. **Line-number anchors** (`#L123`, `#L10-L20`): GitHub renders
+these with JavaScript, so lychee can never verify the fragment.
+The anchor is stripped and the file is checked on the PR branch.
+2. **[Scroll to Text Fragment][stf] anchors** (`#:~:text=...`):
+Browser-only feature, not present in static HTML. The anchor
 is stripped and the file is checked on the PR branch.
-2. **Other fragment URLs** (`#section`): Remapped to
+3. **Other fragment URLs** (`#section`): Remapped to
 `raw.githubusercontent.com` where lychee can verify the fragment
 in the raw file content (workaround for
 [lychee#1729](https://github.com/lycheeverse/lychee/issues/1729)).
-3. **Non-fragment URLs**: Remapped from the base branch to the PR
+4. **Non-fragment URLs**: Remapped from the base branch to the PR
 branch (the original behavior).

-For `/tree/` URLs, rules 1 and 3 apply (no raw remap needed).
+For `/tree/` URLs, rules 1 and 4 apply (no raw remap needed).

 **Global GitHub URL handling:**

@@ -237,6 +240,9 @@ two patterns that affect ALL GitHub URLs (any repository):
 JS-rendered line-number fragment is skipped. This means
 consuming repos don't need to exclude these in their
 `lychee.toml`.
+- **Scroll to Text Fragment anchors** (`#:~:text=...`): Stripped
+from any GitHub `/blob/` URL. These are a browser-only feature
+not present in static HTML.
 - **Issue comment anchors** (`#issuecomment-*`): The fragment
 is stripped so the issue/PR page is still checked, but the
 JS-rendered comment anchor is skipped.
@@ -254,11 +260,14 @@ via `--remap` arguments:
 [lychee#1729](https://github.com/lycheeverse/lychee/issues/1729)**
 — flint remaps fragment URLs to `raw.githubusercontent.com`
 for the current PR's head branch, and strips line-number
-anchors globally.
+and Scroll to Text Fragment anchors globally.
 - **`#issuecomment-*` excludes** — flint strips the fragment
 via remap so the issue/PR page is still checked.
-- **`#L\d+` line-number excludes** — flint strips the fragment
-via remap so the file is still checked.
+- **`#L\d+` / `#L\d+-L\d+` line-number excludes** — flint strips
+the fragment via remap so the file is still checked.
+- **`#:~:text=...` [Scroll to Text Fragment][stf] excludes**
+flint strips the fragment via remap so the file is still
+checked.

 Note: flint uses `--remap` (not `--exclude`) for these because
 lychee's CLI `--exclude` flags override config file excludes
@@ -444,3 +453,5 @@ When conventional commits land on `main`, Release Please opens
 > **Note:** CI checks don't trigger automatically on release-please
 > PRs because they are created with `GITHUB_TOKEN`. To run CI,
 > either click **Update branch** or **close and reopen** the PR.
+
+[stf]: https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
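
As a rough illustration of the four ordered `/blob/` rules described in this README, the sketch below rewrites a URL the way the rule list reads. It is not flint's implementation: the repository `example/repo`, the branch `my-branch`, and the function `remap_blob_url` are hypothetical, and `[^#]*` replaces the lazy `(.*?)` because bash regexes are POSIX ERE.

```bash
#!/usr/bin/env bash
# Hypothetical values; flint derives the real ones from the PR context.
base_blob='https://github.com/example/repo/blob/main'
head_blob='https://github.com/example/repo/blob/my-branch'
raw_root='https://raw.githubusercontent.com/example/repo/my-branch'

# Apply the four /blob/ rules in order; the first matching branch wins.
remap_blob_url() {
  local url="$1"
  local line_re="^${base_blob}/([^#]*)#L[0-9]+.*\$"
  local text_re="^${base_blob}/([^#]*)#:~:text=.*\$"
  local frag_re="^${base_blob}/([^#]*#.*)\$"
  local plain_re="^${base_blob}/(.*)\$"

  if [[ "$url" =~ $line_re ]]; then
    echo "${head_blob}/${BASH_REMATCH[1]}"   # 1. line-number anchor stripped
  elif [[ "$url" =~ $text_re ]]; then
    echo "${head_blob}/${BASH_REMATCH[1]}"   # 2. text fragment stripped
  elif [[ "$url" =~ $frag_re ]]; then
    echo "${raw_root}/${BASH_REMATCH[1]}"    # 3. fragment kept, checked in raw content
  elif [[ "$url" =~ $plain_re ]]; then
    echo "${head_blob}/${BASH_REMATCH[1]}"   # 4. branch remap only
  else
    echo "$url"                              # not a base-branch /blob/ URL
  fi
}

remap_blob_url "${base_blob}/README.md#L10-L20"
remap_blob_url "${base_blob}/README.md#:~:text=flint"
remap_blob_url "${base_blob}/README.md#usage"
remap_blob_url "${base_blob}/README.md"
```

Only the third call keeps its fragment, and it is also the only one sent to `raw.githubusercontent.com`; the other three resolve to the same file on the head branch.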

tasks/lint/links.sh

Lines changed: 24 additions & 10 deletions
@@ -25,9 +25,10 @@ eval "lychee_args=(${usage_lychee_args:-})"
 # https://github.com/lycheeverse/lychee/issues/1729).
 #
 # Lychee uses first-match-wins for remaps, so order matters:
-# 1. Line-number anchors → strip fragment, remap to head branch
-# 2. Other fragments → remap to raw.githubusercontent.com
-# 3. No fragment → remap to head branch (existing behavior)
+# 1. Line-number anchors → strip fragment, remap to head branch
+# 2. Scroll to Text Fragments → strip fragment, remap to head branch
+# 3. Other fragments → remap to raw.githubusercontent.com
+# 4. No fragment → remap to head branch (existing behavior)
 #
 # Set LYCHEE_SKIP_GITHUB_REMAPS=true to skip the GitHub-specific remaps
 # emitted by this function (escape hatch if they cause unexpected behavior;
@@ -72,26 +73,31 @@ build_remap_args() {
 local base_url="https://github.com/${repo}"
 local head_url="https://github.com/${head_repo}"

-# /blob/ URLs — three rules, order matters (first-match-wins):
+# /blob/ URLs — four rules, order matters (first-match-wins):

-# 1. Line-number anchors (#L123): strip fragment, remap to head branch
+# 1. Line-number anchors (#L123, #L10-L20): strip fragment, remap to head branch
 echo "--remap"
-echo "^${base_url}/blob/${base_ref}/(.*?)#L[0-9]+\$ ${head_url}/blob/${head_ref}/\$1"
+echo "^${base_url}/blob/${base_ref}/(.*?)#L[0-9]+.*\$ ${head_url}/blob/${head_ref}/\$1"

-# 2. Other fragment URLs (#section): remap to raw.githubusercontent.com
+# 2. Scroll to Text Fragment anchors (#:~:text=...): browser-only,
+# strip fragment, remap to head branch
+echo "--remap"
+echo "^${base_url}/blob/${base_ref}/(.*?)#:~:text=.*\$ ${head_url}/blob/${head_ref}/\$1"
+
+# 3. Other fragment URLs (#section): remap to raw.githubusercontent.com
 # so lychee can verify the fragment in raw content
 echo "--remap"
 echo "^${base_url}/blob/${base_ref}/(.*#.*)\$ https://raw.githubusercontent.com/${head_repo}/${head_ref}/\$1"

-# 3. Non-fragment URLs: branch-remap only (existing behavior)
+# 4. Non-fragment URLs: branch-remap only (existing behavior)
 echo "--remap"
 echo "^${base_url}/blob/${base_ref}/(.*)\$ ${head_url}/blob/${head_ref}/\$1"

 # /tree/ URLs — two rules:

-# 1. Line-number anchors: strip fragment, remap to head branch
+# 1. Line-number anchors (#L123, #L10-L20): strip fragment, remap to head branch
 echo "--remap"
-echo "^${base_url}/tree/${base_ref}/(.*?)#L[0-9]+\$ ${head_url}/tree/${head_ref}/\$1"
+echo "^${base_url}/tree/${base_ref}/(.*?)#L[0-9]+.*\$ ${head_url}/tree/${head_ref}/\$1"

 # 2. Non-fragment URLs: branch-remap only
 echo "--remap"
@@ -105,6 +111,9 @@ build_remap_args() {
 # - Line-number anchors (#L123, #L10-L20): rendered by JavaScript,
 # lychee cannot verify them. We strip the fragment so the file
 # itself is still checked.
+# - Scroll to Text Fragment anchors (#:~:text=...): browser-only,
+# lychee cannot verify them. We strip the fragment so the file
+# itself is still checked.
 # - Issue comment anchors (#issuecomment-*): rendered by JavaScript,
 # lychee cannot verify them. The fragment is stripped so the
 # issue/PR page itself is still checked.
@@ -122,6 +131,11 @@ build_global_github_args() {
 # shellcheck disable=SC2016 # single quotes are intentional: these are regex capture groups, not shell vars
 echo '^https://github.com/([^/]+/[^/]+)/blob/([^/]+)/(.*?)#L[0-9]+.*$ https://github.com/$1/blob/$2/$3'

+# Strip Scroll to Text Fragment anchors from /blob/ URLs (browser-only, not in static HTML)
+echo "--remap"
+# shellcheck disable=SC2016 # single quotes are intentional: these are regex capture groups, not shell vars
+echo '^https://github.com/([^/]+/[^/]+)/blob/([^/]+)/(.*?)#:~:text=.*$ https://github.com/$1/blob/$2/$3'
+
 # Strip issue comment anchors (JS-rendered, not in static HTML).
 # The issue page is still checked, just not the fragment.
 # We use --remap instead of --exclude because CLI --exclude
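
Both functions in this file print one argument per line, with each rule's pattern and replacement sharing a single line separated by a space. The diff does not show how flint consumes that output, so the following is only one plausible way to do it: `emit_remaps` is a hypothetical stand-in for the real functions, and reading the lines with `mapfile` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in that prints arguments in the same shape as the diff:
# a "--remap" line followed by a "<pattern> <replacement>" line.
emit_remaps() {
  echo "--remap"
  # shellcheck disable=SC2016 # $1..$3 are regex capture groups, not shell vars
  echo '^https://github.com/([^/]+/[^/]+)/blob/([^/]+)/(.*?)#:~:text=.*$ https://github.com/$1/blob/$2/$3'
}

# One array element per output line, so the space inside the rule is kept
# within a single argument instead of being split by the shell.
mapfile -t remap_args < <(emit_remaps)

printf 'arg: %s\n' "${remap_args[@]}"
# The array could then be passed on as: lychee "${remap_args[@]}" <files>
```

Keeping each rule as one array element matters because word splitting would otherwise hand the pattern and the replacement to `--remap` as two separate arguments.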

tests/test-links.md

Lines changed: 13 additions & 1 deletion
@@ -4,10 +4,22 @@ These links exercise the GitHub URL remap rules in `tasks/lint/links.sh`.
 On PR branches, lychee rewrites `blob/main/` URLs to the PR branch —
 these links verify that each remap rule works correctly during CI.

-## Line-number anchors (`#L123`) — fragment stripped, file checked on PR branch
+## Line-number anchors (`#L123`, `#L10-L20`) — fragment stripped, file checked on PR branch

 - [README.md#L1](https://github.com/grafana/flint/blob/main/README.md#L1)
 - [links.sh#L6](https://github.com/grafana/flint/blob/main/tasks/lint/links.sh#L6)
+- [links.sh#L6-L10](https://github.com/grafana/flint/blob/main/tasks/lint/links.sh#L6-L10)
+
+## Scroll to Text Fragment anchors (`#:~:text=...`) — fragment stripped, file checked on PR branch
+
+<!-- editorconfig-checker-disable -->
+
+- [links.sh text fragment](https://github.com/grafana/flint/blob/main/tasks/lint/links.sh#:~:text=build_remap_args)
+<!-- editorconfig-checker-enable -->
+
+## External Scroll to Text Fragment anchors — fragment stripped globally
+
+- [okhttp text fragment](https://github.com/square/okhttp/blob/master/README.md#:~:text=OkHttp)

 ## Section fragments (`#section`) — remapped to raw.githubusercontent.com
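
For a quick local look at what the global text-fragment rule should do to the test links in this file, the pattern can be approximated with `sed`. This is a sketch under assumptions, not part of the test suite: `strip_text_fragment` is a hypothetical helper, and `[^#]*` again substitutes for the lazy `(.*?)`, which `sed -E` does not support.

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating the global text-fragment rule.
# Reads URLs on stdin and prints them with any #:~:text=... suffix removed;
# URLs that do not match the pattern pass through unchanged.
strip_text_fragment() {
  sed -E 's|^https://github\.com/([^/]+/[^/]+)/blob/([^/]+)/([^#]*)#:~:text=.*$|https://github.com/\1/blob/\2/\3|'
}

strip_text_fragment <<'EOF'
https://github.com/grafana/flint/blob/main/tasks/lint/links.sh#:~:text=build_remap_args
https://github.com/square/okhttp/blob/master/README.md#:~:text=OkHttp
https://github.com/grafana/flint/blob/main/README.md#usage
EOF
```

The first two URLs lose their fragment while the owner, repository, and branch are preserved by the capture groups; the `#usage` link is left for the other rules to handle.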
