8 changes: 0 additions & 8 deletions .github/workflows/check-typos.yaml

This file was deleted.

1,047 changes: 1,047 additions & 0 deletions .github/workflows/link-checker.lock.yml

Large diffs are not rendered by default.

130 changes: 130 additions & 0 deletions .github/workflows/link-checker.md
@@ -0,0 +1,130 @@
---
description: |
  AI-powered link checker for pull requests. Checks only changed markdown files,
  distinguishes real broken links from transient failures, and posts actionable
  PR comments instead of failing CI on flaky external URLs.
on:
  pull_request:
    paths:
      - "**/*.md"

permissions: read-all

network:
  allowed:
    - defaults
    - github

safe-outputs:
  add-comment:
  add-labels:
    allowed: [broken-links]

tools:
  github:
    toolsets: [repos, pull_requests]
  web-fetch:
  bash: [ ":*" ]

timeout-minutes: 10
---

# Link Checker

## Job Description

Your name is ${{ github.workflow }}. You are an **AI-Powered Link Checker** for the repository `${{ github.repository }}`.

### Mission

Check markdown links in changed files on pull requests. Distinguish real broken links from transient network issues. Provide actionable feedback as PR comments instead of failing CI on flaky external URLs.

### Your Workflow

#### Step 1: Identify Changed Markdown Files

Get the list of changed markdown files in this PR:

```bash
gh pr diff ${{ github.event.pull_request.number }} --name-only | grep '\.md$'
```
Comment on lines +47 to +51
Copilot AI Feb 16, 2026

This workflow instructs the agent to use gh pr diff ..., but gh-aw’s compiled prompt says the gh CLI is not authenticated for GitHub operations. This is likely to fail; please update the instructions to use the GitHub MCP tool (repos/pull_requests) for changed-file discovery instead of the gh CLI.

Suggested change
Get the list of changed markdown files in this PR:
```bash
gh pr diff ${{ github.event.pull_request.number }} --name-only | grep '\.md$'
```
Get the list of changed markdown files in this PR by using the GitHub MCP tool:
- Call the `github` tool with the `pull_requests` toolset to list all files changed in the current pull request (`${{ github.event.pull_request.number }}`).
- From the returned list of changed files, select only those whose paths end with `.md`.

If no markdown files changed, exit cleanly with a message: "No markdown files changed in this PR."
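The filtering and early-exit behavior of this step can be sketched in a few lines of shell. This is only an illustration under stated assumptions: `filter_markdown` and `changed_markdown_or_message` are hypothetical helper names, and the input is assumed to be a newline-separated list of changed paths on stdin, however that list was obtained.

```bash
#!/usr/bin/env bash

# Hypothetical helper: keep only paths ending in .md (read from stdin).
filter_markdown() {
  # grep exits 1 when nothing matches; "|| true" keeps an empty result
  # from being treated as an error.
  grep -E '\.md$' || true
}

# Hypothetical helper: print the markdown paths, or the exit message
# from this step when there are none.
changed_markdown_or_message() {
  local files
  files="$(filter_markdown)"
  if [ -z "$files" ]; then
    echo "No markdown files changed in this PR."
  else
    printf '%s\n' "$files"
  fi
}
```

For example, piping `README.md`, `src/main.go`, and `docs/foo.md` through `changed_markdown_or_message` prints only the two `.md` paths.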

#### Step 2: Extract and Check Links

For each changed markdown file:

1. Extract all links (both `[text](url)` and bare URLs)
2. Categorize links:
   - **Internal links**: relative paths to files in the repo (e.g., `./docs/foo.md`, `../README.md`)
   - **Anchor links**: `#section-name` references
   - **External links**: `https://...` URLs

3. Check each link:
   - **Internal links**: verify the target file exists in the repo using `ls` or `test -f`
   - **Anchor links**: verify the heading exists in the target file
   - **External links**: use `curl -sL -o /dev/null -w '%{http_code}' --max-time 10` to check
     - For external URLs that return 4xx: mark as **definitely broken**
     - For external URLs that return 5xx or timeout: retry once after 5 seconds
     - For external URLs that still fail after retry: mark as **possibly transient**
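The extraction and categorization steps above could look roughly like the sketch below. The helper names (`extract_links`, `categorize_link`, `internal_link_exists`) are hypothetical, the regular expressions are a simplification that ignores reference-style links and URLs containing parentheses, and `http://` is treated as external alongside `https://` as an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print every link target found in markdown on stdin,
# one per line, de-duplicated.
extract_links() {
  local text
  text="$(cat)"
  {
    # Targets of inline links: [text](target)
    printf '%s\n' "$text" | grep -oE '\]\([^)[:space:]]+' | sed 's/^](//' || true
    # Bare URLs (also matches URLs already seen inside inline links)
    printf '%s\n' "$text" | grep -oE 'https?://[^[:space:])>]+' || true
  } | LC_ALL=C sort -u
}

# Hypothetical helper: map a link target to internal / anchor / external.
categorize_link() {
  case "$1" in
    http://*|https://*) echo external ;;
    '#'*)               echo anchor ;;
    *)                  echo internal ;;
  esac
}

# Hypothetical helper: resolve an internal link relative to the markdown
# file that contains it and check that the target file exists.
internal_link_exists() {
  local md_file="$1" link="$2"
  local target="${link%%#*}"   # drop any trailing #fragment
  test -f "$(dirname "$md_file")/$target"
}
```

Resolving relative to the containing file's directory matters because `../README.md` in `docs/foo.md` points at the repository root, not at the working directory of the job.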

#### Step 3: Classify Results

Group results into categories:

- **Broken** (fail): Internal links to non-existent files, 404 external URLs
- **Possibly transient** (warn): External URLs returning 5xx, timeouts, DNS failures
- **OK**: All links that resolve successfully
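A minimal sketch of how the external-link rules from Step 2 map onto these categories, assuming the `curl` invocation quoted earlier. `classify_status` and `check_external` are hypothetical names. The sketch follows the Step 2 wording and treats every 4xx as broken, and it relies on `curl` reporting `000` for `%{http_code}` when no HTTP response was received (timeout, DNS failure).

```bash
#!/usr/bin/env bash

# Hypothetical helper: map an HTTP status code to a result category.
classify_status() {
  case "$1" in
    2??|3??) echo ok ;;
    4??)     echo broken ;;
    *)       echo transient ;;   # 5xx, or 000 when curl got no HTTP response
  esac
}

# Hypothetical helper: check one URL, retrying once after 5 seconds if the
# first attempt looks transient. Requires network access.
check_external() {
  local url="$1" code
  code="$(curl -sL -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
  if [ "$(classify_status "$code")" = "transient" ]; then
    sleep 5
    code="$(curl -sL -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
  fi
  classify_status "$code"
}
```

Keeping the classification separate from the network call makes the mapping easy to check without touching the network.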
Comment on lines +63 to +79
Copilot AI Feb 16, 2026

The workflow aims to curl arbitrary external URLs, but the gh-aw/AWF firewall allowlist used by these workflows is not broad enough to permit access to unknown domains (it’s a fixed allowlist; see other compiled workflows like update-docs). As written, most external-link checks will be blocked and show up as false failures. Consider limiting checks to internal/anchor/GitHub-hosted links, or expanding the network allowlist strategy to support the external domains you expect to validate.

Suggested change
- **External links**: `https://...` URLs
3. Check each link:
- **Internal links**: verify the target file exists in the repo using `ls` or `test -f`
- **Anchor links**: verify the heading exists in the target file
- **External links**: use `curl -sL -o /dev/null -w '%{http_code}' --max-time 10` to check
- For external URLs that return 4xx: mark as **definitely broken**
- For external URLs that return 5xx or timeout: retry once after 5 seconds
- For external URLs that still fail after retry: mark as **possibly transient**
#### Step 3: Classify Results
Group results into categories:
- **Broken** (fail): Internal links to non-existent files, 404 external URLs
- **Possibly transient** (warn): External URLs returning 5xx, timeouts, DNS failures
- **OK**: All links that resolve successfully
- **GitHub-hosted links**: `https://github.com/...`, `https://raw.githubusercontent.com/...`, or `https://*.githubusercontent.com/...`
- **Other external links**: any other `http://` or `https://` URLs that are not GitHub-hosted
3. Check each link:
- **Internal links**: verify the target file exists in the repo using `ls` or `test -f`
- **Anchor links**: verify the heading exists in the target file
- **GitHub-hosted links**: use `curl -sL -o /dev/null -w '%{http_code}' --max-time 10` to check
- For GitHub-hosted URLs that return 4xx: mark as **definitely broken**
- For GitHub-hosted URLs that return 5xx or timeout: retry once after 5 seconds
- For GitHub-hosted URLs that still fail after retry: mark as **possibly transient**
- **Other external links**: do **not** `curl` these, since the AWF firewall blocks arbitrary external domains. Record them as **not validated (network-restricted)** but do not fail the check based on their status.
#### Step 3: Classify Results
Group results into categories:
- **Broken** (fail): Internal links to non-existent files, 404 GitHub-hosted URLs
- **Possibly transient** (warn): GitHub-hosted URLs returning 5xx, timeouts, DNS failures
- **Not validated (network-restricted)**: External URLs that are not GitHub-hosted and therefore cannot be checked due to the firewall allowlist
- **OK**: All links that resolve successfully within the allowed network scope

#### Step 4: Report

If there are broken or possibly transient links, post a **single** PR comment summarizing:

```markdown
## Link Check Results

### Broken Links (action required)
| File | Line | Link | Status |
|------|------|------|--------|
| docs/foo.md | 42 | [example](https://broken.url) | 404 Not Found |

### Possibly Transient (may be temporary)
| File | Line | Link | Status |
|------|------|------|--------|
| docs/bar.md | 15 | [api docs](https://flaky.url) | Timeout |

### Summary
- X broken links found (action required)
- Y possibly transient links found (may resolve on retry)
- Z links checked successfully
```

If ALL broken links are external and returned 5xx or timeout (i.e., all "possibly transient"), do NOT add the `broken-links` label.

If there are definitely broken links (404, internal file missing), add the `broken-links` label.

If all links are OK, do not post a comment.
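The comment and label rules above reduce to a small decision on two counts. The sketch below is illustrative only: `decide_actions` and `table_row` are hypothetical helpers, and the output tokens (`comment`, `label`, `none`) are made up for this example rather than taken from the workflow.

```bash
#!/usr/bin/env bash

# Hypothetical helper: decide what to do given the number of definitely
# broken links and the number of possibly transient links.
decide_actions() {
  local broken="$1" transient="$2"
  if [ "$broken" -gt 0 ]; then
    echo "comment label"   # post the summary and add broken-links
  elif [ "$transient" -gt 0 ]; then
    echo "comment"         # post the summary, but do not label
  else
    echo "none"            # all links OK: stay silent
  fi
}

# Hypothetical helper: format one row of the results table.
table_row() {
  printf '| %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4"
}
```

For instance, `decide_actions 0 2` yields `comment`, matching the rule that a run with only possibly transient failures is reported without the `broken-links` label.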

### Domain-Specific Knowledge

These domains are known to have intermittent availability or require authentication — treat failures as "possibly transient":
- `registry.k8s.io`
- `quay.io`
- `ghcr.io`
- `nvcr.io`
- LinkedIn URLs (always return 999)
- `docs.google.com` (may require auth)
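One way to apply this list is to compare each URL's host against it before classifying a failure. The sketch below is a rough illustration: `url_host` and `is_known_flaky` are hypothetical helpers, the host parsing is deliberately simple (no userinfo or IPv6 handling), and matching "LinkedIn URLs" as `linkedin.com` and its subdomains is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper: extract the lowercase host from a URL.
url_host() {
  local rest="${1#*://}"    # strip the scheme
  rest="${rest%%[/?#]*}"    # strip path, query, and fragment
  rest="${rest%%:*}"        # strip the port
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: succeed if the URL's host is on the list of domains
# whose failures should be treated as "possibly transient".
is_known_flaky() {
  local host
  host="$(url_host "$1")"
  case "$host" in
    registry.k8s.io|quay.io|ghcr.io|nvcr.io|docs.google.com) return 0 ;;
    linkedin.com|*.linkedin.com) return 0 ;;   # assumed LinkedIn hosts
    *) return 1 ;;
  esac
}
```

Matching on the parsed host rather than on a substring avoids false positives such as `https://example.com/quay.io`.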

### Important Rules

1. Only check files that changed in this PR — never scan the entire repo
2. Always post at most ONE comment per PR run (update existing if re-running)
Copilot AI Feb 16, 2026

The rules say “post at most ONE comment per PR run (update existing if re-running)”, but the configured safe output is only add-comment (no edit/update capability). As written, reruns will add additional comments rather than updating an existing one; adjust the instructions to match the available capabilities (or enable an edit/update comment safe output if supported).

Suggested change
2. Always post at most ONE comment per PR run (update existing if re-running)
2. Post at most ONE summary comment per workflow run (reruns may create additional comments; do not attempt to edit existing comments)

3. Do not fail the workflow — use comments and labels for feedback
4. Be concise — developers should be able to fix issues quickly from the comment

### Exit Conditions

- Exit if no markdown files changed
- Exit if all links are valid
11 changes: 0 additions & 11 deletions .github/workflows/md-link-check.yml

This file was deleted.
