Adding link checker script + GitHub CI #886

Open. Wants to merge 1 commit into base: main
33 changes: 33 additions & 0 deletions .github/workflows/check-links.yml
@@ -0,0 +1,33 @@
name: Check links in AsciiDoc

on: [push, pull_request]

jobs:
  check-links:
    name: Check links in modified files
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Asciidoctor
        run: |
          sudo apt-get update
          sudo apt-get install -y asciidoctor

      - name: Make scripts executable
        run: |
          chmod +x scripts/check-links.sh
          chmod +x scripts/check-modified.sh

      - name: Fetch base branch
        run: git fetch origin main

Comment on lines +25 to +27

🛠️ Refactor suggestion

Fetching origin/main breaks on PRs from forks – use base repo ref instead

For PRs raised from a fork, origin points to the fork, not the upstream repo, so git fetch origin main often fails or fetches an outdated branch.
Use the base-ref provided by GitHub Actions or add an explicit upstream remote.

-      - name: Fetch base branch
-        run: git fetch origin main
+      - name: Fetch base branch
+        if: ${{ github.event_name == 'pull_request' }}
+        run: |
+          git fetch \
+            "https://github.com/${{ github.repository_owner }}/opendatahub-documentation.git" \
+            "${{ github.event.pull_request.base.ref }}:refs/remotes/upstream/${{ github.event.pull_request.base.ref }}"
🤖 Prompt for AI Agents
In .github/workflows/check-links.yml around lines 25 to 27, the command 'git
fetch origin main' fails on PRs from forks because 'origin' points to the fork
repository. To fix this, replace 'origin' with the upstream repository reference
or use the GitHub Actions base-ref variable to fetch the correct base branch
from the upstream repo. Alternatively, add an explicit remote for the upstream
repo and fetch from it instead of 'origin'.
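
To make the refspec in the suggestion above concrete, here is a minimal sketch of how it is assembled from the base branch name. The helper name `base_refspec` and the fallback to `main` are assumptions for illustration; only the `refs/remotes/upstream/` namespace comes from the suggestion itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): builds the refspec used in the
# suggested fetch step. "upstream" is the namespace chosen in the suggestion.
base_refspec() {
  # Assumption: fall back to "main" when no base ref is passed in.
  local base_ref="${1:-main}"
  printf '%s:refs/remotes/upstream/%s\n' "$base_ref" "$base_ref"
}

# Print the refspec for the default base branch.
base_refspec "main"
```

A branch fetched with this refspec is addressable as `upstream/<base>`, so any later `git diff` would need to compare against `upstream/<base>` rather than `origin/<base>`.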

      - name: Check links in modified files
        shell: bash
        run: |
          scripts/check-modified.sh


77 changes: 77 additions & 0 deletions scripts/check-links.sh
@@ -0,0 +1,77 @@
#!/bin/bash
#
# Checks for 404 links using Asciidoctor and curl

usage() {
  echo "Usage: $0 [<adoc_file>]"
  exit 1
}

# Parse arguments
INPUT_FILE=""

# Check dependencies
if ! asciidoctor -v >/dev/null 2>&1; then
  echo "Error: Asciidoctor is not installed" >&2
  exit 1
fi

INPUT_FILE="$1"

if [ $# -eq 0 ]; then
  usage
fi

# Create temp file for flagging broken links
TMP_FILE=$(mktemp)
echo "0" > "$TMP_FILE"

# Load ignore patterns from external file
IGNORE_FILE="$(dirname "$0")/links.ignore"

if [ ! -f "$IGNORE_FILE" ]; then
  echo "Error: Missing ignore patterns file: $IGNORE_FILE" >&2
  exit 1
fi

mapfile -t IGNORE_PATTERNS < "$IGNORE_FILE"
PATTERNS_DECL=$(declare -p IGNORE_PATTERNS)

check_url() {
  local URL=$1
  eval "$PATTERNS_DECL"

  URL=${URL%[.,;:?!\]\)]}

  for PATTERN in "${IGNORE_PATTERNS[@]}"; do
    if [[ "$URL" =~ $PATTERN ]]; then
      exit 0
    fi
  done

  STATUS=$(curl -Ls -o /dev/null -w "%{http_code}" --max-time 5 --connect-timeout 2 "$URL")

  if [[ "$STATUS" != "000" && "$STATUS" != "403" && ! "$STATUS" =~ ^(2|3)[0-9]{2}$ ]]; then
    echo -e "Invalid URL (HTTP status $STATUS): \n\033[31m$URL\033[0m"
    echo "1" > "$TMP_FILE"
  fi
}

export TMP_FILE
export -f check_url

run_url_checks() {
  local FILE="$1"
  echo -e "\033[32mChecking: $FILE\033[0m"
  asciidoctor "$FILE" -a doctype=book -o - | \
    grep -Eo '(http|https)://[a-zA-Z0-9./?=%_-]*' | \
    sort -u | \
    xargs -P 10 -n 1 bash -c "$PATTERNS_DECL; check_url \"\$0\""
}

run_url_checks "$INPUT_FILE"

if [ "$(cat "$TMP_FILE")" -eq 1 ]; then
  echo "Errors found"
  exit 1
fi
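
The status test in `check_url` is dense, so the sketch below restates it as a standalone classifier. The function name `classify_status` is hypothetical and the `case` form is a rewrite for readability, not the code in this PR; it is intended to accept and reject the same status codes.

```shell
#!/bin/bash
# Hypothetical restatement of the status test in check_url.
# Prints "ok" for statuses the script tolerates and "broken" otherwise.
classify_status() {
  local status="$1"
  case "$status" in
    # 000 means curl got no HTTP response (for example a timeout);
    # 403 and every 2xx or 3xx code are also tolerated.
    000|403|2[0-9][0-9]|3[0-9][0-9]) echo "ok" ;;
    *) echo "broken" ;;
  esac
}

for code in 200 301 403 000 404 500; do
  echo "$code $(classify_status "$code")"
done
```

Tolerating `000` means unreachable hosts and timeouts are not reported, which keeps CI quiet on flaky networks at the cost of missing dead domains.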
59 changes: 59 additions & 0 deletions scripts/check-modified.sh
@@ -0,0 +1,59 @@
#!/bin/bash
#
# Checks for 404 links in a compiled list of modified books

ERRORS=0

FILES=$(git diff --name-only origin/main...HEAD --diff-filter=d -- "*.adoc")

🛠️ Refactor suggestion

Diff against wrong remote on forked PRs

git diff origin/main...HEAD has the same fork issue as the workflow.
Derive the base branch dynamically:

-FILES=$(git diff --name-only origin/main...HEAD --diff-filter=d -- "*.adoc")
+BASE=${GITHUB_BASE_REF:-main}
+FILES=$(git diff --name-only "origin/${BASE}"...HEAD --diff-filter=d -- "*.adoc")
🤖 Prompt for AI Agents
In scripts/check-modified.sh at line 5, the git diff command uses a hardcoded
remote branch origin/main which causes issues on forked PRs. Modify the script
to dynamically determine the base branch or remote reference instead of using
origin/main directly. This can be done by deriving the base branch from the
current git context or environment variables to ensure the diff compares against
the correct upstream branch in forked PR scenarios.
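
As a rough illustration of deriving the base dynamically, the sketch below resolves the ref to diff against from `GITHUB_BASE_REF`, which GitHub Actions sets for `pull_request` events. The helper name `resolve_base` and its optional remote argument are assumptions, not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper: prints the remote-tracking ref to diff against.
# The ":-" expansion covers both an unset and an empty GITHUB_BASE_REF,
# so push builds fall back to "main".
resolve_base() {
  # Assumption: the remote defaults to "origin" unless one is passed in.
  local remote="${1:-origin}"
  echo "${remote}/${GITHUB_BASE_REF:-main}"
}

# Show the range that git diff --name-only would receive.
echo "diff range: $(resolve_base)...HEAD"
```

If the workflow fetches the base branch into `refs/remotes/upstream/`, the script would call `resolve_base upstream` so that both steps agree on the ref name.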


MODULES=$(echo "$FILES" | grep '^modules/.*\.adoc$')
ASSEMBLIES=$(echo "$FILES" | grep '^assemblies/.*\.adoc$')
BOOKS=$(echo "$FILES" | grep -E '^[^/]+\.adoc$')

UPDATED_BOOKS=()

if [ -n "$MODULES" ]; then
  # Check for assemblies and books that include modified modules
  while IFS= read -r module; do
    mapfile -t updated_books < <(grep -rnwl . --include="*.adoc" --exclude-dir={_artifacts,modules,assemblies} -e "$(basename "$module")")
    UPDATED_BOOKS+=( "${updated_books[@]}" )

    mapfile -t updated_books < <(grep -rnwl assemblies --include="*.adoc" --exclude-dir={_artifacts,modules} -e "$(basename "$module")")
    UPDATED_BOOKS+=( "${updated_books[@]}" )
  done <<< "$MODULES"
fi

# Check for books that include modified assemblies
if [ -n "$ASSEMBLIES" ]; then
  while IFS= read -r assembly; do
    mapfile -t results3 < <(grep -rnwl . --include="*.adoc" --exclude-dir={_artifacts,modules,assemblies} -e "$(basename "$assembly")")
    UPDATED_BOOKS+=( "${results3[@]}" )
  done <<< "$ASSEMBLIES"
fi

# Check for directly updated books
if [ -n "$BOOKS" ]; then
  while IFS= read -r book; do
    UPDATED_BOOKS+=( "$book" )
  done <<< "$BOOKS"
fi

if [ ${#UPDATED_BOOKS[@]} -eq 0 ]; then
  echo "No modified books. Skipping link check."
  exit 0
fi

# Check links in the compiled list of books

for f in "${UPDATED_BOOKS[@]}"; do
  echo "Checking: $f"
  if ! ./scripts/check-links.sh "$f"; then
    echo "❌ Link check failed for: $f"
    ERRORS=1
  fi
done

if [ "$ERRORS" -ne 0 ]; then
  echo "One or more link checks failed."
  exit 1
fi
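
The script above appends to `UPDATED_BOOKS` from three separate searches, so the same book can appear more than once and be checked repeatedly. A possible way to deduplicate the list before the loop is sketched below; the helper name `unique_books` and the file names are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: prints its arguments one per line, sorted,
# with duplicates removed. Prints nothing when called with no arguments.
unique_books() {
  if [ "$#" -eq 0 ]; then
    return 0
  fi
  printf '%s\n' "$@" | sort -u
}

# Illustrative input: one book was found by two different searches.
unique_books "serving.adoc" "installing.adoc" "serving.adoc"
```

In the script this could be applied with `mapfile -t UPDATED_BOOKS < <(unique_books "${UPDATED_BOOKS[@]}")` just before the final loop.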
15 changes: 15 additions & 0 deletions scripts/links.ignore
@@ -0,0 +1,15 @@
# Add ignore link regexes one per line
.*docs\.google\.com.*
.*google\.com.*
.*issues\.redhat\.com.*
.*0\.0\.0\.0.*
.*localhost.*
.*registry\.redhat\.io.*
.*example\.org.*
.*github.com/example/myrepo\.git
.*fonts\.googleapis\.com.*
.*mixtral-my-project.apps\.my-cluster\.com.*
.*openshiftapps\.com.*
.*minio-cluster\.local.*
.*codeflare-operator-webhook-service\.redhat-ods-applications\.svc
.*example.com.*
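
Several patterns above leave a dot unescaped (for example `.*example.com.*`), and an unescaped dot matches any character. The sketch below shows the effect using `grep -E` as a stand-in for the `[[ "$URL" =~ $PATTERN ]]` test in `check_url`; the helper names and the sample URLs are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the URL matches the extended regex,
# approximating the =~ test that check_url applies to each ignore pattern.
matches_pattern() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Reports whether a URL would be skipped ("ignored") or link-checked ("checked").
report() {
  if matches_pattern "$1" "$2"; then
    echo "ignored: $1"
  else
    echo "checked: $1"
  fi
}

report "https://example.com/docs" '.*example.com.*'
# The unescaped dot also matches "-", so this unrelated host is skipped too.
report "https://example-community.io" '.*example.com.*'
# With the dot escaped, the unrelated host is checked as expected.
report "https://example-community.io" '.*example\.com.*'
```

Escaping the dots narrows each pattern to the literal host it was written for.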