Add GitHub Actions workflows for automated broken link detection #124
Changes from 4 commits
5b24a6a
1bb6149
a5248c0
5bf030a
7a2d412
5fcf12e
0f5eba7
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,208 @@ | ||||||||||||||
| name: Broken Links Check - Nightly | ||||||||||||||
|
|
||||||||||||||
| on: | ||||||||||||||
| schedule: | ||||||||||||||
| # Run every day at 2:00 AM UTC | ||||||||||||||
| - cron: '0 2 * * *' | ||||||||||||||
| workflow_dispatch: # Allow manual trigger | ||||||||||||||
|
|
||||||||||||||
| permissions: | ||||||||||||||
| contents: read | ||||||||||||||
|
|
||||||||||||||
| jobs: | ||||||||||||||
| check-broken-links: | ||||||||||||||
| runs-on: ubuntu-latest | ||||||||||||||
|
|
||||||||||||||
| steps: | ||||||||||||||
| - name: Checkout code | ||||||||||||||
| uses: actions/checkout@v4 | ||||||||||||||
|
|
||||||||||||||
| - name: Setup Node.js | ||||||||||||||
| uses: actions/setup-node@v4 | ||||||||||||||
| with: | ||||||||||||||
| node-version: '18' | ||||||||||||||
| cache: 'npm' | ||||||||||||||
|
|
||||||||||||||
| - name: Install dependencies | ||||||||||||||
| run: | | ||||||||||||||
| PUPPETEER_SKIP_DOWNLOAD=true npm install | ||||||||||||||
| env: | ||||||||||||||
| NODE_ENV: production | ||||||||||||||
|
|
||||||||||||||
| - name: Run broken links check | ||||||||||||||
| id: broken_links | ||||||||||||||
| run: | | ||||||||||||||
| echo "Running broken links check..." | ||||||||||||||
| OUTPUT=$(./node_modules/.bin/mint broken-links 2>&1 || true) | ||||||||||||||
|
||||||||||||||
| echo "$OUTPUT" | ||||||||||||||
|
|
||||||||||||||
| # Save output to file | ||||||||||||||
| echo "$OUTPUT" > broken-links-output.txt | ||||||||||||||
|
|
||||||||||||||
| # Extract the summary line | ||||||||||||||
| SUMMARY=$(echo "$OUTPUT" | grep -E "found [0-9]+ broken links" || echo "No broken links found") | ||||||||||||||
| echo "summary=$SUMMARY" >> $GITHUB_OUTPUT | ||||||||||||||
|
|
||||||||||||||
| # Check if there are any broken links | ||||||||||||||
| if echo "$OUTPUT" | grep -q "found [1-9][0-9]* broken links"; then | ||||||||||||||
| echo "has_broken_links=true" >> $GITHUB_OUTPUT | ||||||||||||||
|
Comment on lines 50 to 56
|
||||||||||||||
| # Count total broken links | ||||||||||||||
| TOTAL_LINKS=$(echo "$SUMMARY" | grep -oE "[0-9]+" | head -1) | ||||||||||||||
| echo "total_links=$TOTAL_LINKS" >> $GITHUB_OUTPUT | ||||||||||||||
| # Count files with broken links | ||||||||||||||
| TOTAL_FILES=$(echo "$SUMMARY" | grep -oE "[0-9]+" | tail -1) | ||||||||||||||
| echo "total_files=$TOTAL_FILES" >> $GITHUB_OUTPUT | ||||||||||||||
| else | ||||||||||||||
| echo "has_broken_links=false" >> $GITHUB_OUTPUT | ||||||||||||||
| echo "total_links=0" >> $GITHUB_OUTPUT | ||||||||||||||
| echo "total_files=0" >> $GITHUB_OUTPUT | ||||||||||||||
| fi | ||||||||||||||
|
|
||||||||||||||
| - name: Prepare Slack message | ||||||||||||||
| id: slack_message | ||||||||||||||
| run: | | ||||||||||||||
| OUTPUT_FILE="broken-links-output.txt" | ||||||||||||||
| HAS_BROKEN_LINKS="${{ steps.broken_links.outputs.has_broken_links }}" | ||||||||||||||
| SUMMARY="${{ steps.broken_links.outputs.summary }}" | ||||||||||||||
| TOTAL_LINKS="${{ steps.broken_links.outputs.total_links }}" | ||||||||||||||
| TOTAL_FILES="${{ steps.broken_links.outputs.total_files }}" | ||||||||||||||
| REPO_URL="https://github.com/${{ github.repository }}" | ||||||||||||||
|
|
||||||||||||||
| if [ "$HAS_BROKEN_LINKS" = "true" ]; then | ||||||||||||||
| # Create a truncated version of the output for Slack (first 100 lines) | ||||||||||||||
| TRUNCATED_OUTPUT=$(head -100 "$OUTPUT_FILE") | ||||||||||||||
|
Comment on lines +80 to +81
|
||||||||||||||
| - # Create a truncated version of the output for Slack (first 100 lines) |
| - TRUNCATED_OUTPUT=$(head -100 "$OUTPUT_FILE") |
| + # Create a truncated version of the output for Slack (first 100 lines) and add a truncation note |
| + TRUNCATED_OUTPUT="$(head -100 "$OUTPUT_FILE") |
| + [Output truncated to first 100 lines. See full report in workflow artifacts.]" |
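The suggestion above appends the truncation note unconditionally. A variant that adds the note only when the report is actually longer than the limit could look like the sketch below. The helper name `truncate_report` and its arguments are hypothetical and not part of the PR.

```shell
# Hypothetical helper: print at most $2 lines (default 100) of file $1 and
# append a truncation note only when the file is longer than that limit.
truncate_report() {
  file=$1
  max=${2:-100}
  # wc -l counts newline characters; tr strips padding some platforms add.
  total=$(wc -l < "$file" | tr -d ' ')
  head -n "$max" "$file"
  if [ "$total" -gt "$max" ]; then
    printf '[Output truncated to first %s lines. See full report in workflow artifacts.]\n' "$max"
  fi
}
```

In the step this would be called as `TRUNCATED_OUTPUT=$(truncate_report "$OUTPUT_FILE" 100)`, so short reports reach Slack without a misleading note.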
Outdated
Copilot
AI
Jan 7, 2026
The sed-based escaping approach on line 76 is complex and error-prone. While jq is correctly used afterward for proper JSON construction, this intermediate sed step (line 76) is unnecessary since jq can handle the escaping directly via the --arg flag (as done on line 85). The sed operations here are redundant and could potentially introduce escaping issues.
| # Escape the output for JSON (escape backslashes, quotes, and newlines) | |
| ESCAPED_OUTPUT=$(echo "$TRUNCATED_OUTPUT" | sed 's/\\/\\\\/g' | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g') | |
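As a minimal sketch of the point made in this comment, `jq --arg` can take the raw text and do all JSON escaping itself, with no sed pre-processing. The payload shape (a single `text` field) and the helper name `build_slack_payload` are assumptions for illustration; the payload actually built in the PR is not shown here.

```shell
# Hypothetical helper: build a JSON payload from raw text in $1.
# jq escapes quotes, backslashes and newlines in the --arg value itself.
build_slack_payload() {
  jq -n -c --arg text "$1" '{text: $text}'
}
```

Calling `build_slack_payload "$TRUNCATED_OUTPUT"` with multi-line input yields a single-line JSON document in which each newline appears as `\n`.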
Copilot
AI
Jan 7, 2026
Using exit 0 when the webhook URL is not set means the workflow will show as successful even when notifications couldn't be sent. While the echo messages inform about the skip, consider whether this should be a warning or if the workflow should continue to upload artifacts even without Slack notification. The current behavior is acceptable but could be clearer with a warning annotation using echo "::warning::SLACK_WEBHOOK_URL not configured".
| - if [ -z "$SLACK_WEBHOOK_URL" ]; then |
| + if [ -z "$SLACK_WEBHOOK_URL" ]; then |
| + echo "::warning::SLACK_WEBHOOK_URL not configured" |
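One way to keep the skip explicit is to wrap the check in a small function, sketched below. The name `require_webhook` is hypothetical; the `::warning::` workflow command is the one named in the comment.

```shell
# Hypothetical helper: returns 0 when a webhook URL is configured. Otherwise
# it emits a GitHub Actions warning annotation and returns 1, so the caller
# can skip the Slack call without failing the job.
require_webhook() {
  if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
    echo "::warning::SLACK_WEBHOOK_URL not configured"
    return 1
  fi
  return 0
}

# Possible use at the top of the notification step:
#   require_webhook || exit 0
```

The step still exits 0, so later steps such as artifact upload keep running, but the run summary now shows a visible warning.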
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,187 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||
| name: Broken Links Check - PR | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| on: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| pull_request: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| types: [opened, synchronize, reopened] | ||||||||||||||||||||||||||||||||||||||||||||||||||
| paths: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - '**.mdx' | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - '**.md' | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - 'docs.json' | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| permissions: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| contents: read | ||||||||||||||||||||||||||||||||||||||||||||||||||
| pull-requests: write | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| jobs: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| check-broken-links: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| runs-on: ubuntu-latest | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| steps: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - name: Checkout code | ||||||||||||||||||||||||||||||||||||||||||||||||||
| uses: actions/checkout@v4 | ||||||||||||||||||||||||||||||||||||||||||||||||||
| with: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| fetch-depth: 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| - name: Setup Node.js | ||||||||||||||||||||||||||||||||||||||||||||||||||
| uses: actions/setup-node@v4 | ||||||||||||||||||||||||||||||||||||||||||||||||||
| with: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| node-version: '18' | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| - node-version: '18' |
| + node-version: '22' |
Outdated
Copilot
AI
Jan 7, 2026
The workflow always runs the broken links check with || true to prevent failures, but this means the step will never fail even if the mint command encounters an error (e.g., invalid configuration, missing dependencies). Consider checking the exit code separately and failing the workflow if the error is not related to finding broken links, to ensure genuine errors are not silently ignored.
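A rough sketch of checking the exit code separately follows. It assumes, without confirmation from the tool's documentation, that a run which actually scanned the docs prints a "found N broken links" summary line, and that a non-zero exit without such a line indicates a genuine failure. The helper name `classify_run` is hypothetical.

```shell
# Hypothetical classifier.
#   $1 = exit status of the checker
#   $2 = its captured output
classify_run() {
  status=$1
  output=$2
  if printf '%s\n' "$output" | grep -Eq 'found [0-9]+ broken links'; then
    echo "checked"   # the checker ran and printed a summary line
  elif [ "$status" -eq 0 ]; then
    echo "clean"     # exited 0 without printing a summary line
  else
    echo "error"     # non-zero exit and no summary: treat as a real failure
  fi
}

# Possible use in the step, replacing `|| true`:
#   STATUS=0
#   OUTPUT=$(./node_modules/.bin/mint broken-links 2>&1) || STATUS=$?
#   if [ "$(classify_run "$STATUS" "$OUTPUT")" = "error" ]; then
#     echo "$OUTPUT"
#     exit "$STATUS"
#   fi
```

With this split, a missing dependency or invalid configuration fails the job, while a run that merely found broken links continues to the reporting steps.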
Copilot
AI
Jan 7, 2026
The regex handling of the "found 0 broken links" case is inconsistent. The pattern found [1-9][0-9]* broken links only matches counts starting with 1-9, so a "found 0 broken links" report is correctly treated as having no broken links. However, the same pattern is also used to detect the summary line on line 48, so a "found 0 broken links" report produces the fallback "No broken links found" as the summary instead of the checker's actual "found 0 broken links" message. Consider using separate patterns for detecting whether broken links exist and for extracting the summary.
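The two concerns can be separated as in the sketch below. The function names are hypothetical, and the exact wording of the checker's summary line is assumed from the patterns already used in the workflow.

```shell
# Extract the summary line. Any count, including 0, is a valid summary.
extract_summary() {
  printf '%s\n' "$1" | grep -E 'found [0-9]+ broken links' | head -n 1
}

# Succeed only when the reported count is non-zero.
has_broken_links() {
  printf '%s\n' "$1" | grep -Eq 'found [1-9][0-9]* broken links'
}
```

With this split, a "found 0 broken links" report keeps its original summary text while still being classified as having nothing to report.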
Outdated
Copilot
AI
Jan 7, 2026
The git diff command uses origin/${{ github.base_ref }} which assumes the remote is named "origin". In GitHub Actions, the default remote name is indeed "origin", but the base ref is already checked out. A more robust approach would be to use ${{ github.event.pull_request.base.sha }} or just ${{ github.base_ref }} without the "origin/" prefix since actions/checkout@v4 with fetch-depth: 0 already fetches all the necessary refs.
| - git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -E '\.(mdx?|json)$' > changed-files.txt || echo "" > changed-files.txt |
| + git diff --name-only ${{ github.event.pull_request.base.sha }}...HEAD | grep -E '\.(mdx?|json)$' > changed-files.txt || echo "" > changed-files.txt |
Outdated
Copilot
AI
Jan 7, 2026
The script uses mapfile -t CHANGED < "$CHANGED_FILES" to read changed files into an array. When no files match, the echo "" > changed-files.txt fallback writes a single blank line, so mapfile produces an array with one empty-string element rather than an empty array. The comparison in the loop (line 91) will then try to match files against an empty string, which could cause issues. Consider checking whether the file contains any non-blank lines first, or handling the empty case explicitly.
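A minimal sketch of one way to guard against blank entries is to filter them out before the list is loaded. The helper name `non_empty_lines` is hypothetical.

```shell
# Print only the non-blank lines of the list file $1, so a file that holds
# just a newline yields no entries at all. `|| true` keeps the helper from
# returning non-zero when grep selects nothing.
non_empty_lines() {
  grep -v '^[[:space:]]*$' "$1" || true
}

# Possible use in a bash step:
#   mapfile -t CHANGED < <(non_empty_lines changed-files.txt)
#   if [ "${#CHANGED[@]}" -eq 0 ]; then echo "No changed files"; fi
```

This keeps the array genuinely empty when nothing changed, so the later loop never compares against an empty string.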
Outdated
Copilot
AI
Jan 7, 2026
The regex pattern \.(mdx?|json)$ will match .mdx, .md, and .json files, but the filter-links.sh script on lines 86-87 only checks for .mdx and .md files. This means that if docs.json is changed and contains broken links, those links would be detected by the broken links checker but filtered out from the "Broken Links in Changed Files" section. Either the git diff pattern should exclude .json files, or the filter script should include .json files in its logic.
| - if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then |
| + if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then |
Outdated
Copilot
AI
Jan 7, 2026
The filtering logic assumes that file paths in the broken-links output don't start with spaces or special characters (regex: ^[a-zA-Z0-9]). However, this assumption may not hold for all valid file paths. For example, files in subdirectories might be reported with relative paths that could start with "./", or the output format could include indentation. Consider making this pattern more robust or verifying the actual output format of mint broken-links to ensure correct parsing.
| - # Check if line is a file path (doesn't start with space or special char) |
| - if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then |
| - CURRENT_FILE="$line" |
| + # Check if line is a file path (optionally indented, ending with .md or .mdx) |
| + if [[ $line =~ ^[[:space:]]*([^[:space:]]+\.mdx?)$ ]]; then |
| + CURRENT_FILE="${BASH_REMATCH[1]}" |
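To see how the suggested pattern behaves on different line shapes, the same regular expression can be exercised through sed, as in the sketch below. The helper name `extract_doc_path` and the sample lines are hypothetical; the real output format of mint broken-links should still be verified, as the comment notes.

```shell
# Print the file path if the line looks like one (optional indentation,
# no internal whitespace, ending in .md or .mdx); print nothing otherwise.
extract_doc_path() {
  printf '%s\n' "$1" | sed -nE 's/^[[:space:]]*([^[:space:]]+\.mdx?)$/\1/p'
}

# Sample calls with made-up lines:
#   extract_doc_path '  ./guides/setup.mdx'   prints ./guides/setup.mdx
#   extract_doc_path ' ⎿ /missing-page'       prints nothing
```

Unlike the original `^[a-zA-Z0-9]` check, this accepts indented paths and paths starting with `./`, while still rejecting the link lines that begin with the `⎿` marker.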
Outdated
Copilot
AI
Jan 7, 2026
The script uses echo -e to interpret escape sequences (like \n) when outputting the filtered results. However, the construction FILTERED_OUTPUT="${FILTERED_OUTPUT}${line}\n" adds a literal \n string, not a newline character. When later echoed with echo -e, this will work but is inconsistent with the actual newline handling in the while loop. Consider using printf instead of echo -e for more reliable output formatting, or append actual newlines to the variable.
| - FILTERED_OUTPUT="${FILTERED_OUTPUT}${CURRENT_FILE}\n" |
| - fi |
| - elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then |
| - # This is a broken link for a changed file |
| - FILTERED_OUTPUT="${FILTERED_OUTPUT}${line}\n" |
| - fi |
| - done < "$OUTPUT_FILE" |
| - if [ -z "$FILTERED_OUTPUT" ]; then |
| - echo "No broken links found in changed files." |
| - else |
| - echo -e "$FILTERED_OUTPUT" |
| + FILTERED_OUTPUT+="${CURRENT_FILE}"$'\n' |
| + fi |
| + elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then |
| + # This is a broken link for a changed file |
| + FILTERED_OUTPUT+="${line}"$'\n' |
| + fi |
| + done < "$OUTPUT_FILE" |
| + if [ -z "$FILTERED_OUTPUT" ]; then |
| + echo "No broken links found in changed files." |
| + else |
| + printf '%s' "$FILTERED_OUTPUT" |
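The difference matters when a line itself contains a backslash sequence. The sketch below, with a hypothetical helper named `collect_lines`, accumulates lines with real newline characters and prints them with printf '%s', so nothing inside a line is reinterpreted.

```shell
# Accumulate stdin lines using real newline characters and emit them with
# printf '%s', so backslashes inside a line are passed through untouched.
collect_lines() {
  out=""
  while IFS= read -r line; do
    out="${out}${line}
"
  done
  printf '%s' "$out"
}
```

A path such as `docs\notes.md` passes through unchanged here, whereas `echo -e` would turn the `\n` inside it into a line break.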
Outdated
Copilot
AI
Jan 7, 2026
The multiline output uses 'EOF' as its delimiter, but if the filtered output contains the literal string 'EOF' on its own line, the value will be terminated prematurely and the remaining lines will cause parsing errors. Consider using a more unique delimiter like 'FILTERED_LINKS_EOF' or 'END_OF_FILTERED_OUTPUT' to avoid potential conflicts with actual output content.
| - echo 'filtered<<EOF' |
| - echo "$FILTERED" |
| - echo 'EOF' |
| + echo 'filtered<<FILTERED_LINKS_EOF' |
| + echo "$FILTERED" |
| + echo 'FILTERED_LINKS_EOF' |
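A fixed but longer delimiter lowers the chance of a collision without ruling it out. A further variation, sketched here under the assumption that `od` and `/dev/urandom` are available on the runner, generates a random delimiter per run. The helper name `write_multiline_output` and the `ghadelim_` prefix are hypothetical.

```shell
# Hypothetical helper: append a multiline value to an Actions output file.
#   $1 = output name, $2 = value, $3 = file (normally "$GITHUB_OUTPUT")
write_multiline_output() {
  name=$1
  value=$2
  file=$3
  # 8 random bytes rendered as hex make an accidental match very unlikely.
  delim="ghadelim_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$file"
}
```

In the workflow this would be called as `write_multiline_output filtered "$FILTERED" "$GITHUB_OUTPUT"`.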
The workflow uses Node.js version 18, but this version will reach End-of-Life on April 30, 2025, which has already passed as of January 2026. Consider updating to Node.js 20 or 22 (LTS versions) to ensure continued support and security updates. This applies to both the PR and nightly workflows.