Add GitHub Actions workflows for automated broken link detection #124
base: main
Co-authored-by: ayush-shah <[email protected]>
…ON escaping Co-authored-by: ayush-shah <[email protected]>
Pull request overview
This PR implements automated broken link detection using the mint broken-links command through two GitHub Actions workflows: one for PR validation and one for nightly monitoring.
- Adds PR workflow that checks for broken links in changed files and posts/updates a comment with results
- Adds nightly workflow that runs a full repository scan and sends Slack notifications
- Implements proper JSON escaping for Slack payloads and handles missing webhook URL gracefully
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `.github/workflows/broken-links-pr.yml` | PR validation workflow that triggers on documentation file changes, filters broken links by changed files, and posts sticky comments with results |
| `.github/workflows/broken-links-nightly.yml` | Scheduled nightly workflow that scans the entire repository, sends Slack notifications with results, and uploads artifacts |

    # Extract the summary line (e.g., "found 91 broken links in 47 files")
    SUMMARY=$(echo "$OUTPUT" | grep -E "found [0-9]+ broken links" || echo "No broken links found")
    echo "summary=$SUMMARY" >> $GITHUB_OUTPUT

    # Check if there are any broken links
    if echo "$OUTPUT" | grep -q "found [1-9][0-9]* broken links"; then
      echo "has_broken_links=true" >> $GITHUB_OUTPUT
    else
      echo "has_broken_links=false" >> $GITHUB_OUTPUT
    fi

Copilot
AI
Jan 7, 2026
The pattern `found [1-9][0-9]* broken links` only matches counts of 1 or more, so a "found 0 broken links" line falls through to the else branch and sets `has_broken_links=false`, which is the correct result. However, the same pattern is also used to validate the presence of a summary on line 48, so a "found 0 broken links" report would yield "No broken links found" as the summary instead of the actual "found 0 broken links" message. Consider using separate patterns for detecting whether broken links exist and for extracting the summary.
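One way to keep the summary and the flag consistent is to parse the count once and derive both values from it. The sketch below is only an illustration, not the workflow's actual code: the function name `parse_summary` is hypothetical, and the `found N broken links` line format is assumed from the example in the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: derive both step outputs from a single parse of the summary line.
# Assumption: the checker prints a line such as "found 91 broken links in 47 files".

parse_summary() {
  local output="$1" summary count
  local re='found ([0-9]+) broken links'

  # Take the first line that carries a count, if there is one.
  summary=$(grep -E -m1 "$re" <<<"$output" || true)

  if [[ $summary =~ $re ]]; then
    count=${BASH_REMATCH[1]}
    echo "summary=$summary"
    # 10# forces base 10 so a count with a leading zero is not read as octal.
    if (( 10#$count > 0 )); then
      echo "has_broken_links=true"
    else
      echo "has_broken_links=false"
    fi
  else
    echo "summary=No broken links found"
    echo "has_broken_links=false"
  fi
}
```

In a workflow step this could be called as `parse_summary "$OUTPUT" >> "$GITHUB_OUTPUT"`, so a "found 0 broken links" report keeps its real summary while the flag stays false.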

    while IFS= read -r line; do
      # Check if line is a file path (doesn't start with space or special char)
      if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then

Copilot
AI
Jan 7, 2026
The regex pattern \.(mdx?|json)$ will match .mdx, .md, and .json files, but the filter-links.sh script on lines 86-87 only checks for .mdx and .md files. This means that if docs.json is changed and contains broken links, those links would be detected by the broken links checker but filtered out from the "Broken Links in Changed Files" section. Either the git diff pattern should exclude .json files, or the filter script should include .json files in its logic.
Suggested change:

    - if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then
    + if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then


    # Check if line is a file path (doesn't start with space or special char)
    if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then
      CURRENT_FILE="$line"

Copilot
AI
Jan 7, 2026
The filtering logic assumes that file paths in the broken-links output don't start with spaces or special characters (regex: ^[a-zA-Z0-9]). However, this assumption may not hold for all valid file paths. For example, files in subdirectories might be reported with relative paths that could start with "./", or the output format could include indentation. Consider making this pattern more robust or verifying the actual output format of mint broken-links to ensure correct parsing.
Suggested change:

    - # Check if line is a file path (doesn't start with space or special char)
    - if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]]); then
    -   CURRENT_FILE="$line"
    + # Check if line is a file path (optionally indented, ending with .md or .mdx)
    + if [[ $line =~ ^[[:space:]]*([^[:space:]]+\.mdx?)$ ]]; then
    +   CURRENT_FILE="${BASH_REMATCH[1]}"

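If the report can also contain a leading `./` or `.json` paths, the matching could be pulled into a small helper that normalizes the path before it is compared with `git diff` output. This is a sketch under assumptions: `report_path` is a hypothetical name, and the actual output format of `mint broken-links` should be checked before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: recognise a file-path line in the report and print it in the same
# form that `git diff --name-only` uses (no indentation, no leading "./").
# Assumption: path lines contain no spaces and end in .md, .mdx, or .json.

report_path() {
  local line="$1"
  local re='^[[:space:]]*(\./)?([^[:space:]]+\.(mdx?|json))$'

  if [[ $line =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
    return 0
  fi
  # Not a path line (for example a summary line or a "⎿" link line).
  return 1
}
```

A caller could then write `if CURRENT_FILE=$(report_path "$line"); then ...` and compare the normalized value against the changed-file list.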

    env:
      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
    run: |
      if [ -z "$SLACK_WEBHOOK_URL" ]; then

Copilot
AI
Jan 7, 2026
Using exit 0 when the webhook URL is not set means the workflow will show as successful even when notifications couldn't be sent. While the echo messages inform about the skip, consider whether this should be a warning or if the workflow should continue to upload artifacts even without Slack notification. The current behavior is acceptable but could be clearer with a warning annotation using echo "::warning::SLACK_WEBHOOK_URL not configured".
Suggested change:

    - if [ -z "$SLACK_WEBHOOK_URL" ]; then
    + if [ -z "$SLACK_WEBHOOK_URL" ]; then
    +   echo "::warning::SLACK_WEBHOOK_URL not configured"


    # Create a truncated version of the output for Slack (first 100 lines)
    TRUNCATED_OUTPUT=$(head -100 "$OUTPUT_FILE")

Copilot
AI
Jan 7, 2026
The truncation uses head -100 to limit output to 100 lines, but this may cut off a broken link entry in the middle, resulting in incomplete information in the Slack message. Consider adding logic to ensure truncation happens at a complete entry boundary, or add a note in the Slack message indicating that the output has been truncated (e.g., "Showing first 100 lines, see full report in artifacts").
Suggested change:

    - # Create a truncated version of the output for Slack (first 100 lines)
    - TRUNCATED_OUTPUT=$(head -100 "$OUTPUT_FILE")
    + # Create a truncated version of the output for Slack (first 100 lines) and add a truncation note
    + TRUNCATED_OUTPUT="$(head -100 "$OUTPUT_FILE")
    + [Output truncated to first 100 lines. See full report in workflow artifacts.]"

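The suggested change appends the note unconditionally, even when the report is shorter than 100 lines. A possible refinement, sketched here with a hypothetical `truncate_report` helper, adds the note only when lines were actually dropped.

```shell
#!/usr/bin/env bash
# Sketch: print at most LIMIT lines of a report and add a note only when
# something was cut off. The helper name and default limit are illustrative.

truncate_report() {
  local file="$1" limit="${2:-100}" total

  total=$(wc -l < "$file")
  head -n "$limit" "$file"

  if (( total > limit )); then
    printf '[Output truncated to first %d of %d lines. See full report in workflow artifacts.]\n' \
      "$limit" "$total"
  fi
}
```

Usage would look like `TRUNCATED_OUTPUT=$(truncate_report "$OUTPUT_FILE" 100)`. This still cuts at a line boundary rather than an entry boundary, so a file heading may appear without all of its links.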

    # Escape the output for JSON (escape backslashes, quotes, and newlines)
    ESCAPED_OUTPUT=$(echo "$TRUNCATED_OUTPUT" | sed 's/\\/\\\\/g' | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')

Copilot
AI
Jan 7, 2026
The sed-based escaping approach on line 76 is complex and error-prone. While jq is correctly used afterward for proper JSON construction, this intermediate sed step (line 76) is unnecessary since jq can handle the escaping directly via the --arg flag (as done on line 85). The sed operations here are redundant and could potentially introduce escaping issues.
Suggested change:

    - # Escape the output for JSON (escape backslashes, quotes, and newlines)
    - ESCAPED_OUTPUT=$(echo "$TRUNCATED_OUTPUT" | sed 's/\\/\\\\/g' | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')

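To illustrate why the `sed` step is redundant, the sketch below lets `jq` do all of the escaping through `--arg`. The function name `build_slack_payload` and the single `text` field are assumptions for illustration; the payload shape used by the actual workflow is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: build a Slack webhook payload with jq so that quotes, backslashes,
# and newlines in the report are escaped by jq itself.
# Assumption: a plain {"text": "..."} payload is sufficient.

build_slack_payload() {
  local summary="$1" details="$2"

  # --arg passes each value as a JSON string; no manual escaping is needed.
  jq -n --arg summary "$summary" --arg details "$details" \
    '{text: ($summary + "\n" + $details)}'
}
```

The result could then be posted with something like `curl -X POST -H 'Content-Type: application/json' -d "$(build_slack_payload "$SUMMARY" "$TRUNCATED_OUTPUT")" "$SLACK_WEBHOOK_URL"`.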

    # Extract the summary line
    SUMMARY=$(echo "$OUTPUT" | grep -E "found [0-9]+ broken links" || echo "No broken links found")
    echo "summary=$SUMMARY" >> $GITHUB_OUTPUT

    # Check if there are any broken links
    if echo "$OUTPUT" | grep -q "found [1-9][0-9]* broken links"; then
      echo "has_broken_links=true" >> $GITHUB_OUTPUT

Copilot
AI
Jan 7, 2026
The pattern `found [1-9][0-9]* broken links` only matches counts of 1 or more, so a "found 0 broken links" line falls through to the else branch and sets `has_broken_links=false`, which is the correct result. However, the same pattern is also used to validate the presence of a summary on line 43, so a "found 0 broken links" report would yield "No broken links found" as the summary instead of the actual "found 0 broken links" message. Consider using separate patterns for detecting whether broken links exist and for extracting the summary.

    id: changed_files
    run: |
      # Get list of changed .mdx and .md files in this PR
      git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -E '\.(mdx?|json)$' > changed-files.txt || echo "" > changed-files.txt

Copilot
AI
Jan 7, 2026
The git diff command uses `origin/${{ github.base_ref }}`, which assumes that the remote is named "origin" and that the base branch has been fetched. In GitHub Actions the default remote name is indeed "origin", and `actions/checkout@v4` with `fetch-depth: 0` fetches the necessary refs, so this works. A more robust approach is to diff against `${{ github.event.pull_request.base.sha }}`, which pins the comparison to the base commit recorded for the pull request and does not depend on the remote name.
Suggested change:

    - git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -E '\.(mdx?|json)$' > changed-files.txt || echo "" > changed-files.txt
    + git diff --name-only ${{ github.event.pull_request.base.sha }}...HEAD | grep -E '\.(mdx?|json)$' > changed-files.txt || echo "" > changed-files.txt


          FILTERED_OUTPUT="${FILTERED_OUTPUT}${CURRENT_FILE}\n"
        fi
      elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then
        # This is a broken link for a changed file
        FILTERED_OUTPUT="${FILTERED_OUTPUT}${line}\n"
      fi
    done < "$OUTPUT_FILE"

    if [ -z "$FILTERED_OUTPUT" ]; then
      echo "No broken links found in changed files."
    else
      echo -e "$FILTERED_OUTPUT"

Copilot
AI
Jan 7, 2026
The script uses echo -e to interpret escape sequences (like \n) when outputting the filtered results. However, the construction FILTERED_OUTPUT="${FILTERED_OUTPUT}${line}\n" adds a literal \n string, not a newline character. When later echoed with echo -e, this will work but is inconsistent with the actual newline handling in the while loop. Consider using printf instead of echo -e for more reliable output formatting, or append actual newlines to the variable.
Suggested change:

    -     FILTERED_OUTPUT="${FILTERED_OUTPUT}${CURRENT_FILE}\n"
    -   fi
    - elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then
    -   # This is a broken link for a changed file
    -   FILTERED_OUTPUT="${FILTERED_OUTPUT}${line}\n"
    - fi
    - done < "$OUTPUT_FILE"
    - if [ -z "$FILTERED_OUTPUT" ]; then
    -   echo "No broken links found in changed files."
    - else
    -   echo -e "$FILTERED_OUTPUT"
    +     FILTERED_OUTPUT+="${CURRENT_FILE}"$'\n'
    +   fi
    + elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then
    +   # This is a broken link for a changed file
    +   FILTERED_OUTPUT+="${line}"$'\n'
    + fi
    + done < "$OUTPUT_FILE"
    + if [ -z "$FILTERED_OUTPUT" ]; then
    +   echo "No broken links found in changed files."
    + else
    +   printf '%s' "$FILTERED_OUTPUT"

@copilot apply changes based on the comments in this thread
…rt, use warning annotation, add truncation note, remove redundant sed escaping, use event SHA, and fix string concatenation Co-authored-by: ayush-shah <[email protected]>
Applied all changes from the review comments in commit 7a2d412: PR Workflow fixes:
Nightly Workflow fixes:
All YAML files validated and tested successfully.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '18'

Copilot
AI
Jan 7, 2026
The workflow uses Node.js version 18, which reached End-of-Life on April 30, 2025. Consider updating to Node.js 20 or 22 (LTS versions) to ensure continued support and security updates. This applies to both the PR and nightly workflows.
Suggested change:

    - node-version: '18'
    + node-version: '20'


    echo 'filtered<<EOF'
    echo "$FILTERED"
    echo 'EOF'

Copilot
AI
Jan 7, 2026
The multiline output delimiter uses 'EOF' as the marker, but if the actual filtered output contains the literal string 'EOF' on its own line, this will prematurely terminate the heredoc and cause parsing errors. Consider using a more unique delimiter like 'FILTERED_LINKS_EOF' or 'END_OF_FILTERED_OUTPUT' to avoid potential conflicts with actual output content.
Suggested change:

    - echo 'filtered<<EOF'
    - echo "$FILTERED"
    - echo 'EOF'
    + echo 'filtered<<FILTERED_LINKS_EOF'
    + echo "$FILTERED"
    + echo 'FILTERED_LINKS_EOF'

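A fixed delimiter such as `FILTERED_LINKS_EOF` is less likely to collide than `EOF`, but it can still appear in the output. A further option is to generate the delimiter at run time. The helper below is a sketch with a hypothetical name (`write_multiline_output`), and it assumes `GITHUB_OUTPUT` points at a writable file, as it does inside an Actions step.

```shell
#!/usr/bin/env bash
# Sketch: write a multiline step output using a random delimiter so that the
# value cannot accidentally contain the terminator line.

write_multiline_output() {
  local name="$1" value="$2" delimiter

  # 8 random bytes rendered as 16 hex characters.
  delimiter="ghadelimiter_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"

  {
    echo "${name}<<${delimiter}"
    printf '%s\n' "$value"
    echo "$delimiter"
  } >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT must be set}"
}
```

In the step this would replace the three `echo` lines with `write_multiline_output filtered "$FILTERED"`.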

    while IFS= read -r line; do
      # Check if line is a file path (file paths end with .mdx, .md, or .json)
      if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then
        CURRENT_FILE="$line"
        # Check if this file is in changed files
        CAPTURE=false
        for changed in "${CHANGED[@]}"; do
          if [[ "$CURRENT_FILE" == "$changed" ]]; then
            CAPTURE=true
            break
          fi
        done

Copilot
AI
Jan 7, 2026
The filter-links.sh script reads the output file line by line and performs string comparisons in nested loops (lines 90-95), resulting in O(n*m) complexity where n is the number of lines and m is the number of changed files. For repositories with many broken links and many changed files, this could be slow. Consider optimizing by using associative arrays or grep with pattern matching to improve performance.
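The associative-array variant mentioned above could look roughly like the following. It is a sketch, not the script that was eventually committed: `filter_report` is a hypothetical name, bash 4 or later is assumed for `local -A`, and the report layout (a path line followed by indented `⎿` link lines) is inferred from the snippets in this review.

```shell
#!/usr/bin/env bash
# Sketch: filter a broken-links report down to changed files using a hash
# lookup instead of a nested loop.

filter_report() {
  local changed_list="$1" report="$2" line include=false
  local path_re='^[a-zA-Z0-9].*\.(mdx?|json)$'
  local link_re='^[[:space:]]*⎿'
  local -A is_changed=()

  # Load the changed files as keys; blank lines are skipped.
  while IFS= read -r line; do
    if [[ -n $line ]]; then
      is_changed["$line"]=1
    fi
  done < "$changed_list"

  while IFS= read -r line; do
    if [[ $line =~ $path_re ]]; then
      # A new file section starts: decide once whether to include it.
      include=false
      if [[ -n ${is_changed["$line"]:-} ]]; then
        include=true
        printf '%s\n' "$line"
      fi
    elif $include && [[ $line =~ $link_re ]]; then
      printf '%s\n' "$line"
    fi
  done < "$report"
}
```

Each report line then costs one lookup, so the run time grows with the number of report lines plus the number of changed files rather than their product.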

    id: broken_links
    run: |
      echo "Running broken links check..."
      OUTPUT=$(./node_modules/.bin/mint broken-links 2>&1 || true)

Copilot
AI
Jan 7, 2026
The workflow always runs the broken links check with || true to prevent failures, but this means the step will never fail even if the mint command encounters an error (e.g., invalid configuration, missing dependencies). Consider checking the exit code separately and failing the workflow if the error is not related to finding broken links, to ensure genuine errors are not silently ignored.
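A sketch of checking the exit code separately is shown below. It rests on an assumption that should be verified against the real CLI: that a run which completes its scan prints a `found N broken links` summary line, while a genuine failure (bad configuration, missing dependency) exits non-zero without one. The wrapper name `run_link_check` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: capture output and exit status separately instead of using `|| true`.
# Assumption: a non-zero exit WITH a summary line means "links were found",
# a non-zero exit WITHOUT one means the checker itself failed.

run_link_check() {
  local output status=0

  # Run the given command; remember its status without aborting under `set -e`.
  output=$("$@" 2>&1) || status=$?
  printf '%s\n' "$output"

  if (( status != 0 )) && ! grep -Eq 'found [0-9]+ broken links' <<<"$output"; then
    echo "::error::link checker exited with status $status and printed no summary"
    return "$status"
  fi
  return 0
}
```

The step could call it as `OUTPUT=$(run_link_check ./node_modules/.bin/mint broken-links)`, which fails the step on a genuine error but still lets broken-link reports through for the later comment or Slack steps.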

    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '18'

Copilot
AI
Jan 7, 2026
The workflow uses Node.js version 18, which reached End-of-Life on April 30, 2025. Consider updating to Node.js 20 or 22 (LTS versions) to ensure continued support and security updates. This applies to both the PR and nightly workflows.
Suggested change:

    - node-version: '18'
    + node-version: '22'


    id: broken_links
    run: |
      echo "Running broken links check..."
      OUTPUT=$(./node_modules/.bin/mint broken-links 2>&1 || true)

Copilot
AI
Jan 7, 2026
The workflow always runs the broken links check with || true to prevent failures, but this means the step will never fail even if the mint command encounters an error (e.g., invalid configuration, missing dependencies). Consider checking the exit code separately and failing the workflow if the error is not related to finding broken links, to ensure genuine errors are not silently ignored.

    CAPTURE=false

    while IFS= read -r line; do
      # Check if line is a file path (file paths end with .mdx, .md, or .json)
      if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then
        CURRENT_FILE="$line"
        # Check if this file is in changed files
        CAPTURE=false
        for changed in "${CHANGED[@]}"; do
          if [[ "$CURRENT_FILE" == "$changed" ]]; then
            CAPTURE=true
            break
          fi
        done
        if $CAPTURE; then
          FILTERED_OUTPUT+="${CURRENT_FILE}"$'\n'
        fi
      elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then

Copilot
AI
Jan 7, 2026
The variable name CAPTURE is used as a boolean flag to track whether to include output for the current file. This name could be more descriptive. Consider renaming to SHOULD_INCLUDE_FILE or IS_CHANGED_FILE to better convey its purpose and improve code readability.
Suggested change:

    - CAPTURE=false
    - while IFS= read -r line; do
    -   # Check if line is a file path (file paths end with .mdx, .md, or .json)
    -   if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then
    -     CURRENT_FILE="$line"
    -     # Check if this file is in changed files
    -     CAPTURE=false
    -     for changed in "${CHANGED[@]}"; do
    -       if [[ "$CURRENT_FILE" == "$changed" ]]; then
    -         CAPTURE=true
    -         break
    -       fi
    -     done
    -     if $CAPTURE; then
    -       FILTERED_OUTPUT+="${CURRENT_FILE}"$'\n'
    -     fi
    -   elif $CAPTURE && [[ $line =~ ^[[:space:]]*⎿ ]]; then
    + SHOULD_INCLUDE_FILE=false
    + while IFS= read -r line; do
    +   # Check if line is a file path (file paths end with .mdx, .md, or .json)
    +   if [[ $line =~ ^[a-zA-Z0-9] ]] && ([[ $line == *.mdx ]] || [[ $line == *.md ]] || [[ $line == *.json ]]); then
    +     CURRENT_FILE="$line"
    +     # Check if this file is in changed files
    +     SHOULD_INCLUDE_FILE=false
    +     for changed in "${CHANGED[@]}"; do
    +       if [[ "$CURRENT_FILE" == "$changed" ]]; then
    +         SHOULD_INCLUDE_FILE=true
    +         break
    +       fi
    +     done
    +     if $SHOULD_INCLUDE_FILE; then
    +       FILTERED_OUTPUT+="${CURRENT_FILE}"$'\n'
    +     fi
    +   elif $SHOULD_INCLUDE_FILE && [[ $line =~ ^[[:space:]]*⎿ ]]; then


    CHANGED_FILES="changed-files.txt"

    # Read changed files into array
    mapfile -t CHANGED < "$CHANGED_FILES"

Copilot
AI
Jan 7, 2026
The script uses `mapfile -t CHANGED < "$CHANGED_FILES"` to read changed files into an array. When no files changed, the `echo "" > changed-files.txt` fallback in the workflow writes a single blank line, so `mapfile` creates an array with one empty-string element rather than an empty array. The subsequent comparison in the loop (line 91) will then try to match files against an empty string, which could cause issues. Consider checking whether the file is empty first or handling the empty-array case explicitly.
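One way to handle the empty case explicitly is to skip blank lines while loading the list, so that "no changed files" always yields a zero-length array. The helper name `read_changed_files` is hypothetical; it fills the same `CHANGED` array that the script already uses.

```shell
#!/usr/bin/env bash
# Sketch: load changed-file paths into the global CHANGED array, ignoring
# blank lines so that an "empty" list never yields a single "" element.

read_changed_files() {
  local file="$1" line
  CHANGED=()

  # Missing or zero-byte file: nothing to load.
  [[ -s $file ]] || return 0

  # `|| [[ -n $line ]]` also picks up a final line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ -n $line ]]; then
      CHANGED+=("$line")
    fi
  done < "$file"
}
```

After `read_changed_files "$CHANGED_FILES"`, a check such as `(( ${#CHANGED[@]} == 0 ))` can short-circuit the filter and print "No broken links found in changed files." directly.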
@copilot apply changes based on the comments in this thread
…g, optimize filter script with associative array, use unique EOF delimiter, and rename CAPTURE to SHOULD_INCLUDE_FILE Co-authored-by: ayush-shah <[email protected]>
Co-authored-by: ayush-shah <[email protected]>
Applied all review feedback in commits 5fcf12e and 0f5eba7: Both Workflows:
PR Workflow:
All YAML files validated successfully.
Implementation Plan for Automated Broken Link Checking Workflow
- `.github/workflows` directory structure
- `mint broken-links` command
- `mint broken-links` command
- `|| true` and check exit code separately