Add script to detect external self-references in docs #765

Edu92337 · 2026-01-18T03:18:44Z

Description

What is this PR?

Addition of a new feature

Why is this PR needed?

This PR addresses issue #645. Currently, when the movement documentation site (https://movement.neuroinformatics.dev/) is offline and needs to be redeployed, the Sphinx linkcheck tool fails because some documentation files contain external URLs referencing the movement site itself. This creates a circular dependency that prevents the docs from being rebuilt.

What does this PR do?

This PR adds automated detection of external self-references in documentation files. It consists of:

A Python script (docs/check_self_references.py) that scans documentation files for external movement URLs and suggests using internal MyST cross-references instead
Integration with the project's Makefile and pre-commit hooks to catch violations early
Comprehensive unit tests to ensure the detection logic works correctly

The script allowlists legitimate external URLs (such as the version switcher JSON) and flags URLs that should be internal cross-references (e.g., target-installation instead of https://movement.neuroinformatics.dev/latest/user_guide/installation.html).
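The detection logic described above can be illustrated with a minimal sketch. This is not the code in docs/check_self_references.py. The function names are inferred from the test class names listed further down, and the regex, the allowlist pattern, the switcher URL path, and the single mapping entry are assumptions made for illustration.

```python
import re

# Assumption: matches any URL on the movement docs site, stopping at
# whitespace or characters that usually close a Markdown/RST link.
SITE_URL_PATTERN = re.compile(
    r"https?://movement\.neuroinformatics\.dev[^\s)>\]\"']*"
)

# Hypothetical allowlist: URLs that must stay external.
ALLOWED_PATTERNS = [re.compile(r"switcher\.json$")]

# Hypothetical URL-to-target mapping; only the entry named in the PR text.
URL_TO_TARGET = {
    "https://movement.neuroinformatics.dev/latest/user_guide/installation.html":
        "target-installation",
}


def is_allowed(url):
    """Return True if the URL matches an allowlisted pattern."""
    return any(pattern.search(url) for pattern in ALLOWED_PATTERNS)


def suggest_target(url):
    """Return the internal target for a known URL, else None."""
    base = url.split("#", 1)[0]  # ignore any anchor when looking up
    return URL_TO_TARGET.get(base)


def find_self_references(text):
    """Return (line_number, url, suggestion) for each flagged URL."""
    violations = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SITE_URL_PATTERN.finditer(line):
            url = match.group(0).rstrip(".,;:")  # drop trailing punctuation
            if not is_allowed(url):
                violations.append((line_number, url, suggest_target(url)))
    return violations
```

With this sketch, a Markdown link to the installation page is reported together with the suggested target, while a line containing only the switcher JSON URL is skipped.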

References

Closes #645

Builds upon the work done in PR #642 which replaced some external links with internal targets.

How has this PR been tested?

Manual Testing:

Ran python docs/check_self_references.py on current documentation - no violations found
Verified allowlist correctly excludes version switcher URLs in conf.py and workflow files
Confirmed target suggestions work for all mapped URLs
Tested that README.md is correctly excluded (it intentionally uses external URLs for GitHub display)
Created test files with violations to verify detection works
Verified Makefile integration: cd docs && make selfcheck

Unit Tests:

Created comprehensive unit tests in tests/test_unit/test_docs/test_check_self_references.py:

TestFindSelfReferences - Detection of URLs in files, multiple URLs, anchors, line numbers
TestIsAllowed - Allowlist patterns for switcher JSON and regular URLs
TestSuggestTarget - URL-to-target mapping for known and unknown URLs
TestIntegration - Clean files and mixed content scenarios
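As a rough illustration of the kind of cases listed above, the sketch below tests a stand-in `find_self_references` against temporary files. The stand-in function, its return shape, and the helper `write_doc` are hypothetical; the real tests in tests/test_unit/test_docs/test_check_self_references.py import the actual script and may be organised differently.

```python
import re
import tempfile
from pathlib import Path

# Stand-in for the function under test (assumption, not the real script).
SELF_URL = re.compile(r"https?://movement\.neuroinformatics\.dev\S*")


def find_self_references(path):
    """Return (line_number, url) pairs for self-referencing URLs in a file."""
    hits = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            hits.extend((line_number, m.group(0)) for m in SELF_URL.finditer(line))
    return hits


def write_doc(directory, content):
    """Hypothetical helper: write content to a Markdown file and return its path."""
    path = Path(directory) / "page.md"
    path.write_text(content, encoding="utf-8")
    return path


def test_clean_file_has_no_violations():
    with tempfile.TemporaryDirectory() as directory:
        path = write_doc(directory, "# Title\n\nSee {ref}`target-installation`.\n")
        assert find_self_references(path) == []


def test_reports_line_number_and_keeps_anchor():
    url = "https://movement.neuroinformatics.dev/latest/index.html#anchor"
    with tempfile.TemporaryDirectory() as directory:
        path = write_doc(directory, f"intro\n{url}\n")
        assert find_self_references(path) == [(2, url)]
```

Functions named `test_*` like these are collected by pytest, and they can also be called directly.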

Note: Unit tests were validated manually due to environment constraints. They will run properly in CI where project dependencies are installed.

Pre-commit Integration:

Added local hook configuration in .pre-commit-config.yaml
Hook triggers on .md and .rst files in docs/source/
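For context, pre-commit passes the matched file names to a hook as positional arguments and treats a non-zero exit code as a failure. A minimal sketch of an entry point that fits this contract is shown below. The names `count_violations` and `main` and the simplified regex are assumptions for illustration, not the script's actual interface.

```python
import re
import sys

# Simplified pattern (assumption): any URL on the movement docs site.
SELF_URL = re.compile(r"https?://movement\.neuroinformatics\.dev\S*")


def count_violations(filenames):
    """Print one line per self-reference and return the total count."""
    count = 0
    for name in filenames:
        with open(name, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                for match in SELF_URL.finditer(line):
                    print(f"{name}:{line_number}: {match.group(0)}")
                    count += 1
    return count


def main(argv=None):
    """Return 1 if any file contains a self-reference, else 0."""
    filenames = sys.argv[1:] if argv is None else list(argv)
    return 1 if count_violations(filenames) else 0

# As a script, the return value would be passed to sys.exit(main()).
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the function easy to call from tests.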

Is this a breaking change?

No, this PR does not break any existing functionality. It only adds a new validation tool that can be run optionally via:

python docs/check_self_references.py (direct execution)
make selfcheck (via Makefile)
pre-commit run check-self-references (via pre-commit)

The current documentation already passes all checks.

Does this PR require an update to the documentation?

No documentation update is required because:

This is a developer-facing tool, not a user-facing feature
The tool is self-documenting with clear error messages
Usage is straightforward and follows existing project patterns (similar to make_api.py)

However, developers can discover the tool via:

The selfcheck target in docs/Makefile
The pre-commit hook (when they run pre-commit install)
This PR description and the code comments

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes (N/A - tool is self-documenting)
The code has been formatted with pre-commit

Additional Notes

Design Decisions:

Script location: Placed in docs/ to follow the pattern of existing documentation tools (make_api.py, convert_admonitions.py)
README.md exclusion: The README is intentionally not checked because it is displayed on GitHub, where MyST cross-references do not work, so external URLs are necessary there
Allowlist approach: Version switcher URLs in conf.py and workflow files are legitimate and should not trigger violations
Direct file scanning vs. linkcheck parsing: Direct regex scanning is simpler, faster, and doesn't require linkcheck to have run first
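The README exclusion and the direct-scanning decision can be sketched as a file-collection step that runs before any regex matching. The function name `collect_doc_files`, the suffix set, and the exclusion set are hypothetical; the actual script also covers conf.py and workflow files through its allowlist, which this sketch does not model.

```python
from pathlib import Path

# Assumptions for illustration: which suffixes to scan and which names to skip.
DOC_SUFFIXES = {".md", ".rst"}
EXCLUDED_NAMES = {"README.md"}


def collect_doc_files(root):
    """Return sorted documentation files under root, skipping excluded names."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in DOC_SUFFIXES
        and path.name not in EXCLUDED_NAMES
    )
```

Because this only walks the file tree, it needs neither a Sphinx build nor a prior linkcheck run, which is the trade-off named in the last design decision.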

Future Enhancements:

This tool could be generalized and moved to the neuroinformatics-unit actions repository for use across multiple NIU projects (as mentioned in the original issue).


sonarqubecloud · 2026-01-20T12:08:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codecov · 2026-01-20T13:04:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (665412f) to head (c5e6932).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #765   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           34        34           
  Lines         2111      2111           
=========================================
  Hits          2111      2111


Edu92337 and others added 4 commits January 18, 2026 00:08

Add script to detect external self-references in docs

5ca5e83

[pre-commit.ci] auto fixes from pre-commit.com hooks

74f84b7

for more information, see https://pre-commit.ci

pre-commit changes

f6efcf3

Fix: docstring line length for ruff compliance

f4a4ebb

niksirbi mentioned this pull request Jan 20, 2026

Code snippets #759

Open

7 tasks

Merge remote-tracking branch 'upstream/main' into detect_reference

c5e6932
