fix(build): fix mkdocs htmlproofer link validation flakiness#2095

Open
khushiiagrawal wants to merge 10 commits into oscal-compass:develop from khushiiagrawal:2032/flaky-errors

@khushiiagrawal

Types of changes

  • Hot fix (emergency fix and release)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation (change which affects the documentation site)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Release (develop -> main)

Quality assurance (all should be covered).

  • My code follows the code style of this project.
  • Documentation for my change is up to date?
  • My PR meets testing requirements.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary

Resolves #2032

The docs validate step in the CI pipeline is intermittently failing due to 503 Service Unavailable responses from valid external links.

This PR fixes the flakiness by configuring the mkdocs-htmlproofer-plugin to ignore 503 (Service Unavailable) and 504 (Gateway Timeout) HTTP status codes natively. Since these are temporary server-side issues rather than broken links in our documentation, explicitly excluding them from the failure checks stabilizes the CI without losing the benefits of validating 404s.
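
For illustration, the exclusion described above could look roughly like the following `mkdocs.yml` fragment. This is a minimal sketch, not the PR's actual diff: it assumes the plugin's `raise_error_excludes` option is the mechanism used, and the `'*'` pattern (match every URL) is an assumption.

```yaml
# mkdocs.yml (fragment): a sketch under the assumptions stated above
plugins:
  - htmlproofer:
      # Keep failing the build on genuinely broken links such as 404s.
      raise_error: true
      # Map HTTP status codes to URL patterns whose failures are tolerated.
      # '*' (every URL) is assumed here; narrower patterns are also possible.
      raise_error_excludes:
        503: ['*']
        504: ['*']
```

With a mapping like this, a 404 on any link would still fail the build, while a 503 or 504 would not.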

Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
@khushiiagrawal khushiiagrawal requested a review from a team as a code owner February 26, 2026 09:57
Collaborator

@degenaro degenaro left a comment

@khushiiagrawal Thanks for this PR. It seems that 503 and 504 will now be ignored. Is there a warning about such URLs so that the PR reviewer can judge whether this is of concern or not? Are there any retries before the ignore/warning happens?

… for MkDocs link validation

Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
@khushiiagrawal
Author

@khushiiagrawal Thanks for this PR. It seems that 503 and 504 will now be ignored. Is there a warning about such URLs so that the PR reviewer can judge whether this is of concern or not? Are there any retries before the ignore/warning happens?

@degenaro thanks for the review, these are valid points and worth addressing. i’ve looked into both.

retries: i’ve updated mkdocs.yml to add retry_max_times: 3 for the htmlproofer plugin. this allows urls returning transient errors like 503/504 to be retried up to three times before being treated as broken or ignored.

warnings: i also tried enabling warnings for ignored urls using warn_on_ignored_urls: true so they would appear in the ci logs. however, since the pipeline runs mkdocs build -s in strict mode, any warning is treated as an error and fails the build. because of that, emitting warnings for these urls would cause the job to fail, which defeats the purpose of ignoring them. given the current strict setup, we can’t surface them as non-fatal warnings in ci.
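
A rough sketch of the configuration described in the two points above. The option names `retry_max_times` and `warn_on_ignored_urls` and their values are taken from this comment; `retry_max_times` in particular has not been checked against the plugin's documentation here, and the `'*'` patterns are an assumption.

```yaml
# mkdocs.yml (fragment): a sketch, option names as quoted in the comment above
plugins:
  - htmlproofer:
      raise_error: true
      # As quoted in the comment: retry a failing URL up to three times
      # before it is treated as broken or excluded (unverified option name).
      retry_max_times: 3
      raise_error_excludes:
        503: ['*']   # '*' (every URL) is an assumption
        504: ['*']
      # Left off: under `mkdocs build -s` (strict mode) any warning fails
      # the build, so warning about ignored URLs would defeat the purpose.
      warn_on_ignored_urls: false
```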

thanks.

@degenaro
Collaborator

This will cause 503/504 to always be ignored in the github pipeline.

I don't suppose there's a way when running the Makefile locally on my laptop to not ignore 503/504, unless the yml file is hacked?

Do we know if the URLs currently in the ignore list still fail and if so is it because of 503/504?

Funny, I just enabled the pipelines to run and there was an htmlproofer failure with 502.

@butler54
Contributor

This will cause 503/504 to always be ignored in the github pipeline.

I don't suppose there's a way when running the Makefile locally on my laptop to not ignore 503/504, unless the yml file is hacked?

Do we know if the URLs currently in the ignore list still fail and if so is it because of 503/504?

Funny, I just enabled the pipelines to run and there was an htmlproofer failure with 502.

I think we can do environment variable overrides in the mkdocs.yml. You can see an example of those changes here:
https://github.com/aucloud/aucloud.github.io/blob/0c9d2012825e87318e58294c299bc06d5393acb6/mkdocs.yml#L62-L67

  - htmlproofer:
      enabled: !ENV [ENABLED_HTMLPROOFER, False]
      validate_rendered_template: False
      validate_external_urls: False
      raise_error_after_finish: False
      raise_error: False

we'd need to chain that into the makefile.

@degenaro
Collaborator

develop branch has been updated. @khushiiagrawal Please resolve conflicts.

@khushiiagrawal khushiiagrawal changed the title from "build: fix mkdocs htmlproofer link validation flakiness" to "fix(build): fix mkdocs htmlproofer link validation flakiness" Mar 1, 2026
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
@khushiiagrawal
Author

thanks @degenaro and @butler54 for the review. i've reworked the PR based on both your suggestions.

the htmlproofer config now uses !ENV overrides as butler54 suggested, so locally you can control the behavior without touching the yml:

  • ENABLED_HTMLPROOFER=false make docs-validate : skips htmlproofer entirely
  • HTMLPROOFER_VALIDATE_EXTERNAL_URLS=false make docs-validate : skips just external url checks

defaults are still true, so without setting anything you get full validation locally.

also added 502 to the exclusions since that was failing too, and kept retry_max_times: 3 so transient errors get retried before being excluded.
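
As a sketch of what the reworked plugin block might look like (the actual diff is not shown in this thread): the two environment variable names come from the bullets above and `!ENV` is MkDocs' tag for environment variable substitution, but the `'*'` patterns and the exact layout are assumptions, and `retry_max_times` is used as named in this thread.

```yaml
# mkdocs.yml (fragment): a sketch under the assumptions stated above
plugins:
  - htmlproofer:
      # !ENV [NAME, default]: use the environment variable if set, else the default.
      enabled: !ENV [ENABLED_HTMLPROOFER, true]
      validate_external_urls: !ENV [HTMLPROOFER_VALIDATE_EXTERNAL_URLS, true]
      # Retry transient failures before they are excluded (name as used in this thread).
      retry_max_times: 3
      raise_error_excludes:
        502: ['*']   # Bad Gateway, added after the failure seen in CI
        503: ['*']   # Service Unavailable
        504: ['*']   # Gateway Timeout
```

Because both defaults are `true`, running `make docs-validate` with no variables set would keep full validation, matching the behavior described above.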

reverted the unrelated pyproject.toml and workflow changes to keep the PR focused.

@degenaro
Collaborator

degenaro commented Mar 2, 2026

@khushiiagrawal Seems to still be flaky. Actually worse than flaky today. Cannot get lint pipeline to pass on other PRs either. Network must be particularly bad for some reason.

@khushiiagrawal
Author

@degenaro Yeah, this is a broader network issue; I'm seeing it affect other PRs too. I'll switch the CI to HTMLPROOFER_VALIDATE_EXTERNAL_URLS=false so the lint job only validates internal links. External URL checks will still work locally by default for anyone who wants them. Please let me know if that works.
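
A minimal sketch of how that switch might be wired into a GitHub Actions job. The job name, step name, and action versions are hypothetical; only the variable name and the `make docs-validate` target come from this thread.

```yaml
# Workflow fragment (hypothetical job and step names)
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate docs (internal links only)
        run: make docs-validate
        env:
          # Read by mkdocs.yml through an !ENV override; quoted so the
          # workflow passes it as the literal string "false".
          HTMLPROOFER_VALIDATE_EXTERNAL_URLS: "false"
```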

@degenaro
Collaborator

degenaro commented Mar 3, 2026

@khushiiagrawal In the end the website should not have broken links. There may be times when these sites are not available for a variety of reasons and not much can be done about that.

We should not err on the side of accepting broken links (external or internal) for the sake of expediency. The whole idea of checking is to confirm that we have not introduced a new link that is broken (or to catch an existing link that has since broken).

@degenaro
Collaborator

degenaro commented Mar 3, 2026

@khushiiagrawal @butler54 @vikas-agarwal76 Here is what I was thinking. The htmlproofer would never block a PR. What would happen is:

  • a separate CI/CD pipeline yml would be run that does htmlproofing
  • a report would be created in the PR listing which URLs failed (ideally none)
  • if any failed, it would be up to the PR author and reviewer(s) to decide whether that is a blocking problem
  • if the pipeline is rerun, the report in the PR would be updated with passing URLs removed and failing URLs added if not already there (that is, the PR would not grow and grow with the same failures listed multiple times)

What's different is that the merge is not blocked regardless of failures, but the failures are listed for consideration.

Before any coding is done, does this seem like a reasonable solution?

@degenaro
Collaborator

The discussion at the community meeting resulted in favoring the "reasonable solution" approach above.

@khushiiagrawal
Author

thanks @degenaro for confirming the approach at the community meeting.

implemented the agreed solution: htmlproofer now runs in a separate pipeline that never blocks merging. failing URLs get posted as a PR comment (reruns update the same comment, no duplicates). the main lint job no longer runs htmlproofer, so it stays stable. author and reviewers can decide if any failures are worth blocking on.
thank you.
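
One way such a non-blocking pipeline could be put together is sketched below. This is not the workflow from this PR: the file name, job name, comment marker, log filter, and installed packages are all hypothetical, and it assumes the check is not configured as a required status check.

```yaml
# .github/workflows/docs-linkcheck.yml (hypothetical file name)
name: docs-linkcheck
on:
  pull_request:

permissions:
  contents: read
  pull-requests: write   # needed to create or update the report comment

jobs:
  htmlproofer:
    runs-on: ubuntu-latest
    continue-on-error: true   # a failure here never fails the workflow run
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Install docs dependencies (assumed package list)
        run: pip install mkdocs mkdocs-htmlproofer-plugin
      - name: Build docs and capture link-check output
        # `|| true` keeps the step green: failures are reported, not enforced.
        run: mkdocs build 2>&1 | tee linkcheck.log || true
        env:
          ENABLED_HTMLPROOFER: "true"
      - name: Create or update the report comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            // Hidden marker used to find the report on reruns (hypothetical).
            const marker = '<!-- htmlproofer-report -->';
            // Assumed filter: keep only log lines that mention htmlproofer.
            const lines = fs.readFileSync('linkcheck.log', 'utf8')
              .split('\n')
              .filter((line) => /htmlproofer/i.test(line));
            const report = lines.length
              ? lines.map((line) => '- ' + line).join('\n')
              : 'No failing URLs reported.';
            const body = [marker, '### Link check report', '', report].join('\n');
            const { owner, repo } = context.repo;
            const issue_number = context.issue.number;
            const comments = await github.paginate(
              github.rest.issues.listComments, { owner, repo, issue_number });
            const existing = comments.find((c) => c.body && c.body.includes(marker));
            if (existing) {
              // Rerun: overwrite the earlier report so failures are not duplicated.
              await github.rest.issues.updateComment(
                { owner, repo, comment_id: existing.id, body });
            } else {
              await github.rest.issues.createComment(
                { owner, repo, issue_number, body });
            }
```

Replacing the comment body on every run is what keeps the report from growing: URLs that now pass drop out and new failures appear. One caveat: on pull requests from forks the default token is typically read-only, so the comment step may need a different trigger or token.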

Development

Successfully merging this pull request may close these issues.

mkdocs linting is becoming flaky in CICD testing and may affect releases