add linkcheck_allow_forbidden option #13860
base: master
…nx-doc#13483) Co-authored-by: Adam Turner <[email protected]>
I don't think that we should implement this. My understanding is that an HTTP 403 response status does not indicate whether a resource exists or not, and even if a misconfigured server did reveal existence that way, HTTP 4xx codes all represent errors. Even so, if we do go ahead with something like this, I think the request was to make the list of accepted HTTP status codes configurable per URL pattern, and I don't think that this PR does that yet.
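The per-URL-pattern variant described above could look roughly like the following sketch. The option name linkcheck_allowed_statuses, the helper is_working and the example URLs are all hypothetical; this is not how Sphinx's linkcheck builder is implemented, only an illustration of matching a URL against regex patterns and accepting extra status codes per pattern.

```python
import re

# Hypothetical option (NOT part of Sphinx): maps a URL regex to the extra
# HTTP status codes that should count as "working" for matching URLs.
linkcheck_allowed_statuses = {
    r"https://example\.org/.*": {403},
    r"https://api\.example\.com/.*": {401, 403},
}


def is_working(uri: str, status: int, allowed: dict[str, set[int]]) -> bool:
    """Return True if the response status should be reported as working."""
    # 2xx responses are always fine (redirects are ignored for brevity).
    if 200 <= status < 300:
        return True
    # Otherwise accept only codes explicitly allowed for a matching pattern.
    for pattern, codes in allowed.items():
        if re.match(pattern, uri) and status in codes:
            return True
    return False


print(is_working("https://example.org/page", 403, linkcheck_allowed_statuses))  # True
print(is_working("https://example.org/page", 404, linkcheck_allowed_statuses))  # False
print(is_working("https://other.net/", 403, linkcheck_allowed_statuses))  # False
```

Keeping the codes per pattern means a 404 on the same host is still reported as broken, which is the distinction the discussion cares about.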
@fmigneault I think this is extremely nuanced, and the rules will diverge per user and even per domain. We could benefit more from examples of user-side extensions that modify the default behavior, adjusting the source code to allow that if needed, which IMO should be kept simple.
The same could be said about 401, yet that one is supported. Many servers don't return correct HTTP codes. An increasing number of them respond with HTTP 403 when rate limiting, in an attempt to block automated checks (even though it is not the correct code). This causes certain pipelines to fail sporadically, which is extremely annoying. The alternative is to ignore those links completely, which is bad because a real 404 won't be caught. The problem is not that certain sites always return 4xx, but that they sometimes do so for unrelated reasons. Unless an actual 404 is returned, I personally prefer to ignore these errors temporarily. I agree that having per-site sets of allowed HTTP codes would be better. Is there anything like this already in place? It seems that would be an entirely new feature, independent of the specific HTTP code.
How about merging linkcheck_allow_forbidden and linkcheck_allow_unauthorized into a single list option, linkcheck_allow, that takes the HTTP error codes that should be ignored?
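A minimal sketch of the proposed merge, assuming a hypothetical list option named linkcheck_allow: the two boolean flags map onto status codes, and any code in the list is reported as "working" instead of "broken". The helpers from_flags and classify are illustrative only and deliberately ignore redirects, timeouts and rate limiting.

```python
# Hypothetical option from the discussion: a single list of HTTP error codes
# to tolerate, replacing separate boolean flags. Not an existing Sphinx option.
linkcheck_allow = [401, 403]


def from_flags(allow_unauthorized: bool, allow_forbidden: bool) -> list[int]:
    """Translate the two boolean flags into the equivalent list of codes."""
    codes = []
    if allow_unauthorized:
        codes.append(401)
    if allow_forbidden:
        codes.append(403)
    return codes


def classify(status: int, allow: list[int]) -> str:
    """Map an HTTP status code to a linkcheck-style result label."""
    if 200 <= status < 300:
        return "working"
    if status in allow:
        # Tolerated error: the server refused the check, but that alone
        # does not prove the resource is missing.
        return "working"
    return "broken"


for code in (200, 401, 403, 404):
    print(code, classify(code, linkcheck_allow))
```

A list also leaves room for other codes later without adding one boolean option per status.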
Not that I am aware of. If you own the resource, linkcheck_auth / linkcheck_request_headers should let you pass an API key to bypass the rate limit. Still, linkcheck_auth + linkcheck_request_headers + linkcheck_rate_limit_timeout + linkcheck_retries + linkcheck_timeout are already so many options that if you are still rate limited after tuning them, it may be a legitimate failure.
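For reference, the existing options listed above can be combined in conf.py roughly as follows. The option names are real Sphinx settings; the URLs, credentials and numeric values are placeholders to adapt, not recommendations.

```python
# conf.py: tuning the existing linkcheck options (placeholder values).

# Per-URL authentication: (regex, auth) pairs; auth is passed to requests.
linkcheck_auth = [
    (r"https://docs\.example\.com/.+", ("user", "api-token")),
]

# Extra request headers keyed by URL prefix; "*" applies to all other URLs.
linkcheck_request_headers = {
    "https://api.example.com/": {"Authorization": "Bearer <token>"},
    "*": {"Accept": "text/html,application/xhtml+xml"},
}

# Upper bound, in seconds, on backing off when a server signals rate limiting.
linkcheck_rate_limit_timeout = 600
# Number of attempts per URL before it is declared broken.
linkcheck_retries = 3
# Seconds to wait for a response to each request.
linkcheck_timeout = 30
```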
Yes. Sounds good.
That's the thing. I don't own it, so auth doesn't apply (and it is open access anyway). I am already using these options, but some servers just misbehave anyway.
Purpose
Add linkcheck_allow_forbidden option to let 403 be marked as "working". This defines it as a properly configurable option, as requested in #9762 (comment).
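Assuming this PR is merged with the option name unchanged, enabling it in conf.py might look like this. linkcheck_allow_unauthorized is an existing Sphinx option for HTTP 401; the linkcheck_ignore line is only there to contrast the two approaches.

```python
# conf.py: assumes linkcheck_allow_forbidden exists under the name proposed
# in this PR; treat HTTP 403 responses as "working".
linkcheck_allow_forbidden = True

# Existing Sphinx option: treat HTTP 401 responses as "working" as well.
linkcheck_allow_unauthorized = True

# By contrast, linkcheck_ignore skips matching URLs entirely, so a real
# 404 on those URLs would go unnoticed; leave it empty here.
linkcheck_ignore = []
```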
References