linkcheck builder: optionally allow HTTP 401 status code hyperlinks to be reported broken #11684
Merged: AA-Turner merged 31 commits into sphinx-doc:master from jayaddison:issue-11433/adjust-linkcheck-http-401-handling on Jan 9, 2024
6aee5b8 linkcheck builder: update handling of HTTP 401 status code to conside… (jayaddison)
7f68206 linkcheck builder: introduce a 'linkcheck_allow_unauthorized' config … (jayaddison)
2d4f4bc Update CHANGES.rst (jayaddison)
a958e86 Clarify phrasing in CHANGES.rst: the setting has to be configured to … (jayaddison)
712f060 CHANGES.rst: fixup: relocate the changelog entry to the correct locat… (jayaddison)
ee7348f docs: add documentation for the 'linkcheck_allow_unauthorized' config… (jayaddison)
c19f7c3 CHANGES.rst: nitpick: undo accidental empty-line removal (jayaddison)
a290e3c CHANGES.rst: nitpick: phrasing: 'handle...as broken' -> 'report...as … (jayaddison)
46d8206 CHANGES.rst: nitpick: brevity (jayaddison)
c895ca2 linkcheck builder: add a deprecation warning indicating that the 'lin… (jayaddison)
37b50ae CHANGES.rst: fixup: add self-attribution (jayaddison)
bc3c390 Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
37634b1 ruff: linting fixups (jayaddison)
a81f283 Apply code review suggestion: filter warnings in test case using more… (jayaddison)
de6ac23 Apply code review suggestion: make the 'allow_unauthorized' worker va… (jayaddison)
9f87c41 docs: add removal note to 'linkcheck_allow_unauthorized' config optio… (jayaddison)
2b90b76 linkcheck builder: relocate 'linkcheck_allow_unauthorized' warning to… (jayaddison)
d9144cc fixup: use pytest.mark.filterwarnings to filter warning _before_ app … (jayaddison)
fa9be8a cleanup: remove unused imports (jayaddison)
4750345 Apply code review suggestion: set default value for 'allow_unauthoriz… (jayaddison)
6d450c2 Updated plan: instead of removing the setting entirely in Sphinx 8.0,… (jayaddison)
cdfa342 Apply code review suggestion: when an HTTP 401 response is encountere… (jayaddison)
daf4efb Code behaviour consensus: the URI of unauthorized HTTP responses shou… (jayaddison)
e17581a CHANGES.rst: prefer Pythonic representation of false value (jayaddison)
2070c88 Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
bc97be3 Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
a75abcf Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
5d16bb3 Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
000af8b Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
bfaf179 Merge branch 'master' into issue-11433/adjust-linkcheck-http-401-hand… (jayaddison)
16a3695 Update sphinx/builders/linkcheck.py (AA-Turner)
I think that the 401 still makes sense if people do not want to expose credentials in their configuration and only assume that something exists (and could return a 401).
As such, I think we could keep the `'working'` but change a bit the way that `linkcheck_auth` is handled:
- […] `None`, we keep `'working'` and do not emit warnings.
- […] `'working'`, but we emit a warning saying that the content couldn't be accessed and that users should specify credentials (fake or not).
That way, this also ensures that credentials are not exposed unintentionally and that the contents with 401 errors are detected correctly.
It's more of a follow-up, but what about using an enumeration (or at least marking the possible string statuses we return using `Literal`) for the `status` instead of using `'working'`, `'broken'`, etc.?
In `process_result`, we use if-case, but using `match` instead, plus enumerations, may be more elegant and clearer (also we wouldn't have an arbitrary status code being emitted).
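To make the `Literal` suggestion concrete, here is a minimal sketch. It is not code from Sphinx: only `'working'` and `'broken'` come from this discussion, while the other status names and both helper functions are illustrative assumptions.

```python
from typing import Literal, get_args

# Only 'working' and 'broken' appear in the discussion; the remaining names
# are illustrative placeholders, not a confirmed list of linkcheck statuses.
LinkStatus = Literal['working', 'broken', 'redirected', 'ignored']


def is_known_status(value: str) -> bool:
    """Return True if value is one of the declared status strings."""
    return value in get_args(LinkStatus)


def summarize(status: LinkStatus) -> str:
    """Hypothetical stand-in for an if/elif chain over status strings."""
    if status == 'working':
        return 'ok'
    if status == 'broken':
        return 'error'
    return 'other'
```

A type checker can flag a misspelled status passed to `summarize`, but nothing is enforced at runtime unless a check such as `is_known_status` is called explicitly.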
Hmm.. point understood, although I think that even when a URL has no linkcheck credentials configured (the "no fake credential" case), an HTTP 401 response should be considered a broken link, because it means that the link is unchecked. This is probably the core of the disruptive/controversial angle I have on this.
If for some reason a user is unable to provide credentials for some hyperlinks included in a documentation set, but wants a success report from linkchecking despite that, I'd argue they should use ~~`linkcheck_anchors_ignore`~~ `linkcheck_ignore` to skip the relevant websites.
I'm still thinking about the first point -- using `None` as a special marker for intentionally-empty credentials. It seems a bit fragile and too-clever; I like simple and hard-to-misunderstand things, because I misunderstand a lot.
👍
Mostly agree, although we still support Py3.9 at the moment - so this could be a future enhancement?
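For illustration, a `conf.py` fragment along the lines of the `linkcheck_ignore` suggestion might look like the sketch below. The host names are made up, the `is_ignored` helper exists only to show how the patterns match, and the meaning of `linkcheck_allow_unauthorized = False` is assumed from this PR's title rather than quoted from the final documentation.

```python
# conf.py (fragment)
import re

# Hypothetical hosts that always answer HTTP 401 and for which no
# credentials can be shared; URIs matching these regexes are skipped.
linkcheck_ignore = [
    r'https://intranet\.example\.com/.*',
    r'https://private\.example\.org/.*',
]

# Assumed semantics (from this PR's title): False means that HTTP 401
# responses are reported as broken rather than working.
linkcheck_allow_unauthorized = False


def is_ignored(uri: str) -> bool:
    """Illustrative helper only: match a URI against linkcheck_ignore."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)
```

With this setup the ignored hosts are never requested, which also avoids the outbound traffic for links that could not be verified anyway.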
Fine by me using `linkcheck_anchors_ignore`, although it's a bit counterintuitive in this case (like I don't really want to ignore).
It was an example. We could have another configuration value like `linkcheck_auth_bypass` and put what links we expect to have 401.
Concerning the `match`, I forgot it was introduced in 3.10.
Actually, when writing my comment, I thought about a much more elegant solution, namely, you could specify the HTTP response code to expect for specific links and treat them as "working".
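The "expected response code per link" idea could be sketched roughly as follows. This is purely hypothetical: neither `expected_status_codes` nor `is_expected` exists in Sphinx, and the URLs are placeholders.

```python
import re

# Hypothetical setting, not an existing Sphinx option: URI regex -> status
# codes that should be accepted as "working" for matching links.
expected_status_codes = {
    r'https://intranet\.example\.com/.*': {401},
    r'https://api\.example\.org/.*': {401, 403},
}


def is_expected(uri: str, status_code: int) -> bool:
    """Return True if status_code is explicitly expected for this URI."""
    return any(
        re.match(pattern, uri) is not None and status_code in codes
        for pattern, codes in expected_status_codes.items()
    )
```

The appeal of this shape is that the user states the expectation explicitly per link, so an unexpected 401 elsewhere would still surface.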
Let me try to make a convincing argument that it makes sense :) (I could be wrong! in which case that'll help too)
The HTTP 401 response is zero-information in terms of hyperlink validity. It does confirm that the client should use auth if it wants to gain more-than-zero information, but that's the only feedback it provides -- and we hide that from users at the moment.
So we're making network requests that fail, and we're not informing the user about it. I think we should start by informing the user -- and then they either choose to find auth to gain greater-than-zero info, or they ignore the links and reduce the outbound traffic (and linkchecking time).
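As a rough sketch of that argument (not the code in `sphinx/builders/linkcheck.py`), the decision could be written like this. `classify_response` is a hypothetical helper, and treating every non-2xx code other than 401 as broken is a simplification made for the example.

```python
def classify_response(status_code: int, allow_unauthorized: bool) -> str:
    """Hypothetical helper: map an HTTP status code to a link status string."""
    if status_code == 401:
        # A 401 says nothing about whether the target exists, so only an
        # explicit opt-in keeps the old 'working' result.
        return 'working' if allow_unauthorized else 'broken'
    if 200 <= status_code < 300:
        return 'working'
    # Simplification for this sketch: everything else counts as broken.
    return 'broken'
```

Reporting the 401 as broken by default surfaces the failed request, and the user can then either supply credentials or ignore the link.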
Agreed. Actually, by counterintuitive, I meant that using `linkcheck_anchors_ignore` was weird to me since I would have used it to suppress errors related to anchors rather than to HTTP response codes.
So, we could keep what you suggested.
Agh, sorry. I should have written `linkcheck_ignore` instead. I completely failed to notice/parse the word `anchor` in there.
Now it's OK for me!
:) thank you!
I'm working on some refactoring for this at the moment; the `CheckResult` class relies upon being JSON-serializable at the moment, and I don't feel like introducing custom serialization code, so I've opted to use the `StrEnum` type -- `Enum` with string values isn't serializable by default.
Taking that approach adds a dependency on Py3.11, so I'm experimenting with adding `match` statements at the same time.
I did read up a little bit about using `Literal` - it looks like it's mostly a type-checking aid? I think I'd prefer the enum route, even if it may take a while before it could land in the codebase. Maybe I'll change my mind as I learn more, though..
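The serialization difference can be shown with a small standalone sketch. The class names and `try_dump` are made up for the example and are not the refactoring itself; the `ImportError` fallback is only an approximation so the snippet also runs on interpreters older than 3.11.

```python
import json
from enum import Enum

try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:
    class StrEnum(str, Enum):
        """Rough stand-in for older interpreters (assumption for this sketch)."""


class PlainStatus(Enum):
    WORKING = 'working'
    BROKEN = 'broken'


class LinkStatus(StrEnum):
    WORKING = 'working'
    BROKEN = 'broken'


def try_dump(value):
    """Return the JSON text for a status, or None if json rejects it."""
    try:
        return json.dumps({'status': value})
    except TypeError:
        return None
```

Here `try_dump(PlainStatus.WORKING)` returns `None` because the default encoder rejects plain `Enum` members, while `try_dump(LinkStatus.WORKING)` returns `'{"status": "working"}'`, and `LinkStatus('working')` recovers the member when reading the value back.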