@MichaIng
Owner

The remote URL/website checker currently passes every URL with a fragment to the fragment checker as an HTML document, even if the response has a different or unsupported MIME type. This can produce incorrect fragment check results for Markdown documents, failures for other MIME types (especially binaries), and unnecessary traffic for large downloads, which are always downloaded completely whenever the fragment checker is invoked.

This commit checks the Content-Type header of the response:

  • If it is text/html, the response is passed to the fragment checker as HTML type.
  • If it is text/markdown, or if it is text/plain and the URL path ends with .md, the response is passed to the fragment checker as Markdown type.
  • In all other cases, the fragment checker is skipped and the HTTP status is returned.
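
The three rules above can be sketched as a small decision function. This is only an illustration, not the code from this PR: the `FileType` enum here is a hypothetical stand-in for the real type, `fragment_file_type` is an invented name, and treating a missing Content-Type header as "skip" is an assumption.

```rust
/// Hypothetical stand-in for the `FileType` used by the fragment checker;
/// the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileType {
    Html,
    Markdown,
}

/// Decide from the Content-Type header value and the URL path whether the
/// fragment checker should run, and with which document type.
/// `None` means: skip the fragment check and return the HTTP status.
/// Assumption: a missing header is treated like an unsupported type.
fn fragment_file_type(content_type: Option<&str>, url_path: &str) -> Option<FileType> {
    // Drop parameters such as "; charset=utf-8" and normalize the case.
    let mime = content_type?
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match mime.as_str() {
        "text/html" => Some(FileType::Html),
        "text/markdown" => Some(FileType::Markdown),
        // Markdown is often served as plain text, so the extension decides.
        "text/plain" if url_path.ends_with(".md") => Some(FileType::Markdown),
        _ => None,
    }
}
```

With this sketch, a binary download such as `application/zip` maps to `None`, so its body never needs to reach the fragment checker.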

To invoke the fragment checker with a variable document type, a new FileType argument is added to the check_html_fragment() function.
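
To show why the document type has to be passed in, here is a deliberately simplified sketch. It does not reproduce the real `check_html_fragment()` signature or its parsing: `anchors` and `fragment_exists` are hypothetical helpers, the HTML branch only looks for literal `id="..."` attributes, and the Markdown branch uses a naive heading slug (lowercase, spaces to hyphens).

```rust
/// Hypothetical stand-in for the `FileType` argument described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileType {
    Html,
    Markdown,
}

/// Toy anchor extraction; a real checker would use proper parsers.
fn anchors(content: &str, file_type: FileType) -> Vec<String> {
    match file_type {
        // Collect the value of every literal `id="..."` attribute.
        FileType::Html => content
            .split("id=\"")
            .skip(1)
            .filter_map(|rest| rest.split('"').next())
            .map(str::to_string)
            .collect(),
        // Turn every ATX heading ("# Title") into a naive slug ("title").
        FileType::Markdown => content
            .lines()
            .filter(|line| line.starts_with('#'))
            .map(|line| {
                line.trim_start_matches('#')
                    .trim()
                    .to_lowercase()
                    .replace(' ', "-")
            })
            .collect(),
    }
}

/// Returns true if `fragment` is empty or matches an anchor in `content`
/// when the content is interpreted as `file_type`.
fn fragment_exists(content: &str, fragment: &str, file_type: FileType) -> bool {
    fragment.is_empty() || anchors(content, file_type).iter().any(|a| a == fragment)
}
```

In this sketch, a Markdown heading `# Getting Started` matches the fragment `getting-started` only when the content is checked as `FileType::Markdown`; checked as HTML, the same document yields no anchors, which is the kind of false result the description mentions.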

The fragment checker test and fixture are adjusted to match the new expected result: checking a binary file via a remote URL with a fragment is now expected to succeed, since its Content-Type header no longer triggers the fragment checker.

@MichaIng MichaIng force-pushed the content-type-based-fragment-checking branch 13 times, most recently from 08b0ec2 to 83630cb Compare July 1, 2025 00:52
Signed-off-by: MichaIng <[email protected]>
@MichaIng MichaIng force-pushed the content-type-based-fragment-checking branch from 83630cb to 34a933c Compare July 1, 2025 00:58
@MichaIng MichaIng closed this Jul 1, 2025