[py]: guess mimetypes in webserver for content serving #16449

navin772 · 2025-10-16T17:27:43Z

User description

🔗 Related Issues

In here - #16419 (comment), the txt and image files were getting downloaded as html. The webserver wasn't handling the mimetypes, this PR fixes it.

💥 What does this PR do?

🔧 Implementation Notes

I have used filetype for mimetype guessing but it isn't very good for text file mime guessing since it works on magic bytes of files. We are already using filetype and it was added in this PR - #14771 as per discussion in #14765 (comment).

We can also use the mimetypes library.

💡 Additional Considerations

🔄 Types of changes

Bug fix (backwards compatible)

PR Type

Bug fix

Description

Add proper MIME type detection using mimetypes library for file serving
Handle binary files (images, audio, video) separately from text files
Remove workarounds in download tests that accepted incorrect file extensions
Simplify test assertions to expect correct file types

Diagram Walkthrough

flowchart LR
  A["Webserver File Handler"] -->|"guess_type()"| B["MIME Type Detection"]
  B -->|"Binary Content"| C["Read as Binary"]
  B -->|"Text Content"| D["Read as Text + Encode"]
  C -->|"Correct Extension"| E["File Downloaded Correctly"]
  D -->|"Correct Extension"| E
  F["Test Assertions"] -->|"Remove Workarounds"| G["Expect Correct File Types"]

File Walkthrough

Relevant files

Bug fix

webserver.py `Implement proper MIME type detection for file serving` py/test/selenium/webdriver/common/webserver.py Import `mimetypes` module for MIME type detection Refactor `_serve_file()` to handle binary and text files separately based on MIME type Update `do_GET()` to use `mimetypes.guess_type()` instead of hardcoded `.json` check Binary files (images, audio, video, applications) are read in binary mode; text files use latin-1 encoding	+16/-3

Tests

remote_downloads_tests.py `Remove download test workarounds for correct file types` py/test/selenium/webdriver/remote/remote_downloads_tests.py Remove TODO comments about Chrome downloading files as HTML Simplify `test_get_downloadable_files()` to assert exact file names instead of accepting multiple extensions Simplify `test_download_file()` to search for `.txt` files directly without fallback extensions Update `_browser_downloads()` wait condition to expect correct `file_2.jpg` instead of multiple extensions Reduce wait timeout from 5 to 3 seconds	+6/-16

…ontent-type-webserver

qodo-merge-pro · 2025-10-16T17:28:19Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance

🟢

No security concerns identified

No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance

🟡

🎫 #1234

🔴	Ensure clicking links with JavaScript in href triggers expected behavior in Firefox 42 similar to Selenium 2.47.1.
	Provide a fix or regression handling for versions 2.48.0/2.48.2 where alerts no longer show.

🟡

🎫 #14765

🟢	Ensure tests continue to validate image/content types without imghdr, potentially via alternative libraries or methods.
🔴	Remove or replace usage of deprecated imghdr module in tests to be compatible with Python 3.13+.

🟡

🎫 #5678

🔴	Investigate and resolve "ConnectFailure (Connection refused)" errors when instantiating multiple ChromeDriver instances on Ubuntu.
	Provide guidance or code changes to prevent connection issues after the first ChromeDriver instance.

Codebase Duplication Compliance

⚪

Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance

⚪

No custom compliance provided

Follow the guide to enable custom compliance check.

Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-merge-pro · 2025-10-16T17:29:44Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	✅ ~~Simplify file serving by reading all files in binary mode~~ Suggestion Impact: The commit changed _serve_file to always open files in binary mode and return the raw bytes, removing the previous latin-1 text re-encoding path. It also added MIME type detection with filetype and simple fallbacks. code diff: def _serve_file(self, file_path): """Serve a file from the HTML root directory.""" - content_type, _ = mimetypes.guess_type(file_path) - - if content_type and ( - content_type.startswith("image/") - or content_type.startswith("application/") - or content_type.startswith("video/") - or content_type.startswith("audio/") - ): - with open(file_path, "rb") as f: - return f.read() + with open(file_path, "rb") as f: + content = f.read() + + kind = filetype.guess(content) + if kind is not None: + return content, kind.mime + + # fallback for text files that filetype can't detect + if file_path.endswith(".txt"): + return content, "text/plain" + elif file_path.endswith(".json"): + return content, "application/json" else: - # text files - with open(file_path, encoding="latin-1") as f: - return f.read().encode("utf-8") + return content, "text/html" Simplify the `_serve_file` method by opening all files in binary mode (`"rb"`) to avoid unnecessary and potentially corrupting text re-encoding. py/test/selenium/webdriver/common/webserver.py [72-87] def _serve_file(self, file_path): """Serve a file from the HTML root directory.""" - content_type, _ = mimetypes.guess_type(file_path) + with open(file_path, "rb") as f: + return f.read() - if content_type and ( - content_type.startswith("image/") - or content_type.startswith("application/") - or content_type.startswith("video/") - or content_type.startswith("audio/") - ): - with open(file_path, "rb") as f: - return f.read() - else: - # text files - with open(file_path, encoding="latin-1") as f: - return f.read().encode("utf-8") - `[Suggestion processed]` Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that re-encoding text files from `latin-1` to `utf-8` is problematic and can cause corruption. The proposed change to read all files in binary mode is simpler, more robust, and the correct approach for a web server.	Medium
Learned best practice	✅ ~~Default Content-Type when unknown~~ Suggestion Impact: The commit refactored content-type detection to return both content and MIME type from _serve_file using filetype.guess, and added explicit fallbacks (e.g., text/plain, application/json, text/html). This ensures a valid Content-Type is always set even when guessing fails, aligning with the suggestion's intent. code diff: def _serve_file(self, file_path): """Serve a file from the HTML root directory.""" - content_type, _ = mimetypes.guess_type(file_path) - - if content_type and ( - content_type.startswith("image/") - or content_type.startswith("application/") - or content_type.startswith("video/") - or content_type.startswith("audio/") - ): - with open(file_path, "rb") as f: - return f.read() + with open(file_path, "rb") as f: + content = f.read() + + kind = filetype.guess(content) + if kind is not None: + return content, kind.mime + + # fallback for text files that filetype can't detect + if file_path.endswith(".txt"): + return content, "text/plain" + elif file_path.endswith(".json"): + return content, "application/json" else: - # text files - with open(file_path, encoding="latin-1") as f: - return f.read().encode("utf-8") + return content, "text/html" def _send_response(self, content_type="text/html"): """Send a response.""" @@ -102,8 +103,7 @@ self._send_response("text/html") self.wfile.write(html) elif os.path.isfile(file_path): - content_type, _ = mimetypes.guess_type(file_path) - content = self._serve_file(file_path) + content, content_type = self._serve_file(file_path) self._send_response(content_type) self.wfile.write(content) Ensure a valid Content-Type is always sent by defaulting to a sensible MIME type when guessing fails. This avoids sending a None header value. py/test/selenium/webdriver/common/webserver.py [104-108] elif os.path.isfile(file_path): content_type, _ = mimetypes.guess_type(file_path) + if not content_type: + content_type = "application/octet-stream" content = self._serve_file(file_path) self._send_response(content_type) self.wfile.write(content) `[Suggestion processed]` Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Validate inputs and states early, providing safe defaults to prevent logic errors and None-related issues.	Low
Update

cgoldberg · 2025-10-16T17:46:02Z

good find! thanks for fixing the tests also.

navin772 added 3 commits October 16, 2025 21:51

use mimetypes in webserver for content type

b2e14e9

fix file serving

de49ff0

Merge branch 'trunk' of https://github.com/SeleniumHQ/selenium into c…

0c72ea1

…ontent-type-webserver

selenium-ci added the C-py Python Bindings label Oct 16, 2025

use filetype

0cb52a1

qodo-merge-pro bot added the Review effort 2/5 label Oct 16, 2025

navin772 mentioned this pull request Oct 16, 2025

[py] Use Chrome instead of Firefox for internal Remote tests #16419

Merged

navin772 requested a review from cgoldberg October 16, 2025 17:37

cgoldberg approved these changes Oct 16, 2025

View reviewed changes

Merge branch 'trunk' into content-type-webserver

fa68fca

navin772 merged commit 8a91342 into SeleniumHQ:trunk Oct 17, 2025
4 checks passed

navin772 deleted the content-type-webserver branch October 17, 2025 04:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[py]: guess mimetypes in webserver for content serving #16449

[py]: guess mimetypes in webserver for content serving #16449

navin772 commented Oct 16, 2025 •

edited by qodo-merge-pro bot

Loading

Uh oh!

qodo-merge-pro bot commented Oct 16, 2025 •

edited

Loading

Uh oh!

qodo-merge-pro bot commented Oct 16, 2025 •

edited

Loading

Uh oh!

cgoldberg commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

[py]: guess mimetypes in webserver for content serving #16449

[py]: guess mimetypes in webserver for content serving #16449

Conversation

navin772 commented Oct 16, 2025 • edited by qodo-merge-pro bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

🔗 Related Issues

💥 What does this PR do?

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-merge-pro bot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-merge-pro bot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

cgoldberg commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

navin772 commented Oct 16, 2025 •

edited by qodo-merge-pro bot

Loading

qodo-merge-pro bot commented Oct 16, 2025 •

edited

Loading

qodo-merge-pro bot commented Oct 16, 2025 •

edited

Loading