Conversation

@asolntsev
Contributor

asolntsev commented Nov 22, 2025

User description

🔗 Related Issues

Fixes #16612

💥 What does this PR do?

This PR allows downloading very large files (I tried up to 4GB) from Selenium Grid while consuming very little memory.

🔄 Types of changes

  • Performance improvement (backwards compatible)

PR Type

Enhancement, Bug fix


Description

  • Add new Grid endpoint /se/files/:name for direct file downloads without Base64 encoding

  • Optimize RemoteWebDriver.downloadFile() to stream files directly instead of loading into memory

  • Extract anonymous Contents.Supplier implementations into separate named classes for better debugging

  • Support downloading large files (tested up to 4GB) with minimal memory consumption

  • Add URL decoding support and improve file download error messages with available files list
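To make the cost of the Base64 approach concrete: Base64 turns every 3 input bytes into 4 output characters, so a file embedded in a JSON response grows by about a third before any JSON buffers are counted. The sketch below only illustrates that arithmetic; `Base64Overhead` and `encodedLength` are hypothetical names and not part of this PR.

```java
import java.util.Base64;

/** Hypothetical helper (not part of Selenium): size of a padded Base64 payload. */
final class Base64Overhead {

  /** Number of characters produced by padded Base64 for {@code rawBytes} input bytes. */
  static long encodedLength(long rawBytes) {
    // Every started group of 3 bytes becomes 4 characters.
    return 4 * ((rawBytes + 2) / 3);
  }

  /** Cross-checks the formula against the JDK encoder for a small in-memory sample. */
  static boolean matchesJdkEncoder(int rawBytes) {
    String encoded = Base64.getEncoder().encodeToString(new byte[rawBytes]);
    return encoded.length() == encodedLength(rawBytes);
  }
}
```

For a 4 GiB file, `encodedLength(4L << 30)` gives 5,726,623,064 characters, which is more than a single Java `String` can hold because its length is an `int`.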


Diagram Walkthrough

flowchart LR
  A["Client requests file download"] -->|GET /se/files/:name| B["New Grid endpoint"]
  B -->|Direct streaming| C["FileContentSupplier"]
  C -->|Stream to disk| D["Target file location"]
  E["Old approach"] -->|Base64 encoding| F["JSON response"]
  F -->|High memory usage| G["OutOfMemory risk"]
  style B fill:#90EE90
  style C fill:#90EE90
  style D fill:#90EE90

File Walkthrough

Relevant files
Enhancement (19 files)
  • Session.java (+5/-0): Add toString method for better debugging
  • Node.java (+3/-0): Add new GET endpoint for file downloads
  • LocalNode.java (+77/-16): Implement streaming file download with URL decoding
  • Urls.java (+8/-2): Add URL decoding utility method
  • FileBackedOutputStreamContentSupplier.java (+74/-0): Extract anonymous Contents.Supplier implementation
  • RequestConverter.java (+6/-31): Use FileBackedOutputStreamContentSupplier and thread-safe length tracking
  • DriverCommand.java (+1/-0): Add new GET_DOWNLOADED_FILE command constant
  • RemoteWebDriver.java (+10/-4): Stream file downloads directly without Base64 encoding
  • AbstractHttpCommandCodec.java (+2/-0): Map new GET_DOWNLOADED_FILE command to endpoint
  • W3CHttpResponseCodec.java (+15/-6): Handle binary OCTET_STREAM responses and avoid logging large content
  • BytesContentSupplier.java (+65/-0): Extract anonymous bytes supplier implementation
  • Contents.java (+26/-26): Change length to long, add file and stream suppliers, deprecate unsafe methods
  • FileContentSupplier.java (+80/-0): New supplier for streaming file content with size metadata
  • HttpMessage.java (+9/-0): Add toString and contentAsString methods
  • HttpRequest.java (+2/-1): Improve toString to include parent content information
  • HttpResponse.java (+1/-2): Use parent toString to avoid loading large content
  • InputStreamContentSupplier.java (+71/-0): New supplier for streaming input with size limit protection
  • JdkHttpClient.java (+3/-3): Use InputStream handler instead of byte array for responses
  • JdkHttpMessages.java (+4/-5): Create response with streaming content from InputStream
Tests (1 file)
  • LocalNodeTest.java (+7/-0): Add test for file name extraction from URI
Dependencies (1 file)
  • BUILD.bazel (+1/-0): Add jspecify dependency for annotations

@selenium-ci added the B-grid (Everything grid and server related), C-java (Java Bindings), and B-build (Includes scripting, bazel and CI integrations) labels Nov 22, 2025
@asolntsev added the I-performance (Something could be faster) label Nov 22, 2025
@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Nov 22, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Path parsing ambiguity

Description: The URL-decoding in extractFileName replaces spaces with '+' (urlDecode(...).replace(' ',
'+')), which can be abused to bypass exact filename matching and access unexpected files
whose real names contain spaces and plus signs, potentially enabling ambiguous or
unintended file access.
LocalNode.java [771-783]

Referred Code
private String extractFileName(HttpRequest req) {
  return extractFileName(req.getUri());
}

String extractFileName(String uri) {
  String prefix = "/se/files/";
  int index = uri.lastIndexOf(prefix);
  if (index < 0) {
    throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
  }
  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
}
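The ambiguity can be reproduced with the JDK alone. A minimal sketch, assuming the PR's `urlDecode` behaves like `java.net.URLDecoder.decode` (form decoding, where `+` means a space); that is an assumption, since `Urls.java` is not shown here. `DecodeAmbiguityDemo` and `decodeLikeExcerpt` are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class: mimics the decode-then-replace step from the excerpt. */
final class DecodeAmbiguityDemo {

  /** Stand-in for urlDecode(...).replace(' ', '+'); assumes URLDecoder semantics. */
  static String decodeLikeExcerpt(String rawName) {
    // URLDecoder maps both "+" and "%20" to a space; the replace then turns
    // every space back into "+", so the original distinction is lost.
    return URLDecoder.decode(rawName, StandardCharsets.UTF_8).replace(' ', '+');
  }
}
```

Under that assumption, `my%20report.pdf`, `my+report.pdf`, and `my%2Breport.pdf` all come out as `my+report.pdf`, so a downloaded file whose real name contains a space could not be matched by exact name.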
Insecure direct object access

Description: The file download endpoint streams arbitrary files from the per-session downloads
directory based solely on client-supplied names without enforcing content-type
restrictions or download safeguards, allowing retrieval of any file present in that
directory (including potentially sensitive artifacts) if an attacker can write to it via
the browser.
LocalNode.java [839-851]

Referred Code
private HttpResponse getDownloadedFile(File downloadsDirectory, String fileName)
    throws IOException {
  if (fileName.isEmpty()) {
    throw new WebDriverException("Please specify file to download in URL");
  }
  File file = findDownloadedFile(downloadsDirectory, fileName);
  BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
  return new HttpResponse()
      .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
      .setHeader("Content-Length", String.valueOf(attributes.size()))
      .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
      .setContent(Contents.file(file));
}
Memory exhaustion risk

Description: contentAsString reads the entire stream into memory if length ≤ 256MB, which could still
be abused to trigger high memory usage by returning large "text" responses near the
threshold; this is risky for untrusted endpoints and contradicts the streaming intent.
InputStreamContentSupplier.java [61-69]

Referred Code
public String contentAsString(Charset charset) {
  if (length > MAX_TEXT_RESPONSE_SIZE) {
    throw new UnsupportedOperationException("Cannot print out too large stream content");
  }
  try {
    return new String(stream.readAllBytes(), UTF_8);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
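One way to bound memory regardless of the declared length is to cap the number of bytes actually read. The sketch below is a standalone illustration, not the PR's implementation; `BoundedTextReader`, `readText`, and the `maxBytes` parameter are hypothetical. It also decodes with the caller's charset, whereas the excerpt above appears to pass `UTF_8` rather than its `charset` argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/** Hypothetical helper: reads text from a stream but refuses oversized content. */
final class BoundedTextReader {

  static String readText(InputStream in, int maxBytes, Charset charset) throws IOException {
    if (maxBytes < 0 || maxBytes == Integer.MAX_VALUE) {
      throw new IllegalArgumentException("maxBytes out of range: " + maxBytes);
    }
    // Read one byte more than allowed so an oversized stream can be detected
    // without buffering the whole stream.
    byte[] data = in.readNBytes(maxBytes + 1);
    if (data.length > maxBytes) {
      throw new IOException("Stream content exceeds " + maxBytes + " bytes");
    }
    return new String(data, charset);
  }
}
```

At most `maxBytes + 1` bytes are ever held in memory, even if the sender reports a smaller length than it actually sends.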
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Info Exposure: Error when file not found exposes absolute downloads directory path and full file list in
the message, which may leak internal system details to the client.

Referred Code
  throw new WebDriverException(
      String.format(
          "Cannot find file [%s] in directory %s. Found %s files: %s.",
          filename, downloadsDirectory.getAbsolutePath(), files.size(), files));
}
if (matchingFiles.size() != 1) {
  throw new WebDriverException(
      String.format(
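
If the absolute path is the main concern, the client-facing message could keep the requested name and the list of available file names while leaving out the directory. This is only a sketch of that idea; `DownloadErrors` and `fileNotFoundMessage` are hypothetical names, and whether the file list itself should be exposed is a separate decision.

```java
import java.util.List;

/** Hypothetical helper: builds a "file not found" message without server paths. */
final class DownloadErrors {

  static String fileNotFoundMessage(String requestedName, List<String> availableNames) {
    // Only names relative to the downloads directory are included;
    // the absolute directory path would go to the server log instead.
    return String.format(
        "Cannot find file [%s]. Found %d files: %s.",
        requestedName, availableNames.size(), availableNames);
  }
}
```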

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Action Logging: New download endpoints and file operations (listing, streaming, deleting) do not include
explicit audit logging of user/session, action, target file, and outcome in the added code
hunks.

Referred Code
          + id
          + " — ensure downloads are enabled in the options class when requesting a session.";
  throw new WebDriverException(msg);
}
File downloadsDirectory =
    Optional.ofNullable(tempFS.getBaseDir().listFiles()).orElse(new File[] {})[0];

try {
  if (req.getMethod().equals(HttpMethod.GET) && req.getUri().endsWith("/se/files")) {
    return listDownloadedFiles(downloadsDirectory);
  }
  if (req.getMethod().equals(HttpMethod.GET)) {
    return getDownloadedFile(downloadsDirectory, extractFileName(req));
  }
  if (req.getMethod().equals(HttpMethod.DELETE)) {
    return deleteDownloadedFile(downloadsDirectory);
  }
  return getDownloadedFile(req, downloadsDirectory);
} catch (IOException e) {
  throw new UncheckedIOException(e);
}


 ... (clipped 117 lines)
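
The dispatch order in the excerpt can be summarised as a small pure function. The sketch below mirrors only the visible conditions; `DownloadRouting`, `Action`, and `route` are hypothetical names, and the final fallback is presumably the pre-existing request-body-based download, which is an assumption because that code is clipped.

```java
/** Hypothetical model of the request dispatch shown in the excerpt. */
final class DownloadRouting {

  enum Action { LIST_FILES, STREAM_FILE, DELETE_FILES, LEGACY_DOWNLOAD }

  static Action route(String method, String uri) {
    // Order matters: the list endpoint must be checked before the generic GET.
    if (method.equals("GET") && uri.endsWith("/se/files")) {
      return Action.LIST_FILES;
    }
    if (method.equals("GET")) {
      return Action.STREAM_FILE;
    }
    if (method.equals("DELETE")) {
      return Action.DELETE_FILES;
    }
    // Assumption: any other method falls through to the older download path.
    return Action.LEGACY_DOWNLOAD;
  }
}
```

Note that a GET to `/se/files/` (with a trailing slash) is routed to the streaming branch with an empty name, which is where the "Please specify file to download in URL" check applies.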

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Edge Cases: Streaming download path sets Content-Length from attributes but may not handle missing
length or concurrent file changes, and GET without file name relies on URL parsing that
throws IllegalArgumentException without standardized error response mapping.

Referred Code
    throws IOException {
  if (fileName.isEmpty()) {
    throw new WebDriverException("Please specify file to download in URL");
  }
  File file = findDownloadedFile(downloadsDirectory, fileName);
  BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
  return new HttpResponse()
      .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
      .setHeader("Content-Length", String.valueOf(attributes.size()))
      .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
      .setContent(Contents.file(file));
}

private String lastModifiedHeader(FileTime fileTime) {
  return HTTP_DATE_FORMAT.format(fileTime.toInstant().atZone(UTC));
}
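
The definition of `HTTP_DATE_FORMAT` is not part of the excerpt. For reference, a formatter that produces the fixed-width HTTP date form (two-digit day, English names, GMT) could look like the sketch below; `HttpDates` and its members are hypothetical names, not the PR's code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper: formats instants as HTTP dates such as a Last-Modified value. */
final class HttpDates {

  // A fixed English locale keeps day and month names stable on any server locale.
  static final DateTimeFormatter HTTP_DATE =
      DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
          .withZone(ZoneOffset.UTC);

  static String format(Instant instant) {
    return HTTP_DATE.format(instant);
  }
}
```

`HttpDates.format(Instant.EPOCH)` yields `Thu, 01 Jan 1970 00:00:00 GMT`.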

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path Handling: Filename extraction and matching rely on equals against names found in the downloads
directory, but there is no explicit normalization/validation beyond urlDecode to prevent
traversal or special-name attacks, requiring verification of directory confinement.

Referred Code
String extractFileName(String uri) {
  String prefix = "/se/files/";
  int index = uri.lastIndexOf(prefix);
  if (index < 0) {
    throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
  }
  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
}

/** User wants to list files that can be downloaded */
private HttpResponse listDownloadedFiles(File downloadsDirectory) {
  File[] files = Optional.ofNullable(downloadsDirectory.listFiles()).orElse(new File[] {});
  List<String> fileNames = Arrays.stream(files).map(File::getName).collect(Collectors.toList());
  List<DownloadedFile> fileInfos =
      Arrays.stream(files).map(this::getFileInfo).collect(Collectors.toList());

  Map<String, Object> data =
      Map.of(
          "names", fileNames,
          "files", fileInfos);
  Map<String, Map<String, Object>> result = Map.of("value", data);


 ... (clipped 88 lines)

Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Nov 22, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix incorrect filename extraction logic

Remove the incorrect .replace(' ', '+') from the extractFileName method to
correctly handle filenames containing spaces.

java/src/org/openqa/selenium/grid/node/local/LocalNode.java [775-782]

 String extractFileName(String uri) {
   String prefix = "/se/files/";
   int index = uri.lastIndexOf(prefix);
   if (index < 0) {
     throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
   }
-  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
+  return urlDecode(uri.substring(index + prefix.length()));
 }
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a bug in the new extractFileName method where spaces in filenames are incorrectly converted to plus signs, which would cause file downloads to fail for such files.
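
If literal plus signs in file names should also survive, one option is to decode the path segment so that `+` is never treated as a space. A minimal sketch, assuming `java.net.URLDecoder` is the underlying decoder (the PR's `urlDecode` is not shown, so this is an assumption); `PathSegmentDecoder` and `decodePathSegment` are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: percent-decodes a URL path segment, keeping '+' literal. */
final class PathSegmentDecoder {

  static String decodePathSegment(String rawSegment) {
    // URLDecoder implements form decoding, where '+' means space.
    // Escaping '+' first makes it decode back to a literal plus sign.
    String protectedPlus = rawSegment.replace("+", "%2B");
    return URLDecoder.decode(protectedPlus, StandardCharsets.UTF_8);
  }
}
```

With this approach `my%20report.pdf` decodes to `my report.pdf` and `my+report.pdf` stays `my+report.pdf`. Malformed percent sequences still raise `IllegalArgumentException`, as with `URLDecoder` itself.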

High
Allow multiple stream reads from file

Update FileContentSupplier.get() to return a new InputStream on each call to
adhere to the Contents.Supplier contract, removing the single-use restriction.

java/src/org/openqa/selenium/remote/http/FileContentSupplier.java [38-50]

 @Override
 public synchronized InputStream get() {
-  if (inputStream != null) {
-    throw new IllegalStateException("File input stream has been opened before");
-  }
   try {
-    inputStream = Files.newInputStream(file.toPath());
+    return Files.newInputStream(file.toPath());
   } catch (IOException e) {
     throw new IllegalStateException("File not readable: " + file.getAbsolutePath(), e);
   }
-
-  return inputStream;
 }
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies that the get() method in the new FileContentSupplier class violates the Contents.Supplier contract by not allowing multiple invocations, which could cause issues with features like HTTP retries.
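
The idea behind this suggestion can be shown without Selenium types. In the sketch below, `java.util.function.Supplier<InputStream>` stands in for `Contents.Supplier`, and `ReopenableFileSupplier` is a hypothetical name; it is an illustration under those assumptions, not the PR's `FileContentSupplier`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical stand-in for a file-backed content supplier that can be read repeatedly. */
final class ReopenableFileSupplier implements Supplier<InputStream> {

  private final Path file;

  ReopenableFileSupplier(Path file) {
    this.file = file;
  }

  /** Opens a fresh stream on every call; the caller is responsible for closing it. */
  @Override
  public InputStream get() {
    try {
      return Files.newInputStream(file);
    } catch (IOException e) {
      throw new UncheckedIOException("File not readable: " + file, e);
    }
  }
}
```

The trade-off is ownership: the supplier no longer tracks an open stream, so each caller has to close what it opened, for example with try-with-resources.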

Medium
Overwrite existing files on download

Modify the downloadFile method to use StandardCopyOption.REPLACE_EXISTING with
Files.copy to allow overwriting existing files.

java/src/org/openqa/selenium/remote/RemoteWebDriver.java [729-739]

 @Override
 public void downloadFile(String fileName, Path targetLocation) throws IOException {
   requireDownloadsEnabled(capabilities);
 
   Response response = execute(DriverCommand.GET_DOWNLOADED_FILE, Map.of("name", fileName));
 
   Contents.Supplier content = (Contents.Supplier) response.getValue();
   try (InputStream fileContent = content.get()) {
-    Files.copy(new BufferedInputStream(fileContent), targetLocation.resolve(fileName));
+    Files.copy(
+        new BufferedInputStream(fileContent),
+        targetLocation.resolve(fileName),
+        StandardCopyOption.REPLACE_EXISTING);
   }
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that Files.copy will fail if the target file exists and proposes adding StandardCopyOption.REPLACE_EXISTING to fix it, which is a valid and useful improvement.
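
The difference is easy to check in isolation: `Files.copy(InputStream, Path)` throws `FileAlreadyExistsException` when the target exists, unless `REPLACE_EXISTING` is passed. The sketch below streams an arbitrary `InputStream` to disk; `StreamToDisk` and `save` are hypothetical names and do not use any Selenium API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: saves a stream to a file without buffering it in memory. */
final class StreamToDisk {

  /** Copies {@code content} to {@code targetDir/fileName} and returns the bytes written. */
  static long save(InputStream content, Path targetDir, String fileName) throws IOException {
    Path target = targetDir.resolve(fileName);
    // REPLACE_EXISTING makes repeated downloads of the same name succeed.
    return Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
  }
}
```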

Medium
Learned
best practice
Validate filename to prevent traversal

Validate the extracted name against path traversal and reject names containing
separators or parent segments; avoid mutating decoded names unexpectedly.

java/src/org/openqa/selenium/grid/node/local/LocalNode.java [775-851]

 String extractFileName(String uri) {
   String prefix = "/se/files/";
   int index = uri.lastIndexOf(prefix);
   if (index < 0) {
     throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
   }
-  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
+  String decoded = urlDecode(uri.substring(index + prefix.length()));
+  // reject path traversal or nested paths
+  if (decoded.contains("/") || decoded.contains("\\") || decoded.contains("..")) {
+    throw new IllegalArgumentException("Invalid file name");
+  }
+  return decoded;
 }
 
 private HttpResponse getDownloadedFile(File downloadsDirectory, String fileName)
     throws IOException {
-  if (fileName.isEmpty()) {
+  if (fileName == null || fileName.isBlank()) {
     throw new WebDriverException("Please specify file to download in URL");
   }
+  // rest unchanged
   File file = findDownloadedFile(downloadsDirectory, fileName);
   BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
   return new HttpResponse()
       .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
       .setHeader("Content-Length", String.valueOf(attributes.size()))
       .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
       .setContent(Contents.file(file));
 }
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Guard external I/O and user-supplied inputs with validation to avoid crashes or security issues; validate and sanitize path-derived file names and method-specific routes.
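
An alternative to substring checks is to resolve the name against the downloads directory and verify that the result is a direct child of it. This avoids rejecting legitimate names such as `a..b.txt`, which a `contains("..")` test would refuse. The sketch is standalone; `DownloadPaths` and `resolveInside` are hypothetical names.

```java
import java.nio.file.Path;

/** Hypothetical helper: confines a client-supplied file name to one directory. */
final class DownloadPaths {

  static Path resolveInside(Path downloadsDir, String fileName) {
    if (fileName == null || fileName.isBlank()) {
      throw new IllegalArgumentException("File name is required");
    }
    Path base = downloadsDir.toAbsolutePath().normalize();
    Path candidate = base.resolve(fileName).normalize();
    // Accept only direct children: this rejects "..", ".", nested paths,
    // and absolute paths, because their parent is not the base directory.
    if (!base.equals(candidate.getParent())) {
      throw new IllegalArgumentException("Invalid file name");
    }
    return candidate;
  }
}
```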

Low

@asolntsev self-assigned this Nov 22, 2025
@asolntsev force-pushed the 16612-download-large-files branch from e993ed2 to f42d85e Compare November 22, 2025 21:08
I added a new Grid endpoint "/se/files/:name" that allows downloading a file directly, without encoding it to Base64 and embedding it in JSON. That transformation kills performance and causes OutOfMemory errors for large files (e.g. 256+ MB).

NB! Make sure that the `toString()` method of objects (HttpRequest, HttpResponse, Contents.Supplier) never returns an overly long string: it spams debug logs and can cause OOM during debugging.
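
A simple way to keep `toString()` output bounded is to cap any content preview and report the total size instead. The helper below is a generic sketch with hypothetical names (`LogPreview`, `preview`); it does not reproduce the `toString()` methods added in this PR.

```java
/** Hypothetical helper: produces a short, log-safe preview of possibly huge text. */
final class LogPreview {

  static String preview(String text, int maxChars) {
    if (text.length() <= maxChars) {
      return text;
    }
    // Keep only the head of the text and state how much was omitted.
    return text.substring(0, maxChars) + "... (" + text.length() + " chars total)";
  }
}
```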
…ier` to separate classes

It makes debugging easier. You can easily see what instances they are and where they come from.
Instead of reading the whole file into a byte array, just save the given InputStream directly to the file.

Now it can download large files (I tried 4GB) while consuming very little memory.
… deleted

After stopping a Grid node, the folder is deleted asynchronously (by a cache removal listener), so we need to wait for it in the test.
…opped

At least on my machine, stopping the node takes some time, and any checks right after `node.stop(sessionId)` can often fail.
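
Waiting for an asynchronous effect in a test usually means polling with a deadline rather than asserting immediately. The sketch below is a generic polling helper with hypothetical names (`Polling`, `waitUntil`); the actual tests in this PR may rely on a different wait utility.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a timeout expires. */
final class Polling {

  /** Returns true as soon as the condition holds, or false once the timeout has passed. */
  static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(interval.toMillis());
    }
  }
}
```

In a test this could be used as `waitUntil(() -> !Files.exists(dir), Duration.ofSeconds(5), Duration.ofMillis(50))`, where `dir` is the directory expected to disappear.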
@asolntsev force-pushed the 16612-download-large-files branch from f42d85e to 2c190cc Compare November 23, 2025 12:26
@asolntsev marked this pull request as draft November 23, 2025 12:26

Labels

  • B-build: Includes scripting, bazel and CI integrations
  • B-grid: Everything grid and server related
  • C-java: Java Bindings
  • I-performance: Something could be faster
  • Review effort 3/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[🐛 Bug]: Critically slow file transfer from Node to host for large files

2 participants