
Conversation

@lcian lcian commented Nov 26, 2025

No description provided.

@github-actions github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) on Nov 26, 2025
@github-actions github-actions bot added the Scope: Frontend label (automatically applied to PRs that change frontend components) on Nov 26, 2025
@github-actions
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request, because the Frontend and Backend are not deployed atomically. If the changes are interdependent, they must be separated into two pull requests and made forward- or backward-compatible, so that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.


lcian commented Nov 26, 2025

@sentry review

self._current_chunk_remaining -= len(chunk)

if self._current_chunk_remaining == 0:
    self._read(2)  # Read trailing \r\n

Bug: ChunkedStreamDecoder ignores the return value of self._read(2) for \r\n consumption, corrupting chunk parsing on incomplete reads.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The ChunkedStreamDecoder calls self._read(2) to consume the trailing \r\n after each chunk. However, the WSGI wsgi.input stream's read(size) method does not guarantee returning exactly size bytes. If _read(2) returns fewer than 2 bytes, the decoder ignores this, and its state is not advanced correctly. This causes subsequent chunk size parsing to become corrupted, leading to a ValueError on line 142 and stream decoding failures.

💡 Suggested Fix

The _read(2) call should verify that exactly 2 bytes were read. If fewer than 2 bytes are returned, the decoder should handle the incomplete read, possibly by retrying or raising an error, to ensure its state is correctly advanced.
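
For illustration, a minimal sketch of such a helper (the name _read_exact and its use of the existing self._read wrapper are assumptions, not part of this diff):

def _read_exact(self, size: int) -> bytes:
    # Keep reading until exactly `size` bytes have been consumed; fail loudly
    # on a premature end of stream instead of silently corrupting decoder state.
    data = b""
    while len(data) < size:
        chunk = self._read(size - len(data))
        if not chunk:
            raise ValueError("Unexpected end of chunked stream")
        data += chunk
    return data

The trailing-CRLF consumption would then call self._read_exact(2) instead of self._read(2).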

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/objectstore/endpoints/organization.py#L129

Potential issue: The `ChunkedStreamDecoder` calls `self._read(2)` to consume the
trailing `\r\n` after each chunk. However, the WSGI `wsgi.input` stream's `read(size)`
method does not guarantee returning exactly `size` bytes. If `_read(2)` returns fewer
than 2 bytes, the decoder ignores this, and its state is not advanced correctly. This
causes subsequent chunk size parsing to become corrupted, leading to a `ValueError` on
line 142 and stream decoding failures.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736

    ) -> Response | StreamingHttpResponse:
        target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
        target_url = urljoin(target_base_url, path)


The urljoin function may not behave as expected when joining paths. Per Python's documentation, urljoin('http://localhost:8888/', 'health') correctly returns http://localhost:8888/health, but urljoin('http://localhost:8888/', '/health') replaces the base URL's path entirely: a path starting with '/' is treated as absolute. Since target_base_url has its trailing slash stripped, a relative path can also replace the last path segment of the base URL instead of being appended to it. Consider a more robust path-joining approach that handles both leading and non-leading slash cases, similar to the pattern used in sentry/utils/http.py: urljoin(url_prefix.rstrip("/") + "/", url.lstrip("/")).
Severity: MEDIUM

💡 Suggested Fix

Suggested change
target_url = urljoin(target_base_url + "/", path.lstrip("/"))
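
For reference, this is how urljoin behaves in the cases described above (illustrative URLs only):

from urllib.parse import urljoin

# Relative path against a base ending in "/": appended as expected.
urljoin("http://localhost:8888/", "health")             # http://localhost:8888/health
# Leading "/" replaces the base URL's path entirely.
urljoin("http://localhost:8888/prefix/", "/health")     # http://localhost:8888/health
# Base without a trailing slash: the last path segment is replaced.
urljoin("http://localhost:8888/prefix", "health")       # http://localhost:8888/health
# The sentry/utils/http.py pattern keeps the prefix in both cases.
urljoin("http://localhost:8888/prefix".rstrip("/") + "/", "/health".lstrip("/"))
# -> http://localhost:8888/prefix/health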

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736

Comment on lines +125 to +135
bytes_read += len(chunk)
self._current_chunk_remaining -= len(chunk)

if self._current_chunk_remaining == 0:
    self._read(2)  # Read trailing \r\n
else:
    # Read next chunk size line
    size_line = b""
    while not size_line.endswith(b"\r\n"):
        byte = self._read(1)
        if not byte:

In the ChunkedStreamDecoder.read() method, when reading chunk size lines byte-by-byte, if the connection is dropped mid-chunk-size (e.g., client sends 123 without \r\n), the code will loop indefinitely trying to read the next byte. The method should have a timeout or maximum iteration limit to prevent hanging. Consider adding a maximum size limit for chunk size lines (they should typically be quite small).
Severity: MEDIUM
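
One possible shape for a bounded loop, sketched as a hypothetical helper on the decoder (the method name, the 64-byte limit, and the error handling are assumptions, not part of the PR):

def _read_chunk_size_line(self, max_len: int = 64) -> bytes:
    # Read the chunk-size line byte by byte, but bound the loop so a client
    # that never sends "\r\n" cannot hang the worker indefinitely.
    size_line = b""
    while not size_line.endswith(b"\r\n"):
        if len(size_line) > max_len:
            raise ValueError("Chunk size line too long")
        byte = self._read(1)
        if not byte:
            raise ValueError("Unexpected end of stream while reading chunk size")
        size_line += byte
    return size_line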

🤖 Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/objectstore/endpoints/organization.py#L125-L135

Potential issue: In the `ChunkedStreamDecoder.read()` method, when reading chunk size
lines byte-by-byte, if the connection is dropped mid-chunk-size (e.g., client sends
`123` without `\r\n`), the code will loop indefinitely trying to read the next byte. The
method should have a timeout or maximum iteration limit to prevent hanging. Consider
adding a maximum size limit for chunk size lines (they should typically be quite small).

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736

Comment on lines +62 to +70
        self,
        method: Literal["GET", "PUT", "POST", "DELETE"],
        path: str,
        request: Request,
    ) -> Response | StreamingHttpResponse:
        target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
        target_url = urljoin(target_base_url, path)

        headers = dict(request.headers)

The _proxy method doesn't validate or sanitize the path parameter. While the URL pattern (?P<path>.*) captures any string, there's no check to prevent path traversal attacks (e.g., ../../../etc/passwd). The objectstore backend should validate this, but the gateway should also implement defensive checks. Consider validating that the path doesn't contain suspicious patterns like .. or //.
Severity: MEDIUM
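
A conservative check could look roughly like this (illustrative only; the helper name and the allowed character set are assumptions and may need to be widened for real object keys):

import re

_ALLOWED_PATH_RE = re.compile(r"[A-Za-z0-9._\-/]+")

def is_safe_objectstore_path(path: str) -> bool:
    # Reject empty paths, absolute paths, parent-directory segments,
    # empty segments ("//"), and unexpected characters.
    if not path or path.startswith("/") or ".." in path or "//" in path:
        return False
    return _ALLOWED_PATH_RE.fullmatch(path) is not None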

🤖 Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/objectstore/endpoints/organization.py#L62-L70

Potential issue: The `_proxy` method doesn't validate or sanitize the `path` parameter.
While the URL pattern `(?P<path>.*)` captures any string, there's no check to prevent
path traversal attacks (e.g., `../../../etc/passwd`). The objectstore backend should
validate this, but the gateway should also implement defensive checks. Consider
validating that the path doesn't contain suspicious patterns like `..` or `//`.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736

Comment on lines +84 to +92
        response = requests.request(
            method,
            url=target_url,
            headers=headers,
            data=body_stream,
            params=dict(request.GET) if request.GET else None,
            stream=True,
            allow_redirects=False,
        )

Check failure

Code scanning / CodeQL

Full server-side request forgery Critical

The full URL of this request depends on a user-provided value.

Copilot Autofix

AI 1 day ago

To fix this Full SSRF vulnerability, we must prevent users from being able to set the full target of the outgoing HTTP request. The best pattern is to validate "path" so that only safe, known locations can be accessed. Generally, this means only allowing "path" to designate a relative path within a known base URL, making sure it cannot escape with e.g. leading "//", absolute URLs, etc.

A robust fix:

  1. Restrict "path" to ensure it cannot be an absolute URL, start with "//", or otherwise escape the intended target domain. This can be achieved by checking that "path" is a relative path, does not start with "http:", "https:", "//", or similar.

  2. Additionally, consider whitelisting or regular-expression filtering for allowed "path" formats, blocking traversal characters ("../", "..").

  3. Apply this check at the entrypoint to the _proxy function (or earlier). If "path" fails validation, return a 400 error.

Implementation:

  • Add a validation function (e.g., _is_safe_path) inside OrganizationObjectstoreEndpoint.
  • Use it right before constructing target_url. If validation fails, respond with a 400 error message.
  • This fix is limited to the edited region and does not change existing endpoint logic.

Suggested changeset 1
src/sentry/objectstore/endpoints/organization.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/sentry/objectstore/endpoints/organization.py b/src/sentry/objectstore/endpoints/organization.py
--- a/src/sentry/objectstore/endpoints/organization.py
+++ b/src/sentry/objectstore/endpoints/organization.py
@@ -64,6 +64,8 @@
         path: str,
         request: Request,
     ) -> Response | StreamingHttpResponse:
+        if not self._is_safe_path(path):
+            return Response("Unsafe path argument", status=400)
         target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
         target_url = urljoin(target_base_url, path)
 
@@ -95,6 +97,24 @@
         return stream_response(response)
 
 
+    def _is_safe_path(self, path: str) -> bool:
+        """
+        Return True if the provided path is safe to join to the base URL.
+        Rejects absolute URLs and schemes, and known SSRF exploit forms.
+        """
+        # Block absolute URLs and scheme
+        unsafe_prefixes = ("http://", "https://", "ftp://", "//")
+        if any(path.startswith(prefix) for prefix in unsafe_prefixes):
+            return False
+        # Block attempts to traverse upwards (optional - for stricter control)
+        if ".." in path or path.startswith("/"):
+            return False
+        # You may also want a stricter regex for allowed characters, e.g.:
+        # import re
+        # if not re.fullmatch(r"[a-zA-Z0-9_\-/\.]+", path):
+        #     return False
+        return True
+
 class ChunkedStreamDecoder:
     """
     Decodes HTTP chunked transfer encoding on-the-fly without buffering.
EOF

codecov bot commented Nov 26, 2025

❌ 3 Tests Failed:

Tests completed: 30043 | Failed: 3 | Passed: 30040 | Skipped: 241
View the top 3 failed test(s) by shortest run time
tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_large_payload
Stack Traces | 13.4s run time
.../objectstore/endpoints/test_organization.py:99: in test_large_payload
    object_key = session.put(data)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance

tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_full_cycle
Stack Traces | 13.7s run time
.../objectstore/endpoints/test_organization.py:67: in test_full_cycle
    object_key = session.put(b"test data")
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance

tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_uncompressed
Stack Traces | 14.2s run time
.../objectstore/endpoints/test_organization.py:88: in test_uncompressed
    object_key = session.put(b"test data", compression="none")
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance

To view more test analytics, go to the Test Analytics Dashboard
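
All three failures hit the same line in objectstore_client (client.py:347), where str() is applied to a bytes payload and the test run apparently escalates BytesWarning to an error. A sketch of the kind of change that would avoid it (this lives in the client package, not in this PR):

body = response.data or response.read()
res = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else str(body)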
