feat(objectstore): Add proxying logic to endpoint #104045

Draft

lcian wants to merge 3 commits into master from lcian/feat/objectstore-endpoint

+240 −22

Member

lcian commented Nov 26, 2025

No description provided.

lcian added 2 commits

November 26, 2025 15:12

wip

87ba4c6


          improve

github-actions bot added the Scope: Backend label


          🛠️ Sync API Urls to TypeScript

8c11c9c

github-actions bot added the Scope: Frontend label

Contributor

github-actions bot commented Nov 26, 2025

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

Member Author

lcian commented Nov 26, 2025

@sentry review

vercel bot deployed to Preview

November 26, 2025 14:23

View deployment

sentry bot reviewed

View reviewed changes

src/sentry/objectstore/endpoints/organization.py

    
        self._current_chunk_remaining -= len(chunk)
        if self._current_chunk_remaining == 0:
            self._read(2)  # Read trailing \r\n

sentry bot Nov 26, 2025

Bug: ChunkedStreamDecoder ignores the return value of self._read(2) for \r\n consumption, corrupting chunk parsing on incomplete reads.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The ChunkedStreamDecoder calls self._read(2) to consume the trailing \r\n after each chunk. However, the WSGI wsgi.input stream's read(size) method does not guarantee returning exactly size bytes. If _read(2) returns fewer than 2 bytes, the decoder ignores this, and its state is not advanced correctly. This causes subsequent chunk size parsing to become corrupted, leading to a ValueError on line 142 and stream decoding failures.

💡 Suggested Fix

The _read(2) call should verify that exactly 2 bytes were read. If fewer than 2 bytes are returned, the decoder should handle the incomplete read, possibly by retrying or raising an error, to ensure its state is correctly advanced.
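One way to act on this suggestion is a helper that loops until the requested number of bytes has arrived and fails loudly on early end-of-stream. The sketch below is illustrative only: `read_exact` and `consume_crlf` are hypothetical names, not part of the PR, and it assumes the underlying stream exposes a blocking `read(size)` that returns `b""` at end-of-stream.

```python
import io
from typing import BinaryIO


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes, looping over short reads.

    Raises ValueError if the stream ends before `size` bytes arrive.
    """
    buf = bytearray()
    while len(buf) < size:
        chunk = stream.read(size - len(buf))
        if not chunk:
            raise ValueError(
                f"unexpected end of stream: wanted {size} bytes, got {len(buf)}"
            )
        buf += chunk
    return bytes(buf)


def consume_crlf(stream: BinaryIO) -> None:
    """Consume the CRLF that terminates a chunk and verify it really is CRLF."""
    trailer = read_exact(stream, 2)
    if trailer != b"\r\n":
        raise ValueError(f"malformed chunk terminator: {trailer!r}")


# Usage: a well-formed terminator is consumed silently.
consume_crlf(io.BytesIO(b"\r\n"))
```

Checking the two bytes, rather than only counting them, also catches a decoder that has drifted out of sync with the chunk boundaries.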

Location: src/sentry/objectstore/endpoints/organization.py#L129

Reference ID: 3799736

src/sentry/objectstore/endpoints/organization.py

    
    ) -> Response | StreamingHttpResponse:
        target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
        target_url = urljoin(target_base_url, path)

sentry bot Nov 26, 2025

The urljoin function may not behave as expected when joining paths. According to Python's documentation, urljoin('http://localhost:8888/', 'health') will correctly return http://localhost:8888/health, but urljoin('http://localhost:8888/', '/health') would replace the path entirely. If a path starts with '/', it will be treated as an absolute path. Consider using a more robust path joining approach that handles both leading and non-leading slash cases, similar to the pattern used in sentry/utils/http.py which does urljoin(url_prefix.rstrip("/") + "/", url.lstrip("/")).
Severity: MEDIUM

💡 Suggested Fix

Suggested change

      
            target_url = urljoin(target_base_url + "/", path.lstrip("/"))
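The behavior described above is easy to reproduce with the standard library. In the sketch below the `/api` prefix on the base URL and the `join_under_base` helper name are assumptions made for illustration; the helper body mirrors the suggested change.

```python
from urllib.parse import urljoin


def join_under_base(base_url: str, path: str) -> str:
    """Join `path` beneath `base_url`, whether or not either side has a slash."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


# Without a trailing slash on the base, the last path segment is replaced.
print(urljoin("http://localhost:8888/api", "health"))
# A leading slash on the path replaces the base path entirely.
print(urljoin("http://localhost:8888/api/", "/health"))
# Normalizing both sides keeps the base path intact.
print(join_under_base("http://localhost:8888/api", "/health"))
```

Note that this normalization alone does not stop a `path` that is itself an absolute URL: `urljoin` returns such a value unchanged, so a separate check is still needed for that case.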



src/sentry/objectstore/endpoints/organization.py

Comment on lines +125 to +135

    
            bytes_read += len(chunk)
            self._current_chunk_remaining -= len(chunk)
            if self._current_chunk_remaining == 0:
                self._read(2)  # Read trailing \r\n
        else:
            # Read next chunk size line
            size_line = b""
            while not size_line.endswith(b"\r\n"):
                byte = self._read(1)
                if not byte:

sentry bot Nov 26, 2025

In the ChunkedStreamDecoder.read() method, when reading chunk size lines byte-by-byte, if the connection is dropped mid-chunk-size (e.g., client sends 123 without \r\n), the code will loop indefinitely trying to read the next byte. The method should have a timeout or maximum iteration limit to prevent hanging. Consider adding a maximum size limit for chunk size lines (they should typically be quite small).
Severity: MEDIUM
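A bounded reader for the chunk size line could look like the following sketch. The function name `read_chunk_size` and the 64-byte cap are assumptions chosen for illustration, not values from the PR. It treats an empty read as end-of-stream and raises instead of looping.

```python
import io
from typing import BinaryIO

MAX_SIZE_LINE = 64  # hypothetical cap; real size lines are a few hex digits


def read_chunk_size(stream: BinaryIO, max_len: int = MAX_SIZE_LINE) -> int:
    """Read one chunk size line byte by byte and return the size as an int."""
    line = bytearray()
    while not line.endswith(b"\r\n"):
        if len(line) >= max_len:
            raise ValueError("chunk size line exceeds maximum length")
        byte = stream.read(1)
        if not byte:
            raise ValueError("stream ended inside chunk size line")
        line += byte
    # Drop the CRLF and any chunk extension after ';', then parse as hex.
    size_field = bytes(line[:-2]).split(b";", 1)[0].strip()
    return int(size_field, 16)


# Usage: "1a" is hexadecimal for 26.
print(read_chunk_size(io.BytesIO(b"1a\r\npayload...")))
```

The length cap bounds the work done on a malformed or hostile size line, and the explicit end-of-stream check covers a client that disconnects before sending `\r\n`.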



src/sentry/objectstore/endpoints/organization.py

Comment on lines +62 to +70

    
        self,
        method: Literal["GET", "PUT", "POST", "DELETE"],
        path: str,
        request: Request,
    ) -> Response | StreamingHttpResponse:
        target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
        target_url = urljoin(target_base_url, path)
        headers = dict(request.headers)

sentry bot Nov 26, 2025

The _proxy method doesn't validate or sanitize the path parameter. While the URL pattern (?P<path>.*) captures any string, there's no check to prevent path traversal attacks (e.g., ../../../etc/passwd). The objectstore backend should validate this, but the gateway should also implement defensive checks. Consider validating that the path doesn't contain suspicious patterns like .. or //.
Severity: MEDIUM
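As a rough illustration of such a defensive check, the sketch below rejects paths that carry their own scheme or host, and paths containing `..` segments (including percent-encoded ones), before building the target URL. `build_target_url` and the `objectstore.internal` host are hypothetical names used only for this example; a real gateway would likely need rules tailored to the key format the objectstore accepts.

```python
from urllib.parse import unquote, urlsplit


def build_target_url(base_url: str, path: str) -> str:
    """Return `path` joined under `base_url`, or raise ValueError if unsafe."""
    parts = urlsplit(path)
    # An absolute URL ("http://...") or a network-path reference ("//host/...")
    # would let the caller choose the request target.
    if parts.scheme or parts.netloc:
        raise ValueError("path must be relative to the objectstore base URL")
    # Decode percent-escapes before looking for traversal segments.
    decoded = unquote(parts.path)
    if "\\" in decoded or any(seg == ".." for seg in decoded.split("/")):
        raise ValueError("path must not contain traversal segments")
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Usage with a hypothetical internal base URL.
print(build_target_url("http://objectstore.internal:8888/", "v1/objects/abc"))
```

String concatenation is used for the final join so that the result always starts with the configured base. One side effect of the scheme check is that a key with a colon before its first slash is parsed as having a scheme and is rejected.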



github-advanced-security bot found potential problems

View reviewed changes

src/sentry/objectstore/endpoints/organization.py

Comment on lines +84 to +92

    
        response = requests.request(
            method,
            url=target_url,
            headers=headers,
            data=body_stream,
            params=dict(request.GET) if request.GET else None,
            stream=True,
            allow_redirects=False,
        )

Check failure

Code scanning / CodeQL

Full server-side request forgery (Critical)

The full URL of this request depends on a user-provided value.

Copilot Autofix

AI 1 day ago

To fix this Full SSRF vulnerability, we must prevent users from being able to set the full target of the outgoing HTTP request. The best pattern is to validate "path" so that only safe, known locations can be accessed. Generally, this means only allowing "path" to designate a relative path within a known base URL, making sure it cannot escape with e.g. leading "//", absolute URLs, etc.

A robust fix:

Restrict "path" to ensure it cannot be an absolute URL, start with "//", or otherwise escape the intended target domain. This can be achieved by checking that "path" is a relative path, does not start with "http:", "https:", "//", or similar.
Additionally, consider whitelisting or regular-expression filtering for allowed "path" formats, blocking traversal characters ("../", "..").
Apply this check at the entrypoint to the _proxy function (or earlier). If "path" fails validation, return a 400 error.

Implementation:

Add a validation function (e.g., _is_safe_path) inside OrganizationObjectstoreEndpoint.
Use it right before constructing target_url. If validation fails, respond with a 400 error message.
This fix is limited to the edited region and does not change existing endpoint logic.

Suggested changeset 1

src/sentry/objectstore/endpoints/organization.py

@@ -64,6 +64,8 @@
                     path: str,
                     request: Request,
                 ) -> Response | StreamingHttpResponse:
+                    if not self._is_safe_path(path):
+                        return Response("Unsafe path argument", status=400)
                     target_base_url = options.get("objectstore.config")["base_url"].rstrip("/")
                     target_url = urljoin(target_base_url, path)
@@ -95,6 +97,24 @@
                     return stream_response(response)
+                def _is_safe_path(self, path: str) -> bool:
+                    """
+                    Return True if the provided path is safe to join to the base URL.
+                    Rejects absolute URLs and schemes, and known SSRF exploit forms.
+                    """
+                    # Block absolute URLs and scheme
+                    unsafe_prefixes = ("http://", "https://", "ftp://", "//")
+                    if any(path.startswith(prefix) for prefix in unsafe_prefixes):
+                        return False
+                    # Block attempts to traverse upwards (optional - for stricter control)
+                    if ".." in path or path.startswith("/"):
+                        return False
+                    # You may also want a stricter regex for allowed characters, e.g.:
+                    # import re
+                    # if not re.fullmatch(r"[a-zA-Z0-9_\-/\.]+", path):
+                    #     return False
+                    return True
             class ChunkedStreamDecoder:
                 """
                 Decodes HTTP chunked transfer encoding on-the-fly without buffering.


codecov bot commented Nov 26, 2025

❌ 3 Tests Failed:

Tests completed	Failed	Passed	Skipped
30043	3	30040	241

View the top 3 failed test(s) by shortest run time

tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_large_payload

Stack Traces | 13.4s run time

.../objectstore/endpoints/test_organization.py:99: in test_large_payload
    object_key = session.put(data)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance

tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_full_cycle

Stack Traces | 13.7s run time

.../objectstore/endpoints/test_organization.py:67: in test_full_cycle
    object_key = session.put(b"test data")
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance

tests.sentry.objectstore.endpoints.test_organization.OrganizationObjectstoreEndpointTest::test_uncompressed

Stack Traces | 14.2s run time

.../objectstore/endpoints/test_organization.py:88: in test_uncompressed
    object_key = session.put(b"test data", compression="none")
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:271: in put
    raise_for_status(response)
.venv/lib/python3.13....../site-packages/objectstore_client/client.py:347: in raise_for_status
    res = str(response.data or response.read())
E   BytesWarning: str() on a bytes instance
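All three traces end at the same line: `raise_for_status` calls `str()` on a bytes body, which Python reports as a `BytesWarning` when run with the `-b` flag (and which fails the test if warnings are treated as errors). Since that line is only reached while building an error message, the `put` request itself likely returned an error status that the warning is masking. The sketch below shows explicit decoding as an alternative; `body_to_text` is a hypothetical helper, not the client library's code.

```python
from __future__ import annotations


def body_to_text(body: bytes | str, encoding: str = "utf-8") -> str:
    """Turn a response body into text without calling str() on bytes."""
    if isinstance(body, bytes):
        # errors="replace" keeps error reporting from failing on binary bodies.
        return body.decode(encoding, errors="replace")
    return body


# Usage: the bytes body is decoded rather than rendered as "b'...'".
print(body_to_text(b"upstream returned an error"))
```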


Labels

Scope: Backend Scope: Frontend