-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
feat(objectstore): Add proxying logic to endpoint #104045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
🚨 Warning: This pull request contains Frontend and Backend changes! It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the |
|
@sentry review |
| self._current_chunk_remaining -= len(chunk) | ||
|
|
||
| if self._current_chunk_remaining == 0: | ||
| self._read(2) # Read trailing \r\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: ChunkedStreamDecoder ignores the return value of self._read(2) for \r\n consumption, corrupting chunk parsing on incomplete reads.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
The ChunkedStreamDecoder calls self._read(2) to consume the trailing \r\n after each chunk. However, the WSGI wsgi.input stream's read(size) method does not guarantee returning exactly size bytes. If _read(2) returns fewer than 2 bytes, the decoder ignores this, and its state is not advanced correctly. This causes subsequent chunk size parsing to become corrupted, leading to a ValueError on line 142 and stream decoding failures.
💡 Suggested Fix
The _read(2) call should verify that exactly 2 bytes were read. If fewer than 2 bytes are returned, the decoder should handle the incomplete read, possibly by retrying or raising an error, to ensure its state is correctly advanced.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/objectstore/endpoints/organization.py#L129
Potential issue: The `ChunkedStreamDecoder` calls `self._read(2)` to consume the
trailing `\r\n` after each chunk. However, the WSGI `wsgi.input` stream's `read(size)`
method does not guarantee returning exactly `size` bytes. If `_read(2)` returns fewer
than 2 bytes, the decoder ignores this, and its state is not advanced correctly. This
causes subsequent chunk size parsing to become corrupted, leading to a `ValueError` on
line 142 and stream decoding failures.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736
| ) -> Response | StreamingHttpResponse: | ||
| target_base_url = options.get("objectstore.config")["base_url"].rstrip("/") | ||
| target_url = urljoin(target_base_url, path) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The urljoin function may not behave as expected when joining paths. According to Python's documentation, urljoin('http://localhost:8888/', 'health') will correctly return http://localhost:8888/health, but urljoin('http://localhost:8888/', '/health') would replace the path entirely. If a path starts with '/', it will be treated as an absolute path. Consider using a more robust path joining approach that handles both leading and non-leading slash cases, similar to the pattern used in sentry/utils/http.py which does urljoin(url_prefix.rstrip("/") + "/", url.lstrip("/")).
Severity: MEDIUM
💡 Suggested Fix
| target_url = urljoin(target_base_url + "/", path.lstrip("/")) |
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736
| bytes_read += len(chunk) | ||
| self._current_chunk_remaining -= len(chunk) | ||
|
|
||
| if self._current_chunk_remaining == 0: | ||
| self._read(2) # Read trailing \r\n | ||
| else: | ||
| # Read next chunk size line | ||
| size_line = b"" | ||
| while not size_line.endswith(b"\r\n"): | ||
| byte = self._read(1) | ||
| if not byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ChunkedStreamDecoder.read() method, when reading chunk size lines byte-by-byte, if the connection is dropped mid-chunk-size (e.g., client sends 123 without \r\n), the code will loop indefinitely trying to read the next byte. The method should have a timeout or maximum iteration limit to prevent hanging. Consider adding a maximum size limit for chunk size lines (they should typically be quite small).
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/objectstore/endpoints/organization.py#L125-L135
Potential issue: In the `ChunkedStreamDecoder.read()` method, when reading chunk size
lines byte-by-byte, if the connection is dropped mid-chunk-size (e.g., client sends
`123` without `\r\n`), the code will loop indefinitely trying to read the next byte. The
method should have a timeout or maximum iteration limit to prevent hanging. Consider
adding a maximum size limit for chunk size lines (they should typically be quite small).
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736
| self, | ||
| method: Literal["GET", "PUT", "POST", "DELETE"], | ||
| path: str, | ||
| request: Request, | ||
| ) -> Response | StreamingHttpResponse: | ||
| target_base_url = options.get("objectstore.config")["base_url"].rstrip("/") | ||
| target_url = urljoin(target_base_url, path) | ||
|
|
||
| headers = dict(request.headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _proxy method doesn't validate or sanitize the path parameter. While the URL pattern (?P<path>.*) captures any string, there's no check to prevent path traversal attacks (e.g., ../../../etc/passwd). The objectstore backend should validate this, but the gateway should also implement defensive checks. Consider validating that the path doesn't contain suspicious patterns like .. or //.
Severity: MEDIUM
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/objectstore/endpoints/organization.py#L62-L70
Potential issue: The `_proxy` method doesn't validate or sanitize the `path` parameter.
While the URL pattern `(?P<path>.*)` captures any string, there's no check to prevent
path traversal attacks (e.g., `../../../etc/passwd`). The objectstore backend should
validate this, but the gateway should also implement defensive checks. Consider
validating that the path doesn't contain suspicious patterns like `..` or `//`.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3799736
| response = requests.request( | ||
| method, | ||
| url=target_url, | ||
| headers=headers, | ||
| data=body_stream, | ||
| params=dict(request.GET) if request.GET else None, | ||
| stream=True, | ||
| allow_redirects=False, | ||
| ) |
Check failure
Code scanning / CodeQL
Full server-side request forgery Critical
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
To fix this Full SSRF vulnerability, we must prevent users from being able to set the full target of the outgoing HTTP request. The best pattern is to validate "path" so that only safe, known locations can be accessed. Generally, this means only allowing "path" to designate a relative path within a known base URL, making sure it cannot escape with e.g. leading "//", absolute URLs, etc.
A robust fix:
-
Restrict "path" to ensure it cannot be an absolute URL, start with "//", or otherwise escape the intended target domain. This can be achieved by checking that "path" is a relative path, does not start with "http:", "https:", "//", or similar.
-
Additionally, consider whitelisting or regular-expression filtering for allowed "path" formats, blocking traversal characters ("../", "..").
-
Apply this check at the entrypoint to the
_proxyfunction (or earlier). If "path" fails validation, return a 400 error.
Implementation:
- Add a validation function (e.g.,
_is_safe_path) inside OrganizationObjectstoreEndpoint. - Use it right before constructing
target_url. If validation fails, respond with a 400 error message. - This fix is limited to the edited region and does not change existing endpoint logic.
-
Copy modified lines R67-R68 -
Copy modified lines R100-R117
| @@ -64,6 +64,8 @@ | ||
| path: str, | ||
| request: Request, | ||
| ) -> Response | StreamingHttpResponse: | ||
| if not self._is_safe_path(path): | ||
| return Response("Unsafe path argument", status=400) | ||
| target_base_url = options.get("objectstore.config")["base_url"].rstrip("/") | ||
| target_url = urljoin(target_base_url, path) | ||
|
|
||
| @@ -95,6 +97,24 @@ | ||
| return stream_response(response) | ||
|
|
||
|
|
||
| def _is_safe_path(self, path: str) -> bool: | ||
| """ | ||
| Return True if the provided path is safe to join to the base URL. | ||
| Rejects absolute URLs and schemes, and known SSRF exploit forms. | ||
| """ | ||
| # Block absolute URLs and scheme | ||
| unsafe_prefixes = ("http://", "https://", "ftp://", "//") | ||
| if any(path.startswith(prefix) for prefix in unsafe_prefixes): | ||
| return False | ||
| # Block attempts to traverse upwards (optional - for stricter control) | ||
| if ".." in path or path.startswith("/"): | ||
| return False | ||
| # You may also want a stricter regex for allowed characters, e.g.: | ||
| # import re | ||
| # if not re.fullmatch(r"[a-zA-Z0-9_\-/\.]+", path): | ||
| # return False | ||
| return True | ||
|
|
||
| class ChunkedStreamDecoder: | ||
| """ | ||
| Decodes HTTP chunked transfer encoding on-the-fly without buffering. |
❌ 3 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
No description provided.