-
Notifications
You must be signed in to change notification settings - Fork 1.4k
extend disk utilization check to offline segment upload #17579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
extend disk utilization check to offline segment upload #17579
Conversation
| @Api(tags = Constants.SEGMENT_TAG, authorizations = { | ||
| @Authorization(value = SWAGGER_AUTHORIZATION_KEY), | ||
| @Authorization(value = DATABASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was due to the linter. i followed https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup#intellij
| String.format("Disk utilization limit exceeded for table: %s, rejecting upload for segment: %s", | ||
| tableNameWithType, | ||
| segmentName), | ||
| Response.Status.FORBIDDEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
went with FORBIDDEN since the storage quota checker uses it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR extends disk utilization checks to offline segment upload operations to prevent uploads when disk space limits are exceeded. The implementation adds validation for the /v2/segments and /segments endpoints to reject uploads that would breach configured disk thresholds.
Changes:
- Added
OFFLINE_SEGMENT_UPLOADpurpose to theCheckPurposeenum for tracking disk utilization checks during segment uploads - Integrated
DiskUtilizationCheckerinto the segment upload flow with appropriate error handling and metrics - Updated annotation formatting in
PinotSegmentUploadDownloadRestletResourcefor consistency
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/UtilizationChecker.java | Adds new enum value for offline segment upload check purpose with documentation |
| pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentUploadDownloadRestletResource.java | Implements disk utilization validation in upload flow, injects checker dependency, and updates annotation formatting |
...ava/org/apache/pinot/controller/api/resources/PinotSegmentUploadDownloadRestletResource.java
Outdated
Show resolved
Hide resolved
| AccessControlFactory _accessControlFactory; | ||
|
|
||
| @Inject | ||
| DiskUtilizationChecker _diskUtilizationChecker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we plug in ResourceUtilizationManager instead? We want to run all utilization checkers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I described in more detail why I went with this approach in #17557
we'd like a way to enable/disable each resource utilization checker individually, instead of being automatically opted in when more are added in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can decide which checker to include when initializing the controller, but all the checkers should be applied. This is the only way to make resource checkers pluggable. In OSS code, it might not have access to the custom checkers
Co-authored-by: Copilot <[email protected]>
This implements #17557 to extend the disk utilization checks to cover offline segment upload APIs. This PR only covers the single segment upload endpoints,
/v2/segmentsand/segments, and not/segments/batchUpload.I'm directly calling the
DiskUtilizationCheckerclass for now due to the concern with the addition of future resource utilization checkers mentioned in #17557. But I'm open to discussion on changing the approach to something different (e.g. adding a new config as I suggested in the issue) so long as it addresses my concern.Testing
I tested this PR by deploying it to a Pinot cluster and manually uploading segments to the cluster via the
/v2/segmentsendpoint. Once i breached the disk threshold, I saw the error