
@mluvin-stripe mluvin-stripe commented Jan 26, 2026

This implements #17557 to extend the disk utilization checks to cover offline segment upload APIs. This PR only covers the single segment upload endpoints, /v2/segments and /segments, and not /segments/batchUpload.

I'm directly calling the DiskUtilizationChecker class for now due to the concern with the addition of future resource utilization checkers mentioned in #17557. But I'm open to discussion on changing the approach to something different (e.g. adding a new config as I suggested in the issue) so long as it addresses my concern.

Testing

I tested this PR by deploying it to a Pinot cluster and manually uploading segments to the cluster via the /v2/segments endpoint. Once I breached the disk threshold, I saw the following error:

{"code":403,"error":"Disk utilization limit exceeded for table: query_metadata_2_OFFLINE, rejecting upload for segment: mluvintest_query_metadata_2__524300__0__20251210T2126Z"}

Comment on lines +131 to +133
@Api(tags = Constants.SEGMENT_TAG, authorizations = {
@Authorization(value = SWAGGER_AUTHORIZATION_KEY),
@Authorization(value = DATABASE)
Author

String.format("Disk utilization limit exceeded for table: %s, rejecting upload for segment: %s",
tableNameWithType,
segmentName),
Response.Status.FORBIDDEN);
Author

Went with FORBIDDEN since the storage quota checker uses it.
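
For illustration, the rejection path discussed here can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in this PR: `UploadRejectedException` and `UploadGuard` are hypothetical names, and the disk check is reduced to a plain predicate. Only the error message format and the 403 status come from the thread above.

```java
import java.util.function.Predicate;

/** Hypothetical exception carrying an HTTP status code; stands in for the controller's own exception type. */
class UploadRejectedException extends RuntimeException {
  private final int _statusCode;

  UploadRejectedException(String message, int statusCode) {
    super(message);
    _statusCode = statusCode;
  }

  int getStatusCode() {
    return _statusCode;
  }
}

/** Hypothetical guard that runs before an offline segment upload is accepted. */
class UploadGuard {
  static final int FORBIDDEN = 403;

  // Assumed contract: returns true when the table's disk utilization is within the limit.
  private final Predicate<String> _withinDiskLimit;

  UploadGuard(Predicate<String> withinDiskLimit) {
    _withinDiskLimit = withinDiskLimit;
  }

  /** Throws a 403-style rejection when the disk check fails; otherwise returns normally. */
  void checkUpload(String tableNameWithType, String segmentName) {
    if (!_withinDiskLimit.test(tableNameWithType)) {
      throw new UploadRejectedException(
          String.format("Disk utilization limit exceeded for table: %s, rejecting upload for segment: %s",
              tableNameWithType, segmentName),
          FORBIDDEN);
    }
  }
}
```

A caller would run `checkUpload` before persisting the segment and translate the exception into the HTTP response.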

@mluvin-stripe mluvin-stripe marked this pull request as ready for review January 27, 2026 00:48
@mluvin-stripe
Author

cc @Jackie-Jiang

@Jackie-Jiang Jackie-Jiang requested a review from Copilot January 29, 2026 02:25
Contributor

Copilot AI left a comment

Pull request overview

This PR extends disk utilization checks to offline segment upload operations to prevent uploads when disk space limits are exceeded. The implementation adds validation for the /v2/segments and /segments endpoints to reject uploads that would breach configured disk thresholds.

Changes:

  • Added OFFLINE_SEGMENT_UPLOAD purpose to the CheckPurpose enum for tracking disk utilization checks during segment uploads
  • Integrated DiskUtilizationChecker into the segment upload flow with appropriate error handling and metrics
  • Updated annotation formatting in PinotSegmentUploadDownloadRestletResource for consistency

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pinot-controller/src/main/java/org/apache/pinot/controller/validation/UtilizationChecker.java | Adds new enum value for offline segment upload check purpose with documentation |
| pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentUploadDownloadRestletResource.java | Implements disk utilization validation in upload flow, injects checker dependency, and updates annotation formatting |

AccessControlFactory _accessControlFactory;

@Inject
DiskUtilizationChecker _diskUtilizationChecker;
Contributor

Can we plug in ResourceUtilizationManager instead? We want to run all utilization checkers

Author

I described why I went with this approach in more detail in #17557.

We'd like a way to enable or disable each resource utilization checker individually, instead of being automatically opted in when more are added in the future.
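
To make the request concrete, here is a rough sketch of what per-checker opt-in could look like. Everything in it is an assumption for illustration: `NamedChecker`, `OptInCheckerSelector`, and the name-to-flag map are hypothetical and do not correspond to existing Pinot classes or config keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical checker abstraction; not Pinot's actual interface. */
interface NamedChecker {
  String name();

  boolean isWithinLimits(String tableNameWithType);
}

/** Hypothetical selector: a checker is used only when it is explicitly enabled by name. */
class OptInCheckerSelector {
  /** Keeps only explicitly enabled checkers; any checker missing from the map stays off. */
  static List<NamedChecker> select(List<NamedChecker> available, Map<String, Boolean> enabledByName) {
    List<NamedChecker> selected = new ArrayList<>();
    for (NamedChecker checker : available) {
      if (enabledByName.getOrDefault(checker.name(), false)) {
        selected.add(checker);
      }
    }
    return selected;
  }
}
```

With this shape, a checker added in a later release would be ignored until an operator enables it by name.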

Contributor

You can decide which checkers to include when initializing the controller, but all the checkers should be applied. This is the only way to make resource checkers pluggable. The OSS code might not have access to custom checkers.
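
The approach suggested here could be sketched roughly as follows. This is a simplified illustration under assumptions, not the real ResourceUtilizationManager: `ResourceCheck` and `AllChecksManager` are hypothetical names, and the actual Pinot interfaces may take additional arguments.

```java
import java.util.List;

/** Hypothetical checker interface; Pinot's real checker API may differ. */
interface ResourceCheck {
  boolean isWithinLimits(String tableNameWithType);
}

/** Hypothetical manager: the checker set is fixed at construction and every checker in it is applied. */
class AllChecksManager {
  private final List<ResourceCheck> _checks;

  AllChecksManager(List<ResourceCheck> checks) {
    // Defensive copy so the set chosen at initialization cannot change afterwards.
    _checks = List.copyOf(checks);
  }

  /** Returns true only when every registered checker reports the table as within limits. */
  boolean isWithinLimits(String tableNameWithType) {
    for (ResourceCheck check : _checks) {
      if (!check.isWithinLimits(tableNameWithType)) {
        return false;
      }
    }
    return true;
  }
}
```

Under this design the selection happens once, where the controller wires up its checkers, and callers such as the upload endpoint never need to know which checkers exist.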
