
@BenWhitehead

Define a ReadableByteChannel that can validate limits relative to the delegate channel provided to it.

If an over-read happens, a warning message will be logged. If an under-read happens, an IOException will be thrown.

Add unit tests to validate behavior using copying and direct read operations.
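The behavior described above can be illustrated with a minimal, self-contained sketch. The constructor shape, the `beginOffset`/`endOffset`/`position` fields, the `drain` helper, and the use of `java.util.logging` in place of the project's logger are all assumptions for illustration, not the code in this PR. The clamped `expectedMaxRead` follows the `Math.min`/`Math.max` variant discussed later in review.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.logging.Logger;

// Sketch only: names, constructor shape, and java.util.logging are assumptions.
final class RangeValidatingReadableByteChannel implements ReadableByteChannel {
  private static final Logger LOGGER =
      Logger.getLogger(RangeValidatingReadableByteChannel.class.getName());

  private final ReadableByteChannel delegate;
  private final long endOffset; // exclusive end of the expected byte range
  private long position; // offset of the next byte the delegate will return

  RangeValidatingReadableByteChannel(
      ReadableByteChannel delegate, long beginOffset, long endOffset) {
    this.delegate = delegate;
    this.position = beginOffset;
    this.endOffset = endOffset;
  }

  @Override
  public int read(ByteBuffer dst) throws IOException {
    long expectedChannelRemaining = endOffset - position;
    // Most bytes this read may legitimately return: bounded by both the buffer and the range.
    int expectedMaxRead =
        (int) Math.min(dst.remaining(), Math.max(0L, expectedChannelRemaining));
    int read = delegate.read(dst);
    if (read > -1) {
      position += read;
      if (read > expectedMaxRead) {
        // Over-read: more bytes arrived than the range allows; log a warning and keep going.
        LOGGER.warning(String.format(
            "over-read: read=%d expectedMaxRead=%d position=%d endOffset=%d",
            read, expectedMaxRead, position, endOffset));
      }
    } else if (position < endOffset) {
      // Under-read: the delegate reached EOF before the end of the range.
      throw new IOException(String.format(
          "under-read: EOF at position=%d, expected endOffset=%d", position, endOffset));
    }
    return read;
  }

  @Override
  public boolean isOpen() {
    return delegate.isOpen();
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }

  // Reads the channel to EOF through a small reusable buffer; returns the total byte count.
  static int drain(ReadableByteChannel ch) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(16);
    int total = 0;
    int n;
    while ((n = ch.read(buf)) != -1) {
      total += n;
      if (!buf.hasRemaining()) {
        buf.clear();
      }
    }
    return total;
  }

  static RangeValidatingReadableByteChannel over(byte[] data, long begin, long end) {
    return new RangeValidatingReadableByteChannel(
        Channels.newChannel(new ByteArrayInputStream(data)), begin, end);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[10];
    System.out.println(drain(over(data, 0, 10))); // happy path: range matches the data
    System.out.println(drain(over(data, 0, 4))); // over-read: warning logged, reading continues
    try {
      drain(over(data, 0, 20)); // under-read: range claims 20 bytes, only 10 exist
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Running `main` prints `10` for the happy path, `10` for the over-read case (after a logged `WARNING`), and `caught: under-read: EOF at position=10, expected endOffset=20`. Logging instead of failing on over-read mirrors the PR description; failing hard would be an equally valid design choice.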
@gemini-code-assist

Summary of Changes

Hello @BenWhitehead, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of byte channel operations by introducing a RangeValidatingReadableByteChannel. This new component ensures that data reads adhere strictly to defined byte ranges, providing explicit error handling for under-reads and logging warnings for over-reads. This change is integrated into the GCS client's read channel, improving data integrity and predictability during object access.

Highlights

  • New RangeValidatingReadableByteChannel: Introduced a new ReadableByteChannel implementation that validates read operations against specified byte ranges.
  • Read Validation Logic: The new channel logs a warning if an over-read occurs (reading beyond the specified limit) and throws an IOException if an under-read is detected (EOF reached before the specified limit).
  • Integration with GCS Read Channel: The GoogleCloudStorageClientReadChannel has been updated to utilize this new validating channel, ensuring range integrity for GCS object reads.
  • Comprehensive Unit Tests: Added unit tests for the RangeValidatingReadableByteChannel to verify its behavior across happy path, over-read, and under-read scenarios using both copying and direct buffer read operations.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a RangeValidatingReadableByteChannel to encapsulate read validation logic, which is a great improvement for modularity. The GoogleCloudStorageClientReadChannel is refactored to use this new component, simplifying its implementation. The new class is well-tested, covering over-read and under-read scenarios. My feedback includes a couple of suggestions to improve code clarity and simplify logic.

Comment on lines +545 to +552
    private BlobSourceOption[] generateReadOptions(StorageResourceId blobId) {
      List<BlobSourceOption> blobReadOptions = new ArrayList<>();
      // enable transparent gzip-decompression
      blobReadOptions.add(BlobSourceOption.shouldReturnRawInputStream(false));

      if (blobId.getGenerationId() > StorageResourceId.UNKNOWN_GENERATION_ID) {
        blobReadOptions.add(BlobSourceOption.generationMatch(blobId.getGenerationId()));
      }

medium

The parameter blobId is of type StorageResourceId, which could be confusing since there is also a com.google.cloud.storage.BlobId class used in this file. To improve clarity and avoid potential confusion, consider renaming it to resourceId to match its type and the variable name used at the call site in the constructor.

Suggested change

    -private BlobSourceOption[] generateReadOptions(StorageResourceId blobId) {
    -  List<BlobSourceOption> blobReadOptions = new ArrayList<>();
    -  // enable transparent gzip-decompression
    -  blobReadOptions.add(BlobSourceOption.shouldReturnRawInputStream(false));
    -  if (blobId.getGenerationId() > StorageResourceId.UNKNOWN_GENERATION_ID) {
    -    blobReadOptions.add(BlobSourceOption.generationMatch(blobId.getGenerationId()));
    -  }
    +private BlobSourceOption[] generateReadOptions(StorageResourceId resourceId) {
    +  List<BlobSourceOption> blobReadOptions = new ArrayList<>();
    +  // enable transparent gzip-decompression
    +  blobReadOptions.add(BlobSourceOption.shouldReturnRawInputStream(false));
    +  if (resourceId.getGenerationId() > StorageResourceId.UNKNOWN_GENERATION_ID) {
    +    blobReadOptions.add(BlobSourceOption.generationMatch(resourceId.getGenerationId()));
    +  }

Comment on lines +43 to +47
    if (expectedChannelRemaining < dst.remaining()) {
      expectedMaxRead = Math.toIntExact(expectedChannelRemaining);
    } else {
      expectedMaxRead = dst.remaining();
    }

medium

The calculation for expectedMaxRead can be simplified and made more robust by ensuring it's never negative. Using Math.max(0, ...) will handle cases where position > endOffset cleanly, and Math.min can combine the logic into a single line.

    expectedMaxRead = (int) Math.min(dst.remaining(), Math.max(0, expectedChannelRemaining));

Author

dst.remaining() always returns a value >= 0, so that case is already taken care of, Gemini. Also, blindly casting a long to an int can overflow, which is why Math.toIntExact (which throws an ArithmeticException instead of silently truncating) is used instead.
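To make the two formulations in this exchange concrete, here is a standalone comparison with hypothetical method names (`original`, `suggested`) and hypothetical inputs. It shows how each one behaves when the range is within the buffer, much larger than the buffer, and already exhausted (`position > endOffset`).

```java
// Hypothetical standalone comparison; method names and inputs are illustrative only.
final class ExpectedMaxReadDemo {
  // Formulation from the PR: Math.toIntExact throws ArithmeticException instead of truncating.
  static int original(long expectedChannelRemaining, int dstRemaining) {
    if (expectedChannelRemaining < dstRemaining) {
      return Math.toIntExact(expectedChannelRemaining);
    } else {
      return dstRemaining;
    }
  }

  // Formulation suggested in review. Math.min(...) is at most dstRemaining (a non-negative int
  // for a ByteBuffer) and Math.max(0L, ...) keeps it >= 0, so this particular cast fits in an int.
  static int suggested(long expectedChannelRemaining, int dstRemaining) {
    return (int) Math.min(dstRemaining, Math.max(0L, expectedChannelRemaining));
  }

  public static void main(String[] args) {
    // Within the buffer, far larger than the buffer, and already past endOffset.
    long[] remainingCases = {10L, 1L << 40, -5L};
    for (long remaining : remainingCases) {
      System.out.println(remaining + " -> original=" + original(remaining, 4096)
          + ", suggested=" + suggested(remaining, 4096));
    }
  }
}
```

This prints `10 -> original=10, suggested=10`, `1099511627776 -> original=4096, suggested=4096`, and `-5 -> original=-5, suggested=0`: the two forms agree whenever the remaining count is non-negative and differ only once the position has moved past `endOffset`.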

    position += read;
    if (read > expectedMaxRead) {
      // over-read
      logger.atWarning().log(
Collaborator

I think we should close the RangeValidatingReadableByteChannel in GoogleCloudStorageClientReadChannel.java when an overshoot happens, to avoid query failures.

    int read = delegate.read(dst);
    if (read > -1) {
      position += read;
      if (read > expectedMaxRead) {
Collaborator

In the logs attached to the bug, the buffer still had some remaining space but the channel's end offset was crossed. In that situation expectedMaxRead will always be greater than or equal to read, so this check is unlikely to log what is happening in the bug. I think we should log an overshoot whenever position > endOffset to find out what is happening in the actual bug.
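A check keyed on the cumulative position, as proposed here, could look like the hypothetical helper below. The class name `OvershootCheck`, the method `logIfPastEnd`, its parameters, the message format, and the use of `java.util.logging` instead of the project's logger are all assumptions for illustration.

```java
import java.util.logging.Logger;

// Hypothetical helper; the name, parameters, and message format are illustrative assumptions.
final class OvershootCheck {
  private static final Logger LOGGER = Logger.getLogger(OvershootCheck.class.getName());

  /**
   * Call after a successful read. Returns true, and logs the state useful for debugging,
   * when the running position has moved past the exclusive end of the requested range.
   */
  static boolean logIfPastEnd(
      long positionAfterRead, int read, long endOffset, int dstRemainingBefore) {
    if (positionAfterRead > endOffset) {
      LOGGER.warning(String.format(
          "overshoot: position=%d endOffset=%d excess=%d read=%d dstRemainingBefore=%d",
          positionAfterRead, endOffset, positionAfterRead - endOffset, read, dstRemainingBefore));
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Ends exactly at endOffset: nothing is logged, prints false.
    System.out.println(logIfPastEnd(100, 5, 100, 64));
    // Crosses endOffset by 5 bytes while the buffer still had room: logs a warning, prints true.
    System.out.println(logIfPastEnd(105, 10, 100, 64));
  }
}
```

Because the condition depends only on `position` and `endOffset`, it fires regardless of how much space the destination buffer had, and recording `dstRemainingBefore` alongside it captures the buffer state mentioned in the bug logs.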
