
@Dhriti07
Collaborator

No description provided.

@Dhriti07
Collaborator Author

/gcbrun

@gemini-code-assist

Summary of Changes

Hello @Dhriti07, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request backports a set of Bidi (Bi-directional) changes from the master branch to the 4.0.x branch. The primary goal is to introduce a new API for lexicographical listing of files and objects, alongside an upgrade of the underlying Hadoop version. Additionally, it refines the handling of vectored read operations by adding new metrics and improving resource management within the Bidi read channel, ensuring better performance and stability.
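As a rough illustration of the two vectored-read counters this PR adds, here is a minimal Java sketch. It assumes, without confirmation from the PR, that STREAM_READ_VECTORED_COUNT goes up by one per vectored read call and STREAM_READ_VECTORED_RANGE_COUNT goes up by the number of ranges in that call. The class VectoredReadStatsSketch, its onVectoredRead hook, and the {offset, length} range encoding are hypothetical and are not the GhfsThreadLocalStatistics API.

```java
import java.util.List;

// Hypothetical sketch of per-thread vectored-read counters. Only the two metric
// names come from the PR; every class, field, and method name here is illustrative.
public class VectoredReadStatsSketch {

  /** Per-thread counters; each thread gets its own instance. */
  static final class Counters {
    long streamReadVectoredCount; // assumed: +1 per vectored read call
    long streamReadVectoredRangeCount; // assumed: +N for a call with N ranges
  }

  private static final ThreadLocal<Counters> STATS = ThreadLocal.withInitial(Counters::new);

  /** Hypothetical hook invoked once per vectored read; each range is {offset, length}. */
  static void onVectoredRead(List<long[]> ranges) {
    Counters counters = STATS.get();
    counters.streamReadVectoredCount++;
    counters.streamReadVectoredRangeCount += ranges.size();
  }

  /** Returns the calling thread's counters. */
  static Counters current() {
    return STATS.get();
  }

  public static void main(String[] args) {
    // Two vectored reads on this thread: one with three ranges, one with two.
    onVectoredRead(List.of(new long[] {0, 10}, new long[] {100, 10}, new long[] {200, 10}));
    onVectoredRead(List.of(new long[] {0, 4}, new long[] {50, 4}));
    Counters counters = current();
    System.out.println(
        counters.streamReadVectoredCount + " " + counters.streamReadVectoredRangeCount); // 2 5
  }
}
```

Keeping the counters in a ThreadLocal means each thread updates its own instance without locking; aggregating totals across threads would need a separate mechanism that this sketch leaves out.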

Highlights

  • Hadoop Version Upgrade: The Hadoop version has been upgraded from 3.3.6 to 3.4.2 across relevant dependencies.
  • New listStatusStartingFrom API: A new experimental API, listStatusStartingFrom, has been introduced in GoogleHadoopFileSystem and GoogleCloudStorageFileSystem. This API allows listing file statuses and object information lexicographically from a specified offset, filtering out directory objects.
  • Vectored Read Metrics: New metrics (STREAM_READ_VECTORED_COUNT and STREAM_READ_VECTORED_RANGE_COUNT) have been added to GhfsThreadLocalStatistics to track vectored read operations, enhancing observability.
  • Improved GoogleCloudStorageBidiReadChannel Closing: The close() method in GoogleCloudStorageBidiReadChannel has been enhanced to robustly handle the closing of BlobReadSessions, including those that are still pending, and to address a potential memory leak by copying ByteString to a new ByteBuffer.
  • Dependency Updates and Exclusions: The dependency-reduced-pom.xml has been updated with version changes (e.g., 3.1.4-SNAPSHOT to 3.0.0-SNAPSHOT) and various dependency exclusions (e.g., jettison, netty-handler, bcprov-jdk18on) and updates (e.g., commons-collections to commons-collections4).
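The close() change for pending BlobReadSessions can be pictured with plain CompletableFutures. In this hypothetical sketch, Session stands in for a BlobReadSession and every name is illustrative rather than the connector's API: sessions that have already opened are closed immediately, and sessions that are still opening get a completion callback so they are closed as soon as they arrive instead of being left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: closing read sessions that may still be opening.
// "Session" stands in for a BlobReadSession; all names are illustrative.
public class PendingSessionCloseSketch {

  /** Minimal stand-in for a closeable read session. */
  static final class Session implements AutoCloseable {
    volatile boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  private final List<CompletableFuture<Session>> sessions = new ArrayList<>();

  /** Registers a session that will be opened asynchronously. */
  CompletableFuture<Session> open() {
    CompletableFuture<Session> future = new CompletableFuture<>();
    sessions.add(future);
    return future;
  }

  /** Closes opened sessions now and pending ones as soon as they complete. */
  void close() {
    for (CompletableFuture<Session> future : sessions) {
      // whenComplete runs the action right away if the future is already done,
      // otherwise when it completes; a failed open yields a null session.
      future.whenComplete(
          (session, error) -> {
            if (session != null) {
              session.close();
            }
          });
    }
    sessions.clear();
  }

  public static void main(String[] args) {
    PendingSessionCloseSketch channel = new PendingSessionCloseSketch();
    CompletableFuture<Session> ready = channel.open();
    CompletableFuture<Session> pending = channel.open();
    Session first = new Session();
    ready.complete(first);

    channel.close(); // first is closed now; pending has no session yet
    Session second = new Session();
    pending.complete(second); // the callback registered in close() fires here
    System.out.println(first.closed + " " + second.closed); // true true
  }
}
```

Registering a callback instead of blocking on each future keeps close() non-blocking; a real implementation may also want to bound how long a pending open can linger, which this sketch does not attempt.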


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request backports Bidirectional gRPC support and a new listStatusStartingFrom API from the master branch. The changes include dependency updates, implementation of the new API across different layers, and the addition of corresponding tests. The implementation looks solid, and the tests are comprehensive. I've included a few minor suggestions to improve documentation and release notes for clarity.

Comment on lines 5 to 12
1. Upgrade hadoop versionto `3.4.2`
2. Add bidi-support in connector

### 4.0.x
1. Add listStatusStartingFrom API.

1. Add AUTO_RANDOM as new fadvise mode.

1. Add getFileStatusWithHint() API


medium

The release notes have a minor typo and the list formatting could be improved for readability. I'd suggest using sequential numbering for the list items and fixing the typo versionto.

Suggested change
- 1. Upgrade hadoop versionto `3.4.2`
- 2. Add bidi-support in connector
- ### 4.0.x
- 1. Add listStatusStartingFrom API.
- 1. Add AUTO_RANDOM as new fadvise mode.
- 1. Add getFileStatusWithHint() API
+ 1. Upgrade hadoop version to `3.4.2`
+ 2. Add bidi-support in connector
+ 3. Add listStatusStartingFrom API.
+ 4. Add AUTO_RANDOM as new fadvise mode.
+ 5. Add getFileStatusWithHint() API

* Gets FileStatus of all files which are lexicographically greater than and equal the provided
* path. It filters out any directory objects present in underneath storage.
*
* <p>This is an experimental API can change without notice.


medium

There's a small typo in the Javadoc. It should be ... API and can change ....

Suggested change
- * <p>This is an experimental API can change without notice.
+ * <p>This is an experimental API and can change without notice.

Comment on lines +383 to +393
* <p>Note: As GCS doesn't implement a file system, directory is also treated as an object (if
* it's been created). This APi filters out all those directory object and maintain the order of
* items. This APi strictly expects delimiter in listOptions to be not set.
*
* <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
*
* <ul>
* <li/>With {@code startOffset == "o1"} , we get: {@code o1}
* <li/>With {@code startOffset == "d1/"} , we get: {@code d1/o1} {@code d1/o2}
* <li/>With {@code startOffset == "d1/"o1} , we get: {@code d1/o1} {@code d1/o2}
* <p>This is an experimental API and can change without notice.


medium

The Javadoc for this new method has a few typos and formatting issues that could be cleaned up for better readability. Specifically:

  • APi should be API.
  • maintain should be maintains.
  • <li/> should be <li>, and the <ul> needs a closing </ul>.
  • There's a typo in an example: "d1/"o1 should be "d1/o1".
  • The indentation of the final <p> tag is off.
Suggested change
- * <p>Note: As GCS doesn't implement a file system, directory is also treated as an object (if
- * it's been created). This APi filters out all those directory object and maintain the order of
- * items. This APi strictly expects delimiter in listOptions to be not set.
- *
- * <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
- *
- * <ul>
- * <li/>With {@code startOffset == "o1"} , we get: {@code o1}
- * <li/>With {@code startOffset == "d1/"} , we get: {@code d1/o1} {@code d1/o2}
- * <li/>With {@code startOffset == "d1/"o1} , we get: {@code d1/o1} {@code d1/o2}
- * <p>This is an experimental API and can change without notice.
+ * <p>Note: As GCS doesn't implement a file system, a directory is also treated as an object (if
+ * it's been created). This API filters out all those directory objects and maintains the order of
+ * items. This API strictly expects the delimiter in listOptions to not be set.
+ *
+ * <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
+ *
+ * <ul>
+ * <li>With {@code startOffset == "o1"}, we get: {@code o1}</li>
+ * <li>With {@code startOffset == "d1/"}, we get: {@code d1/o1}, {@code d1/o2}</li>
+ * <li>With {@code startOffset == "d1/o1"}, we get: {@code d1/o1}, {@code d1/o2}</li>
+ * </ul>
+ * <p>This is an experimental API and can change without notice.
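To make the listing semantics in this Javadoc concrete, here is a hypothetical in-memory Java sketch; it is not the connector's implementation. It walks object names in lexicographic order from an inclusive start offset and skips directory placeholder objects (names ending in /). The Javadoc examples do not return o1 for the d1/ offsets, so the sketch additionally assumes, only to reproduce those examples, that results stay within the offset's parent prefix; the PR does not state that rule. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of "list starting from" over an in-memory, sorted set of object names.
public class StartingFromListingSketch {

  static List<String> listStartingFrom(NavigableSet<String> objectNames, String startOffset) {
    // Assumption (not from the PR): bound results to the offset's parent "directory" prefix,
    // e.g. "d1/o1" -> "d1/", "o1" -> "" (whole bucket).
    String prefix = startOffset.substring(0, startOffset.lastIndexOf('/') + 1);
    List<String> result = new ArrayList<>();
    // tailSet(startOffset, true) yields names >= startOffset in lexicographic order.
    for (String name : objectNames.tailSet(startOffset, true)) {
      if (!name.startsWith(prefix)) {
        break; // names sharing a prefix are contiguous, so nothing later can match
      }
      if (!name.endsWith("/")) { // filter out directory placeholder objects
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    NavigableSet<String> bucket = new TreeSet<>(List.of("o1", "d1/", "d1/o1", "d1/o2"));
    System.out.println(listStartingFrom(bucket, "o1")); // [o1]
    System.out.println(listStartingFrom(bucket, "d1/")); // [d1/o1, d1/o2]
    System.out.println(listStartingFrom(bucket, "d1/o1")); // [d1/o1, d1/o2]
  }
}
```

String.compareTo orders by UTF-16 code units, which matches byte order for the ASCII names used here; non-ASCII names would need care before relying on this ordering.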

Comment on lines +438 to +448
* <p>Note: As GCS doesn't implement a file system, directory is also treated as an object (if
* it's been created). This APi filters out all those directory object and maintain the order of
* items. This APi strictly expects delimiter in listOptions to be not set.
*
* <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
*
* <ul>
* <li/>With {@code startOffset == "o1"} , we get: {@code o1}
* <li/>With {@code startOffset == "d1/"} , we get: {@code d1/o1} {@code d1/o2}
* <li/>With {@code startOffset == "d1/"o1} , we get: {@code d1/o1} {@code d1/o2}
* <p>This is an experimental API and can change without notice.


medium

The Javadoc for this new method has a few typos and formatting issues that could be cleaned up for better readability. Specifically:

  • APi should be API.
  • maintain should be maintains.
  • <li/> should be <li>, and the <ul> needs a closing </ul>.
  • There's a typo in an example: "d1/"o1 should be "d1/o1".
  • The indentation of the final <p> tag is off.
Suggested change
- * <p>Note: As GCS doesn't implement a file system, directory is also treated as an object (if
- * it's been created). This APi filters out all those directory object and maintain the order of
- * items. This APi strictly expects delimiter in listOptions to be not set.
- *
- * <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
- *
- * <ul>
- * <li/>With {@code startOffset == "o1"} , we get: {@code o1}
- * <li/>With {@code startOffset == "d1/"} , we get: {@code d1/o1} {@code d1/o2}
- * <li/>With {@code startOffset == "d1/"o1} , we get: {@code d1/o1} {@code d1/o2}
- * <p>This is an experimental API and can change without notice.
+ * <p>Note: As GCS doesn't implement a file system, a directory is also treated as an object (if
+ * it's been created). This API filters out all those directory objects and maintains the order of
+ * items. This API strictly expects the delimiter in listOptions to not be set.
+ *
+ * <p>Consider a bucket with objects: {@code o1}, {@code d1/}, {@code d1/o1}, {@code d1/o2}
+ *
+ * <ul>
+ * <li>With {@code startOffset == "o1"}, we get: {@code o1}</li>
+ * <li>With {@code startOffset == "d1/"}, we get: {@code d1/o1}, {@code d1/o2}</li>
+ * <li>With {@code startOffset == "d1/o1"}, we get: {@code d1/o1}, {@code d1/o2}</li>
+ * </ul>
+ * <p>This is an experimental API and can change without notice.

@codecov

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 85.50725% with 20 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (branch-4.0.x@90c7298).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...adoop/gcsio/GoogleCloudStorageBidiReadChannel.java | 50.00% | 5 Missing and 3 partials ⚠️ |
| ...gle/cloud/hadoop/gcsio/GoogleCloudStorageImpl.java | 90.56% | 1 Missing and 4 partials ⚠️ |
| ...doop/gcsio/testing/InMemoryGoogleCloudStorage.java | 82.14% | 2 Missing and 3 partials ⚠️ |
| ...le/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.java | 92.30% | 0 Missing and 1 partial ⚠️ |
| ...hadoop/gcsio/GoogleCloudStorageFileSystemImpl.java | 94.11% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@               Coverage Diff               @@
##             branch-4.0.x    #1553   +/-   ##
===============================================
  Coverage                ?   81.85%           
  Complexity              ?     2416           
===============================================
  Files                   ?      126           
  Lines                   ?    10799           
  Branches                ?     1300           
===============================================
  Hits                    ?     8839           
  Misses                  ?     1417           
  Partials                ?      543           
| Flag | Coverage Δ |
| --- | --- |
| integrationtest | 66.85% <73.91%> (?) |
| unittest | 72.29% <55.07%> (?) |

Flags with carried forward coverage won't be shown.


Collaborator

@dheerajsngh dheerajsngh left a comment


Please add links to the original PRs in the description.

@animesh-g
Collaborator

Please hold onto this. We are planning to rebase.
