-
Notifications
You must be signed in to change notification settings - Fork 4.5k
GCS client library migration in Java SDK - part 1 #36876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… of gcsio in GcsUtil
| import com.google.api.services.storage.model.StorageObject; | ||
| import com.google.auth.Credentials; | ||
| import com.google.auto.value.AutoValue; | ||
| import com.google.cloud.hadoop.gcsio.CreateObjectOptions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
References to com.google.cloud.hadoop.gcsio.* and com.google.cloud.hadoop.util.* have been removed from GcsUtil.java. They are kept in GcsUtilLegacy.java though during the migration process.
| } | ||
|
|
||
| @VisibleForTesting | ||
| void setCloudStorageImpl(GoogleCloudStorage g) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the two setCloudStorageImpl functions are no longer surfaced for testing in GcsUtil.java, we still keep them in GcsUtilLegacy.java.
| } | ||
| } | ||
|
|
||
| GoogleCloudStorage createGoogleCloudStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is also removed from GcsUtil.java, but will remain in GcsUtilLegacy.java.
| * @return a SeekableByteChannel that can read the object data | ||
| */ | ||
| public SeekableByteChannel open(GcsPath path) throws IOException { | ||
| return open(path, this.googleCloudStorageOptions.getReadChannelOptions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing with the previous implementation
Lines 593 to 600 in b20ccbf
| public SeekableByteChannel open(GcsPath path) throws IOException { | |
| String bucket = path.getBucket(); | |
| SeekableByteChannel channel = | |
| googleCloudStorage.open( | |
| new StorageResourceId(path.getBucket(), path.getObject()), | |
| this.googleCloudStorageOptions.getReadChannelOptions()); | |
| return wrapInCounting(channel, bucket); | |
| } |
we will invoke the open() function below to include the IO request counters similar to the create() functions below.
|
cc'ed @clairemcginty Context: gcsio is sunsetting (https://s.apache.org/beam-gcsutil-modernization), and we are on the way to migrate to the GCS client library for Beam Java SDK. |
| import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName; | ||
| import org.apache.beam.sdk.extensions.gcp.auth.TestCredential; | ||
| import org.apache.beam.sdk.extensions.gcp.options.GcsOptions; | ||
| import org.apache.beam.sdk.extensions.gcp.util.GcsUtil.BatchInterface; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BatchInterface and RewriteOp are only used in Tests, so it is fine to get rid of them in GcsUtil and keep them in GcsUtilLegacy.
|
Assigning reviewers: R: @Abacn for label java. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Reminder, please take a look at this pr: @Abacn |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @chamikaramj for label java. Available commands:
|
| @Description( | ||
| "The GoogleCloudStorageReadOptions instance that should be used to read from Google Cloud Storage.") | ||
| @Default.InstanceFactory(GcsUtil.GcsReadOptionsFactory.class) | ||
| @Default.InstanceFactory(GcsUtilLegacy.GcsReadOptionsFactory.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "GcsUtilV1" (and latest version of the class still as GcsUtil)? this follows other branched classes naming convention, and in case they both exist in Beam code base for extended time and a "GcsUtilV3" appear in the future (hopefuly won't be the case).
| @SuppressWarnings({ | ||
| "nullness" // TODO(https://github.com/apache/beam/issues/20497) | ||
| }) | ||
| public class GcsUtilLegacy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we make it package private?
| storageBuilder.getHttpRequestInitializer(), | ||
| gcsOptions.getExecutorService(), | ||
| hasExperiment(options, "use_grpc_for_gcs"), | ||
| org.apache.beam.sdk.options.ExperimentalOptions.hasExperiment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: import hasExperiment
|
|
||
| LinkedList<RewriteOp> makeRewriteOps( | ||
| @VisibleForTesting | ||
| @SuppressWarnings("JdkObsolete") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments for "SuppressWarnings". Here it is // for LinkedList
Also this appears only in higher version of Java. That's likely why it is not captured before
|
Reminder, please take a look at this pr: @chamikaramj |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @ahmedabu98 for label java. Available commands:
|
|
waiting on author |
This PR lays the groundwork for the GCS client library migration in the Java SDK. It refactors the existing GCS client implementation to prepare for a new implementation while maintaining backward compatibility.
Key changes include:
GcsUtil.java.GcsUtilLegacy.javaandGcsOptions.javawill be removed in a future PR.openAPI in the legacy module has been rerouted to include previously implemented but unused IO request count metrics, as discussed in [BEAM-11980] Java GCS - Implement IO Request Count metrics #15394 (comment).