feat: Enhance GCS connector docs for distcp with HNS #1374
Open
cjac wants to merge 1 commit into GoogleCloudDataproc:master from
This PR improves the Cloud Storage connector documentation to better
support users running `distcp` against Hierarchical Namespace (HNS)-enabled
buckets in self-managed Hadoop environments.
This change directly addresses customer issues observed in Salesforce case
[500Kf00000Y8MmwIAF] and Buganizer report [389061732], where users
experienced intermittent `distcp` failures, often manifesting as `DEADLINE_EXCEEDED`
errors or as a generic `SSH operator error: exit status = 25`.
Key changes include:
- **`gcs/CONFIGURATION.md`**:
  - Clarified guidance on `fs.gs.http.read-timeout` and `fs.gs.hierarchical.namespace.folders.enable`
    to address `DEADLINE_EXCEEDED` errors and ensure proper HNS interaction.
  - Added troubleshooting tips for generic exit codes and recommendations for
    using shaded JARs to resolve dependency conflicts.
- **`gcs/INSTALL.md`**:
  - Expanded the "Troubleshooting the installation" section with more detailed
    advice on diagnosing dependency conflicts and enabling verbose logging,
    specifically highlighting its utility for `DEADLINE_EXCEEDED` errors.
- **`gcs/README.md`**:
  - Updated the "Configuring the connector" section to prominently guide users
    facing `distcp` and HNS issues, including `DEADLINE_EXCEEDED` errors, to the
    more detailed `CONFIGURATION.md`.
These updates aim to provide clearer instructions and troubleshooting steps,
reducing the need for support engagement for these common problems in
non-Dataproc Hadoop deployments.
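
As a rough illustration of how the two properties named under `gcs/CONFIGURATION.md` might be applied to a single job, the sketch below assembles a `hadoop distcp` command line that passes them as Hadoop `-D` generic options. The helper name `build_distcp_command`, the paths, the bucket name, and the timeout value are all hypothetical; the accepted value format for `fs.gs.http.read-timeout` depends on the connector version, so check `CONFIGURATION.md` for the release in use before copying anything.

```python
import shlex


def build_distcp_command(src, dst, read_timeout, hns_folders_enabled=True):
    """Return an argv list for `hadoop distcp` with GCS connector overrides.

    `read_timeout` is passed through unchanged; its expected format is an
    assumption here and depends on the connector version.
    """
    overrides = {
        "fs.gs.http.read-timeout": read_timeout,
        "fs.gs.hierarchical.namespace.folders.enable": str(hns_folders_enabled).lower(),
    }
    argv = ["hadoop", "distcp"]
    for key, value in overrides.items():
        # Generic options must come before the source and destination paths.
        argv += ["-D", f"{key}={value}"]
    argv += [src, dst]
    return argv


if __name__ == "__main__":
    # Hypothetical source, destination, and timeout, for illustration only.
    cmd = build_distcp_command(
        "hdfs:///data/input",
        "gs://example-hns-bucket/data/input",
        read_timeout="60s",
    )
    print(shlex.join(cmd))
```

The same key/value pairs could instead be set cluster-wide in `core-site.xml`; per-job `-D` overrides are used here only because they are easy to experiment with while diagnosing `DEADLINE_EXCEEDED` failures.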
Self link: go/ghgcd/hadoop-connectors/pull/1374
Related CL: cl/767194879
Addresses support issue:
go/sf/55915396 (case)
go/sf/56459963 (consult)
Addresses bug: b/389061732
Author comment: Solves issue #1375
Addresses GitHub issue #1375