feat: Enhance GCS connector docs for distcp with HNS #1374

Open
cjac wants to merge 1 commit into GoogleCloudDataproc:master from LLC-Technologies-Collier:nhs-20250604

@cjac cjac commented Jun 4, 2025

This PR improves the Cloud Storage connector documentation to better support users performing distcp operations with Hierarchical Namespace (HNS) enabled buckets in self-managed Hadoop environments.

This change directly addresses customer issues observed in Salesforce case [56459963] and Buganizer report [389061732], where users experienced intermittent distcp failures, often manifesting as DEADLINE_EXCEEDED errors or a generic SSH operator error: exit status = 25.

Key changes include:

  • gcs/CONFIGURATION.md:
    • Clarified guidance on fs.gs.http.read-timeout and fs.gs.hierarchical.namespace.folders.enable to address DEADLINE_EXCEEDED errors and ensure proper HNS interaction.
    • Added troubleshooting tips for generic exit codes and recommendations for using shaded JARs to resolve dependency conflicts.
  • gcs/INSTALL.md:
    • Expanded the "Troubleshooting the installation" section with more detailed advice on diagnosing dependency conflicts and enabling verbose logging, specifically highlighting its utility for DEADLINE_EXCEEDED errors.
  • gcs/README.md:
    • Updated the "Configuring the connector" section to prominently guide users facing distcp and HNS issues, including DEADLINE_EXCEEDED errors, to the more detailed CONFIGURATION.md.
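
As a rough sketch, the two properties named above could be set in `core-site.xml` along the following lines. The values are illustrative assumptions, not recommendations taken from this PR. In particular, the unit and format of the timeout depend on the connector version, so check `gcs/CONFIGURATION.md` for the release in use.

```xml
<!-- Illustrative core-site.xml fragment. Values are assumptions, not tested recommendations. -->
<configuration>
  <property>
    <!-- Assumed value: a longer HTTP read timeout to reduce DEADLINE_EXCEEDED failures.
         The unit/format (plain milliseconds vs. a duration string) depends on the connector version. -->
    <name>fs.gs.http.read-timeout</name>
    <value>60000</value>
  </property>
  <property>
    <!-- Lets the connector use folder operations on HNS-enabled buckets. -->
    <name>fs.gs.hierarchical.namespace.folders.enable</name>
    <value>true</value>
  </property>
</configuration>
```

The same properties can also be supplied for a single job through Hadoop's generic `-D name=value` option on the `hadoop distcp` command line, which avoids changing cluster-wide configuration while testing.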

These updates aim to provide clearer instructions and troubleshooting steps, reducing the need for support engagement for these common problems in non-Dataproc Hadoop deployments.

Self link: go/ghgcd/hadoop-connectors/pull/1374
Related CL: cl/767194879

Addresses support issue:
go/sf/55915396 (case)
go/sf/56459963 (consult)

Addresses GitHub issue #1375

Addresses bug: b/389061732

This PR improves the Cloud Storage connector documentation to better
support users performing `distcp` operations with Hierarchical Namespace (HNS)
enabled buckets in self-managed Hadoop environments.

This change directly addresses customer issues observed in Salesforce case
[500Kf00000Y8MmwIAF] and Buganizer report [389061732], where users
experienced intermittent `distcp` failures, often manifesting as `DEADLINE_EXCEEDED`
errors or a generic `SSH operator error: exit status = 25`.

Key changes include:
- **`gcs/CONFIGURATION.md`**:
  - Clarified guidance on `fs.gs.http.read-timeout` and `fs.gs.hierarchical.namespace.folders.enable`
    to address `DEADLINE_EXCEEDED` errors and ensure proper HNS interaction.
  - Added troubleshooting tips for generic exit codes and recommendations for
    using shaded JARs to resolve dependency conflicts.
- **`gcs/INSTALL.md`**:
  - Expanded the "Troubleshooting the installation" section with more detailed
    advice on diagnosing dependency conflicts and enabling verbose logging,
    specifically highlighting its utility for `DEADLINE_EXCEEDED` errors.
- **`gcs/README.md`**:
  - Updated the "Configuring the connector" section to prominently guide users
    facing `distcp` and HNS issues, including `DEADLINE_EXCEEDED` errors, to the
    more detailed `CONFIGURATION.md`.

These updates aim to provide clearer instructions and troubleshooting steps,
reducing the need for support engagement for these common problems in
non-Dataproc Hadoop deployments.

Self link: go/ghgcd/hadoop-connectors/pull/1374

Related CL: cl/767194879

Addresses support issue:
  go/sf/55915396 (case)
  go/sf/56459963 (consult)

Addresses bug: b/389061732
@cjac cjac requested a review from medb June 4, 2025 20:10

cjac commented Jun 4, 2025

Solves issue #1375

@cjac cjac self-assigned this Jun 4, 2025