
fix(pure): Make Pure FlashArray HTTP client timeout configurable #5551

Merged
rgolangh merged 2 commits into kubev2v:main from MikeAnders08:main on Mar 24, 2026

@MikeAnders08 (Contributor)

Make Pure FlashArray HTTP client timeout configurable

Problem:

During migrations of VMs with many disks, simultaneous CopyVolume requests to Pure FlashArray were timing out, leaving PVCs stuck in Pending. In one observed case, 15 disks were migrated but only 7 reached Bound status — the remaining 8 populator pods failed with:

```
failed to copy VMDK using VVol storage API: copy operation failed: Pure FlashArray CopyVolume failed:
failed to send copy volume request: Post "https://<array>/api/2.46/volumes?overwrite=true":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

The root cause is that the HTTP client timeout was hardcoded to 30 seconds with no way to extend it, making it impossible to accommodate slower or heavily loaded arrays.

Changes:

  • NewRestClient now accepts an httpTimeoutSeconds int parameter instead of a hardcoded value. A value of <= 0 falls back to the 30s default.
  • NewFlashArrayClonner threads the parameter through to NewRestClient.
  • A --storage-api-timeout-seconds CLI flag (default: 30) is added to the vsphere-xcopy-volume-populator binary.

How to configure:

Pass --storage-api-timeout-seconds=<value> to the populator binary. Full operator-side wiring (CRD field → VSphereXcopyPluginConfig → VSphereXcopyVolumePopulatorSpec → populator-controller pod args) is a follow-up.
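For reference, this is how an integer `--storage-api-timeout-seconds` flag with a default of 30 behaves under Go's standard `flag` package, which accepts both `-name=value` and `--name=value`. The sketch reflects the flag as described above, before review changed it to a string defaulting to an environment variable. `parseTimeoutFlag` is a hypothetical helper, and the real binary registers many more flags.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseTimeoutFlag (hypothetical) parses args into a dedicated FlagSet and
// returns the value of --storage-api-timeout-seconds (default 30).
func parseTimeoutFlag(args []string) (int, error) {
	fs := flag.NewFlagSet("populator", flag.ContinueOnError)
	var storageAPITimeoutSeconds int
	fs.IntVar(&storageAPITimeoutSeconds, "storage-api-timeout-seconds", 30,
		"HTTP client timeout in seconds for storage API requests")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return storageAPITimeoutSeconds, nil
}

func main() {
	v, err := parseTimeoutFlag(os.Args[1:])
	if err != nil {
		// FlagSet has already printed the error and usage to stderr.
		os.Exit(2)
	}
	fmt.Println("storage API timeout (s):", v)
}
```

Running the binary with `--storage-api-timeout-seconds=120` yields 120; omitting the flag yields 30.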

Default behaviour is unchanged — the timeout remains 30 seconds unless explicitly overridden.

```go
flag.StringVar(&vspherePassword, "vsphere-password", os.Getenv("GOVMOMI_PASSWORD"), "vSphere's API password")
flag.StringVar(&esxiCloneMethod, "esxi-clone-method", os.Getenv("ESXI_CLONE_METHOD"), "ESXi clone method: 'vib' (default) or 'ssh'")
flag.IntVar(&sshTimeoutSeconds, "ssh-timeout-seconds", 30, "SSH timeout in seconds for ESXi operations (default: 30)")
flag.IntVar(&storageAPITimeoutSeconds, "storage-api-timeout-seconds", 30, "HTTP client timeout in seconds for storage API requests (default: 30)")
```

Collaborator:

We need to make the default os.Getenv("STORAGE_HTTP_TIMEOUT_SECONDS") instead of 30; that will allow the configuration to be passed as part of the storage secret in the storageMap. Otherwise this is hard to use.
Also, please add that entry to cmd/vsphere-xcopy-volume-populator/README.md under the STORAGE_ secret keys.

@MikeAnders08 (Contributor Author), Mar 19, 2026:

Both done in the latest commit 2a7932c:

  • storageAPITimeoutSeconds is now a StringVar with os.Getenv("STORAGE_HTTP_TIMEOUT_SECONDS") as default, same pattern as the other STORAGE_* vars. strconv.Atoi handles the conversion at the call site with a warning log for bad values, and the <= 0 guard in NewRestClient keeps the 30s fallback.

  • Added STORAGE_HTTP_TIMEOUT_SECONDS to the secret keys table in README.
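A sketch of the env-default pattern described above, assuming the flag keeps the `storage-api-timeout-seconds` name (the review below asks for the warning to use a new flag name, so the merged name may differ). The standard `log` package replaces `klog` to keep the example self-contained, and `resolveTimeoutSeconds` is a hypothetical helper. It returns 0 explicitly on any parse error, because `strconv.Atoi` can return a non-zero clamped value on out-of-range input; 0 then hits the `<= 0` guard and becomes 30s.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"strconv"
)

// resolveTimeoutSeconds (hypothetical) converts the raw flag/env value to an
// int. An empty value silently yields 0; an unparsable value logs a warning
// and yields 0. The client constructor later maps 0 to the 30s default.
func resolveTimeoutSeconds(raw string) int {
	v, err := strconv.Atoi(raw)
	if err != nil {
		if raw != "" {
			log.Printf("warning: invalid value %q for STORAGE_HTTP_TIMEOUT_SECONDS, using default (30s): %v", raw, err)
		}
		return 0
	}
	return v
}

func main() {
	var storageAPITimeoutSeconds string
	// The default comes from the environment, so a key in the storage secret
	// (exposed to the pod as an env var) is picked up without extra args.
	flag.StringVar(&storageAPITimeoutSeconds, "storage-api-timeout-seconds",
		os.Getenv("STORAGE_HTTP_TIMEOUT_SECONDS"),
		"HTTP client timeout in seconds for storage API requests")
	flag.Parse()
	fmt.Println("resolved timeout (s):", resolveTimeoutSeconds(storageAPITimeoutSeconds))
}
```

With `STORAGE_HTTP_TIMEOUT_SECONDS=120` in the environment and no explicit flag, the resolved value is 120; an explicit flag on the command line takes precedence over the environment default.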

@rgolangh added the backport-release-2.11 and storage-offload labels on Mar 19, 2026 (backport-release-2.11 triggers a backport to 2.11 once the PR is merged)

@rgolangh (Collaborator):

The DCO check is failing; please add your Git signature.

@MikeAnders08 force-pushed the main branch 2 times, most recently from d36ce72 to 6b5eabf, on March 19, 2026 14:53

@MikeAnders08 (Contributor Author):

> The DCO check is failing; please add your Git signature.

Done

@MikeAnders08 changed the title from "feat: Make Pure FlashArray HTTP client timeout configurable" to "fix(pure): Make Pure FlashArray HTTP client timeout configurable" on Mar 20, 2026

@codecov-commenter commented Mar 20, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 10.10%. Comparing base (f1fe5d0) to head (7dfad56).
⚠️ Report is 2045 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #5551      +/-   ##
==========================================
- Coverage   15.45%   10.10%   -5.35%
==========================================
  Files         112      500     +388
  Lines       23377    57429   +34052
==========================================
+ Hits         3613     5804    +2191
- Misses      19479    51144   +31665
- Partials      285      481     +196
```

| Flag | Coverage Δ |
|---|---|
| unittests | 10.10% <ø> (-5.35%) ⬇️ |

Flags with carried forward coverage won't be shown.


```go
case forklift.StorageVendorProductPureFlashArray:
	apiTimeout, err := strconv.Atoi(storageAPITimeoutSeconds)
	if err != nil && storageAPITimeoutSeconds != "" {
		klog.Warningf("invalid value %q for storage-api-timeout-seconds, using default (30s): %v", storageAPITimeoutSeconds, err)
```

Collaborator:

In the warning, use the new flag name.

@rgolangh (Collaborator):

We are close; there's one more small comment. Also, please pull and rebase.

…GE_HTTP_TIMEOUT_SECONDS

Resolves: None
Signed-off-by: Michael Jons <Michael.Jons@tre.se>
@MikeAnders08 force-pushed the main branch 2 times, most recently from 93f44aa to 9b43830, on March 23, 2026 08:34

@MikeAnders08 (Contributor Author):

> We are close; there's one more small comment. Also, please pull and rebase.

Done

@sonarqubecloud
@rgolangh merged commit 3cfdd48 into kubev2v:main on Mar 24, 2026
13 of 14 checks passed

@rgolangh (Collaborator):

/backport release-2.11

@github-actions
🔄 Starting backport of PR #5551 to release-2.11
🚀 Live mode

@github-actions
✅ PR #5551 backported to release-2.11.

rgolangh pushed a commit that referenced this pull request Mar 25, 2026
…nfigurable (#5615)

**Backport:** #5551

**Make Pure FlashArray HTTP client timeout configurable**

**Problem:**

During migrations of VMs with many disks, simultaneous `CopyVolume`
requests to Pure FlashArray were timing out, leaving PVCs stuck in
`Pending`. In one observed case, 15 disks were migrated but only 7
reached `Bound` status — the remaining 8 populator pods failed with:

```
failed to copy VMDK using VVol storage API: copy operation failed: Pure FlashArray CopyVolume failed:
failed to send copy volume request: Post "https://<array>/api/2.46/volumes?overwrite=true":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

The root cause is that the HTTP client timeout was hardcoded to 30
seconds with no way to extend it, making it impossible to accommodate
slower or heavily-loaded arrays.

**Changes:**

- `NewRestClient` now accepts an `httpTimeoutSeconds int` parameter
instead of a hardcoded value. A value of `<= 0` falls back to the 30s
default.
- `NewFlashArrayClonner` threads the parameter through to
`NewRestClient`.
- A `--storage-api-timeout-seconds` CLI flag (default: `30`) is added to
the `vsphere-xcopy-volume-populator` binary.

**How to configure:**

Pass `--storage-api-timeout-seconds=<value>` to the populator binary.
Full operator-side wiring (CRD field → `VSphereXcopyPluginConfig` →
`VSphereXcopyVolumePopulatorSpec` → populator-controller pod args) is a
follow-up.

**Default behaviour is unchanged** — the timeout remains 30 seconds
unless explicitly overridden.

---------

Signed-off-by: Michael Jons <Michael.Jons@tre.se>
Co-authored-by: Michael Jons <Michael.Jons@tre.se>