adreed-msft

Description

  • Feature / Bug Fix:

This partially fixes a bug that is triggered when the number of goroutines we initialize far outscales the available network bandwidth. It can be reproduced by setting AZCOPY_CONCURRENCY_VALUE very high and --cap-mbps very low.

The janky fix in this PR is to turn the auto-scaler on by default. Because it seeks the optimal throughput and tries not to over-scale, scaling past the network's capabilities causes it to scale back down, since throughput did not increase as expected. This does not fix the problem, but it does work around it for the vast majority of users. More PRs will follow.
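The probing behavior this workaround relies on can be sketched roughly as follows. This is a minimal illustration, not AzCopy's actual tuner: the `probeTuner` type, its fields, the doubling step, and the simulated network in `settle` are all assumptions made for the example. Only the idea of scaling back when throughput fails to rise as expected comes from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// probeTuner is a hypothetical, simplified stand-in for an auto-scaler.
type probeTuner struct {
	concurrency    int     // concurrency currently being tried
	lastGood       int     // last concurrency whose scale-up paid off
	lastThroughput float64 // throughput measured at lastGood
	minGain        float64 // ratio a scale-up must reach to be kept
	settled        bool
}

func newProbeTuner(initial int, minGain float64) *probeTuner {
	return &probeTuner{concurrency: initial, lastGood: initial, minGain: minGain}
}

// Observe takes the throughput measured at the current concurrency and
// returns the concurrency to use next.
func (p *probeTuner) Observe(throughput float64) int {
	if p.settled {
		return p.concurrency
	}
	if p.lastThroughput > 0 && throughput < p.lastThroughput*p.minGain {
		// The last scale-up did not raise throughput enough: back off and stop.
		p.concurrency = p.lastGood
		p.settled = true
		return p.concurrency
	}
	// The scale-up paid off (or this is the first sample): keep probing upward.
	p.lastGood = p.concurrency
	p.lastThroughput = throughput
	p.concurrency *= 2
	return p.concurrency
}

// settle drives the tuner against a toy network model (an assumption):
// each connection adds perConnMbps until the total hits capMbps.
func settle(p *probeTuner, capMbps, perConnMbps float64) int {
	for i := 0; i < 64 && !p.settled; i++ {
		throughput := math.Min(float64(p.concurrency)*perConnMbps, capMbps)
		p.Observe(throughput)
	}
	return p.concurrency
}

func main() {
	tuner := newProbeTuner(4, 1.19)
	fmt.Println("settled concurrency:", settle(tuner, 5, 1))
}
```

With a low cap, a tuner like this settles near the point where extra connections stop adding throughput. A fixed, very high concurrency value has no such feedback, which is the situation the bug describes.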

  • Related Links: StgExp Sync chat

Type of Change

  • Bug fix
  • New feature
  • Documentation update required
  • Code quality improvement
  • Other (describe):

How Has This Been Tested?

Manual testing: ran a benchmark with a 100 GB upload and --cap-mbps set to 5. Request lifetime stabilizes at around a minute at most once the concurrency tuner settles. That still isn't great, and the underlying issue is severe, but the job now completes, which is far better than it not working.

A user who manually specifies both AZCOPY_CONCURRENCY_VALUE and --cap-mbps could still trigger the underlying behavior.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a mechanism to track individual HTTP request lifetimes and feeds that metric into the concurrency auto-tuner, while also making auto-tuning the default behavior.

  • Added RequestLifetimeTracker with sliding-window bucket aggregation, exposed as an HTTP pipeline policy.
  • Enhanced autoConcurrencyTuner to back off when unusually long request lifetimes are detected.
  • Switched CLI defaults to always enable auto-tuning and removed special-case file transfer overrides.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| ste/xferLifetimeTracker.go | New tracker for request durations and policy API. |
| ste/mgr-JobPartMgr.go | Injected lifetime tracker into per-retry pipeline. |
| ste/concurrencyTuner.go | Added lifetime-based backoff and updated tuner API. |
| jobsAdmin/JobsAdmin.go | Adjusted tuner creation call; removed old setter. |
| cmd/root.go | Defaulted CLI to always use auto-tuning. |
| cmd/copy.go | Removed legacy file-type concurrency override. |
| common/linkedList.go | Introduced generic LinkedList utility. |
Comments suppressed due to low confidence (3)

ste/mgr-JobPartMgr.go:108

  • Appending NewDestReauthPolicy to perRetryPolicies instead of perCallPolicies is likely a copy-paste error. It should be append(perCallPolicies, ...) to maintain intended ordering.
perCallPolicies = append(perRetryPolicies, NewDestReauthPolicy(dstCred))
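The effect the comment describes can be shown with plain string slices. This is only an illustration: the policy names are placeholders, the helper functions are hypothetical, and whether the quoted line is actually a mistake in the PR is the reviewer's low-confidence suggestion, not something confirmed here.

```go
package main

import "fmt"

// appendSuspect mirrors the shape of the quoted line: the result is built
// from perRetry, so perCall's existing entries are absent from it.
func appendSuspect(perCall, perRetry []string, policy string) []string {
	perCall = append(perRetry, policy)
	return perCall
}

// appendSuggested mirrors the reviewer's proposal: extend perCall itself.
func appendSuggested(perCall, perRetry []string, policy string) []string {
	perCall = append(perCall, policy)
	return perCall
}

func main() {
	perCall := []string{"telemetry"}      // placeholder per-call policy
	perRetry := []string{"lifetimeTrack"} // placeholder per-retry policy

	fmt.Println(appendSuspect(perCall, perRetry, "destReauth"))   // [lifetimeTrack destReauth]
	fmt.Println(appendSuggested(perCall, perRetry, "destReauth")) // [telemetry destReauth]
}
```

In the first form the per-call list ends up containing the per-retry entries instead of its own, which is why the reviewer flags it as a likely copy-paste slip.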

ste/concurrencyTuner.go:150

  • [nitpick] The constant name minMulitplier is misspelled. Rename it to minMultiplier for clarity and consistency.
const minMulitplier = 1.19 // really this is 1.2, but use a little less to make the floating point comparisons robust
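The rationale in the quoted code comment can be sketched as below. The helper `gainedEnough` and the corrected spelling `minMultiplier` are assumptions for illustration, not the tuner's actual code. The point is that a measured ratio that is nominally 1.2 may land a hair below it after floating-point rounding, so comparing against 1.19 keeps such cases on the passing side.

```go
package main

import "fmt"

// minMultiplier: nominally 1.2, set slightly lower so that ratios which
// round to just under 1.2 still pass the comparison.
const minMultiplier = 1.19

// gainedEnough reports whether throughput rose by at least the required
// factor between two measurements. Hypothetical helper.
func gainedEnough(before, after float64) bool {
	if before <= 0 {
		return false
	}
	return after/before >= minMultiplier
}

func main() {
	before := 0.1
	after := before * 1.2 // nominally a 1.2x gain, subject to rounding
	fmt.Println(after/before, gainedEnough(before, after))
}
```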

ste/xferLifetimeTracker.go:14

  • New RequestLifetimeTracker functionality lacks unit tests for bucket rotation and policy behavior. Adding tests will help ensure the tracking and backoff logic remains correct.
/*

@gapra-msft gapra-msft added this to the 10.30.0 milestone Jul 11, 2025
@gapra-msft gapra-msft modified the milestones: 10.30.0, 10.31.0 Jul 25, 2025

4 participants