Add configuration for consumer handling of retry on container creation#26698
Conversation
Pull request overview
Adds a configurable cap on retries for retriable errors during container creation/attach flows, addressing scenarios where hosts have a bounded time window for creation.
Changes:
- Extend `runWithRetry` to support an optional `maxRetries` limit and throw a non-retriable wrapped error when it is exceeded.
- Plumb a new `maxCreateRetries` option through `Container.attach()` (with config override via `Fluid.Container.CreateMaxRetries`) into the attach-time `runWithRetry` call.
- Add unit tests for the new retry-limit behavior and update the legacy beta API report.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/loader/driver-utils/src/test/runWithRetry.spec.ts | Adds coverage for bounded retries, success-before-limit, unlimited default, and 0 retries behavior. |
| packages/loader/driver-utils/src/runWithRetry.ts | Implements maxRetries support and telemetry when the retry cap is exceeded. |
| packages/loader/container-loader/src/container.ts | Adds maxCreateRetries attach option and wires config/attach override into attach-time retries. |
| packages/common/container-definitions/src/loader.ts | Updates IContainer.attach() typing/docs to expose maxCreateRetries. |
| packages/common/container-definitions/api-report/container-definitions.legacy.beta.api.md | Updates generated legacy beta API report to reflect the new attach option. |
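For context, the retry cap described in the changes above can be sketched roughly as follows. This is a simplified illustration, not the actual `runWithRetry` implementation; the `canRetry` error property check and the loop shape are assumptions:

```typescript
// Simplified sketch of a bounded retry loop: retry retriable errors until
// maxRetries is exceeded, then surface the error to the caller.
// (Illustrative only -- the real runWithRetry also handles delays/telemetry.)
async function runWithRetrySketch<T>(
	api: () => Promise<T>,
	maxRetries?: number, // undefined = unlimited, matching the current default
): Promise<T> {
	let numRetries = 0;
	for (;;) {
		try {
			return await api();
		} catch (error: unknown) {
			const canRetry = (error as { canRetry?: boolean })?.canRetry === true;
			numRetries++;
			if (!canRetry || (maxRetries !== undefined && numRetries > maxRetries)) {
				// Cap exceeded or error not retriable: stop retrying.
				throw error;
			}
		}
	}
}
```

With `maxRetries: 2`, the API is attempted at most three times (the initial call plus two retries) before the error is rethrown.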
```ts
// Check if max retries limit has been reached
if (progress.maxRetries !== undefined && numRetries > progress.maxRetries) {
	logger.sendTelemetryEvent(
```
progress.maxRetries is used without validation. If a caller passes NaN, Infinity, a negative value, or a non-integer, the comparison numRetries > progress.maxRetries can behave unexpectedly (e.g., NaN makes the limit never trigger, reintroducing infinite retries). Consider normalizing the option (e.g., require a finite, non-negative integer; otherwise treat as undefined or throw a non-retriable UsageError).
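One way to do the suggested normalization (a sketch only; `normalizeMaxRetries` is a hypothetical helper, not code from the PR, and throwing a `UsageError` would be the stricter alternative):

```typescript
// Hypothetical helper: normalize a caller-supplied maxRetries so that NaN,
// Infinity, negative, or non-integer values cannot silently disable the cap.
// Lenient option shown here: treat invalid values as undefined (unlimited).
function normalizeMaxRetries(maxRetries: number | undefined): number | undefined {
	if (maxRetries === undefined) {
		return undefined; // unlimited retries, the current default
	}
	if (!Number.isInteger(maxRetries) || maxRetries < 0) {
		// Number.isInteger rejects NaN, Infinity, and fractional values.
		return undefined;
	}
	return maxRetries;
}
```

Applying this before the `numRetries > progress.maxRetries` comparison would keep the limit from being bypassed by a malformed option.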
We built the ability to allow customers to retry attach if it should fail. Did we consider moving retry for attach/create failures to the consumer when retriable attach is enabled, e.g. "Fluid.Container.RetryOnAttachFailure", with the eventual goal of enabling it by default, since the current behavior of just retrying forever isn't great for customers? Basically, we can just set retries to 0 when Fluid.Container.RetryOnAttachFailure is on, and then the consumer owns retry, with whatever policy they want. What I like about this approach is that it lets us move away from exposing complex policy in our APIs and lets consumers have control.
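A consumer-owned retry policy of the kind proposed here could look roughly like this. This is an illustrative sketch under stated assumptions: `attachWithConsumerRetry` is a hypothetical consumer helper, the `canRetry` error property and the deadline policy are assumptions, not PR code:

```typescript
// Sketch of a consumer-owned attach retry, assuming errors surface from
// attach() (rather than being retried internally) when the flag is enabled.
async function attachWithConsumerRetry(
	container: { attach: (request: unknown) => Promise<void> },
	request: unknown,
	deadlineMs: number, // consumer policy: bounded time window for attach
): Promise<void> {
	const start = Date.now();
	for (;;) {
		try {
			await container.attach(request);
			return;
		} catch (error: unknown) {
			const retriable = (error as { canRetry?: boolean })?.canRetry === true;
			if (!retriable || Date.now() - start > deadlineMs) {
				throw error; // give up: non-retriable error or deadline passed
			}
		}
	}
}
```

The point of the proposal is exactly that this loop (or a serialize-and-save, or a dispose) lives in consumer code, not in the container.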
shlevari
left a comment
Looks good in general, just some light comments about the naming and exposure surface for controlling the retries.
```ts
// When RetryOnAttachFailure is enabled, use no internal retries
// The consumer will own the retry policy
const retryOnAttachFailure =
	this.mc.config.getBoolean("Fluid.Container.RetryOnAttachFailure") === true;
```
Is this flag name already locked in? To me its naming is unintuitive -- it sounds like it's telling the Fluid Container to do retries, but it's actually saying the caller will do the retries.
Also, does it make sense to give the caller control over how many retries to do? If they want to disable it, they set it to zero; that way it gives them the opportunity to optimize for their specific startup time requirements.
+1, MaxRetriesOnAttachFailure seems nice, otherwise DisableRetryOnAttachFailure maybe
@anthony-murphy Thoughts on changing the name to something like Fluid.Container.DisableRetryOnAttachFailure?
> Also, does it make sense to give the caller control over how many retries to do?
The important thing here is that the errors are bubbled up to the consumer of attach. I don't have any strong preference on how consumers can implement a retry mechanism, but retries are not necessary for this ask.
The flag primarily allows attach to be retried by the consumer; without it, the container will close when attach fails.
The goal is to get out of the container-level retry game here and let the consumer decide what they want to do: retry calling attach, serialize the container and save it somewhere, or dispose the container and throw it away.
I changed the flag name to `DisableCloseOnAttachFailure` to be extra clear about what it does.
Currently, we continue to retry retriable errors indefinitely on container creation. However, certain consumers have a limited time window (30/60 seconds) for the create, so we should not retry infinitely even on retriable errors.
If the consumer of `attach` wants to handle the retry mechanism, they can enable the `"Fluid.Container.DisableCloseOnAttachFailure"` flag. This flag causes no internal retries to occur and lets any error surface to the consumer of `attach`.

AB#57593
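A rough sketch of how a consumer might enable the flag through a config provider. The `getRawConfig(name)` shape follows the common Fluid `IConfigProviderBase` pattern, but treat the exact wiring as an assumption rather than PR code:

```typescript
// Minimal config provider sketch that turns on the flag from the PR
// description. Only the key lookup is shown; passing this provider to the
// loader/container is out of scope here.
const configProvider = {
	getRawConfig: (name: string): boolean | undefined =>
		name === "Fluid.Container.DisableCloseOnAttachFailure" ? true : undefined,
};
```

With this in place, attach failures are surfaced to the caller instead of closing the container, and the consumer owns whatever retry (or serialize/dispose) policy it wants.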