Skip to content

Conversation

@anthony-murphy-agent
Copy link
Contributor

@anthony-murphy-agent anthony-murphy-agent commented Jan 7, 2026

Summary

  • Fix unhandled promise rejections in PrefetchDocumentStorageService when parallel prefetch requests fail
  • Add unit tests for prefetch error handling and caching behavior

Problem

The PrefetchDocumentStorageService fires prefetch requests using fire-and-forget pattern (void this.cachedRead(blob)). When these requests failed, the .catch() handler was re-throwing errors, creating rejected promises stored in cache that no one awaited - causing uncaughtException errors.

Evidence from telemetry showed 98 uncaughtException errors from a single process within ~70ms, all originating from readBlob via prefetch.

Solution

Store the original promise in the cache and attach .catch() for side effects only (cache cleanup on retryable errors). Do not re-throw - callers who await the cached promise still receive the rejection properly.

Also added a race-condition fix to avoid clearing a newer cached promise when an older one fails.

Test plan

  • New unit tests for PrefetchDocumentStorageService (4 tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure handling)
  • driver-utils tests pass (60 tests)
  • Lint passes

🤖 Generated with Claude Code

Co-Authored-By: anthony-murphy [email protected]

fixes AB#47951

anthony-murphy-agent and others added 6 commits January 7, 2026 10:27
The PrefetchDocumentStorageService was storing a promise that re-threw
errors in the cache. When prefetch fired requests with fire-and-forget
pattern (void this.cachedRead()), failures created rejected promises
that no one awaited, causing unhandled rejection errors.

The fix stores the original promise in the cache and attaches a .catch()
handler for side effects only (cache cleanup on retryable errors).
Callers who await the cached promise still receive the rejection properly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <[email protected]>
Add unit tests verifying:
- Errors propagate to callers who await readBlob
- Cache is cleared on retryable errors, allowing retry
- Prefetch successfully caches blobs for later reads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <[email protected]>
Replace `assert.ok(result)` with `assert.strictEqual(result.byteLength, 3)`
to avoid using an object in a boolean context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <[email protected]>
- Fix race condition in catch handler by checking if cached promise
  is still the same before deleting (prevents concurrent request issues)
- Replace flaky setTimeout with polling-based waitForCondition helper
- Add test for fire-and-forget prefetch failure scenario
- Improve assertion messages for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <[email protected]>
Use Promise.resolve() to flush the microtask queue for catch handlers,
as setImmediate schedules a macrotask which is semantically incorrect
for this purpose.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <[email protected]>
@anthony-murphy-agent
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 26151 in repo microsoft/FluidFramework

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes unhandled promise rejections in PrefetchDocumentStorageService that occurred when fire-and-forget prefetch requests failed. The fix changes how error handlers are attached to cached promises to prevent rejected promises from causing uncaughtException errors.

Key changes:

  • Modified the error handling in cachedRead() to attach .catch() for side effects only, without re-throwing errors
  • Added race condition protection to prevent clearing newer cached promises when older ones fail
  • Added comprehensive unit tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
packages/loader/driver-utils/src/prefetchDocumentStorageService.ts Fixed unhandled rejection issue by storing original promise in cache and attaching non-throwing catch handler for cleanup; added race condition protection
packages/loader/driver-utils/src/test/prefetchDocumentStorageService.spec.ts Added comprehensive test suite with 4 tests covering error propagation, cache clearing on retryable errors, successful prefetch caching, and fire-and-forget failure handling

@anthony-murphy
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@anthony-murphy
Copy link
Contributor

/azp run Build - client packages

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy
Copy link
Contributor

/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid, Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@anthony-murphy
Copy link
Contributor

/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy
Copy link
Contributor

/azp run Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy
Copy link
Contributor

fixes AB#47951

@anthony-murphy anthony-murphy merged commit 06d309d into microsoft:main Jan 8, 2026
42 checks passed
@anthony-murphy anthony-murphy deleted the fix/prefetch-unhandled-rejections branch January 8, 2026 18:38

it("should clear cache on retryable errors allowing retry", async () => {
const retryableError = new Error("Retryable error");
(retryableError as any).canRetry = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should avoid any and especially as any. This suggests that Errors used here do not match errors that will actually be used. Can be a maintenance issue for the future.

Comment on lines +154 to +160
// Now verify that explicit readBlob calls still receive the error properly
mockStorage.readBlobCalls = [];
await assert.rejects(
async () => prefetchService.readBlob("blob1"),
(error: Error) => error.message === "Prefetch network failure",
"Explicit readBlob should still receive the error",
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This portion of the testing is a little sketchy - at the very least unclear and can easily be misinterpreted.
mockStorage.readBlobCalls = []; does nothing. There is nothing that examines mockStorage.readBlobCalls after it is changed.
In this place, there should be nothing cached. The cached item should have been cleared. So here it would be fine to change storage to respond differently. There isn't a test to cover non-retriable errors. That should be added. A better test here could be to make this error non-retriable and to check that the error is the same error.

if (prefetchedBlobP !== undefined) {
return prefetchedBlobP;
}
const prefetchedBlobPFromStorage = this.internalStorageService.readBlob(blobId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to skip the readBlob call if the blobId is already in prefetchCache?

Side note - you might recall that back in the day I wrote a utility for this (that correctly chains catch as you're doing in this PR) called PromiseCache. Could be useful to avoid these kinds of bugs.

anthony-murphy added a commit that referenced this pull request Jan 8, 2026
## Summary
- Adds explicit return type to `policies` getter in test mock to satisfy
`@typescript-eslint/explicit-function-return-type` rule

## Root cause
Race condition between two PRs:

| Event | Time (UTC) |
|-------|------------|
| PR #26151 created (prefetch fix) | Jan 7, 23:12 |
| PR #26149 merged (ESLint rule promotion) | Jan 8, 01:45 |
| PR #26151 merged | Jan 8, 18:38 |

PR #26151's CI ran before the ESLint rule was promoted in #26149. By the
time #26151 was merged, the stricter linting rule was already in place,
but CI wasn't re-run against the updated main branch.

## Test plan
- [x] ESLint passes locally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: anthony-murphy-agent <[email protected]>
Co-authored-by: anthony-murphy <[email protected]>
Co-authored-by: Claude Opus 4.5 <[email protected]>
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 9, 2026
…fetch

When getSnapshotTree() fails (e.g., network timeout), the fire-and-forget
prefetch pattern `void p.then(...)` creates a derived promise that also
rejects. Without a .catch() handler, this causes uncaughtException errors.

This is a follow-up to PR microsoft#26151 which fixed the same issue in cachedRead()
but missed this code path in getSnapshotTree().

The fix adds .catch() to the derived promise chain to prevent unhandled
rejections. The original promise `p` is still returned to callers, so they
receive the error properly.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: anthony-murphy <[email protected]>
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 9, 2026
…orget calls

Fix three fire-and-forget patterns that cause unhandled rejections:

1. getSnapshotTree(): `void p.then(...)` creates a derived promise that
   rejects when p rejects. Added `.catch()` to the promise chain.

2. prefetchTree()/prefetchTreeCore(): `void this.cachedRead(blob)` discards
   the async function's returned promise. Since cachedRead is async, it
   returns a separate promise chain from the inner prefetchedBlobPFromStorage.
   The existing `.catch()` inside cachedRead only handles the inner promise,
   not the async function's returned promise. Changed to
   `this.cachedRead(blob).catch(() => {})`.

The root cause: attaching `.catch()` to a promise P only prevents unhandled
rejection for P's chain. When an async function returns P, the function's
returned promise is a DIFFERENT chain that also rejects - and needs its own
`.catch()` handler.

This is a follow-up to PR microsoft#26151 which only fixed the inner promise in
cachedRead but didn't fix the callers' fire-and-forget patterns.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: anthony-murphy <[email protected]>
anthony-murphy added a commit that referenced this pull request Jan 9, 2026
…fetch (#26173)

## Summary

- Fix unhandled promise rejections in `PrefetchDocumentStorageService`
fire-and-forget calls
- Add unit test for getSnapshotTree failure scenario

## Problem

This is a follow-up to PR #26151. After that fix was deployed, telemetry
showed 349 `uncaughtException` errors from
`ShreddedSummaryDocumentStorageService.readBlob` in test-service-load
tinylicious tests.

**Root cause:** PR #26151 attached `.catch()` to the inner promise in
`cachedRead()`, but that only handles ONE promise chain. The
fire-and-forget callers create a DIFFERENT chain:

1. **`void this.cachedRead(blob)`** - `cachedRead` is async, so it
returns a NEW promise wrapping the inner promise. The `.catch()` on the
inner promise doesn't prevent unhandled rejection on the async
function's returned promise.

2. **`void p.then(...)`** in `getSnapshotTree()` - creates a derived
promise that also rejects when `p` rejects.

## Solution

Fix all three fire-and-forget patterns:

```typescript
// Before (inner promise handled, but async wrapper not):
void this.cachedRead(blob);

// After (handles the async function's returned promise):
this.cachedRead(blob).catch(() => {});
```

```typescript
// Before (derived promise not handled):
void p.then((tree) => this.prefetchTree(tree));

// After (handles the derived promise):
void p.then(...).catch(() => {});
```

## Test plan

- [x] New unit test for `getSnapshotTree` failure scenario
- [x] All 5 PrefetchDocumentStorageService tests pass
- [x] Build passes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: anthony-murphy <[email protected]>

---------

Co-authored-by: anthony-murphy-agent <[email protected]>
Co-authored-by: Claude Opus 4.5 <[email protected]>
Co-authored-by: anthony-murphy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants