fix(driver-utils): prevent unhandled rejections in prefetch #26151

Merged
anthony-murphy merged 6 commits into microsoft:main from anthony-murphy:fix/prefetch-unhandled-rejections
Jan 8, 2026

@anthony-murphy-agent (Contributor) commented Jan 7, 2026

Summary

  • Fix unhandled promise rejections in PrefetchDocumentStorageService when parallel prefetch requests fail
  • Add unit tests for prefetch error handling and caching behavior

Problem

The `PrefetchDocumentStorageService` fires prefetch requests using a fire-and-forget pattern (`void this.cachedRead(blob)`). When these requests failed, the `.catch()` handler re-threw the error, producing rejected promises that were stored in the cache but never awaited, which caused `uncaughtException` errors.

Evidence from telemetry showed 98 uncaughtException errors from a single process within ~70ms, all originating from readBlob via prefetch.

Solution

Store the original promise in the cache and attach `.catch()` for side effects only (cache cleanup on retryable errors). Do not re-throw: callers who await the cached promise still receive the rejection.

Also added a race-condition fix to avoid clearing a newer cached promise when an older one fails.
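A minimal sketch of the cache behavior described above, under stated assumptions: `PrefetchCacheSketch` and `isRetryable` are hypothetical names, a `canRetry: true` property stands in for the service's retryable check, and storage is an injected `readFromStorage` function. `cachedRead()` is deliberately a plain (non-async) method here, so a fire-and-forget call returns the cached promise itself rather than a new wrapper.

```typescript
// Sketch only: names are hypothetical and `canRetry` stands in for however
// the real service decides that an error is retryable.
type ReadBlob = (blobId: string) => Promise<ArrayBufferLike>;

// True when the error carries `canRetry: true` (structural check, no `any`).
function isRetryable(error: unknown): boolean {
	return (
		typeof error === "object" &&
		error !== null &&
		(error as { canRetry?: unknown }).canRetry === true
	);
}

class PrefetchCacheSketch {
	private readonly cache = new Map<string, Promise<ArrayBufferLike>>();
	private readonly readFromStorage: ReadBlob;

	public constructor(readFromStorage: ReadBlob) {
		this.readFromStorage = readFromStorage;
	}

	public readBlob(blobId: string): Promise<ArrayBufferLike> {
		return this.cachedRead(blobId);
	}

	// Fire-and-forget prefetch. Safe here because cachedRead() is NOT async:
	// it returns the cached promise itself, which already has a handler.
	public prefetch(blobIds: readonly string[]): void {
		for (const blobId of blobIds) {
			void this.cachedRead(blobId);
		}
	}

	public isCached(blobId: string): boolean {
		return this.cache.has(blobId);
	}

	private cachedRead(blobId: string): Promise<ArrayBufferLike> {
		const cached = this.cache.get(blobId);
		if (cached !== undefined) {
			return cached;
		}
		const blobP = this.readFromStorage(blobId);
		// Cache the ORIGINAL promise, so awaiting callers still see the rejection.
		this.cache.set(blobId, blobP);
		// Side effects only: the handler never re-throws, so the promise returned
		// by .catch() always fulfills and cannot surface as an unhandled rejection.
		void blobP.catch((error: unknown) => {
			// Evict only if the entry still holds THIS promise, so an older
			// failure never clears a newer request for the same blob.
			if (isRetryable(error) && this.cache.get(blobId) === blobP) {
				this.cache.delete(blobId);
			}
		});
		return blobP;
	}
}
```

The `.catch()` result is discarded on purpose, since its only job is eviction. Had `cachedRead()` been declared `async`, every call would return a fresh promise that rejects on its own, and a bare `void` call site would again need a handler.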

Test plan

  • New unit tests for PrefetchDocumentStorageService (4 tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure handling)
  • driver-utils tests pass (60 tests)
  • Lint passes

🤖 Generated with Claude Code

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

fixes AB#47951

anthony-murphy-agent and others added 6 commits January 7, 2026 10:27
The PrefetchDocumentStorageService was storing a promise that re-threw
errors in the cache. When prefetch fired requests with fire-and-forget
pattern (void this.cachedRead()), failures created rejected promises
that no one awaited, causing unhandled rejection errors.

The fix stores the original promise in the cache and attaches a .catch()
handler for side effects only (cache cleanup on retryable errors).
Callers who await the cached promise still receive the rejection properly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

Add unit tests verifying:
- Errors propagate to callers who await readBlob
- Cache is cleared on retryable errors, allowing retry
- Prefetch successfully caches blobs for later reads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

Replace `assert.ok(result)` with `assert.strictEqual(result.byteLength, 3)`
to avoid using an object in a boolean context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

- Fix race condition in catch handler by checking if cached promise
  is still the same before deleting (prevents concurrent request issues)
- Replace flaky setTimeout with polling-based waitForCondition helper
- Add test for fire-and-forget prefetch failure scenario
- Improve assertion messages for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
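The page does not show the `waitForCondition` helper this commit introduces. One plausible shape, as a sketch only (the signature and the `maxTurns` limit are assumptions): poll a predicate and yield a microtask turn between checks, giving up after a fixed number of turns instead of waiting on a wall-clock timer.

```typescript
// Hypothetical reconstruction: the real helper's signature is not shown.
async function waitForCondition(
	condition: () => boolean,
	maxTurns = 100,
): Promise<void> {
	for (let turn = 0; !condition(); turn++) {
		if (turn >= maxTurns) {
			throw new Error(`Condition not met after ${maxTurns} microtask turns`);
		}
		// Yield one microtask turn so already-queued promise handlers
		// (such as a cache-eviction .catch()) can run before re-checking.
		await Promise.resolve();
	}
}
```

Compared with a fixed `setTimeout` delay, this returns as soon as the condition holds and does not depend on how quickly a timer fires.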
Use `Promise.resolve()` to flush the microtask queue for catch handlers,
as `setImmediate` schedules a macrotask, which is semantically incorrect
for this purpose.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
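The distinction this commit relies on can be shown in a few lines. In this illustrative sketch (`observeOrdering` is a made-up name, and `setTimeout` stands in for Node's `setImmediate` as the macrotask), a `.catch()` handler attached to an already-rejected promise runs before the code that follows `await Promise.resolve()`, while the timer callback has not yet fired.

```typescript
// Illustrative only: records when a promise handler and a timer callback run
// relative to `await Promise.resolve()`.
async function observeOrdering(): Promise<string[]> {
	const order: string[] = [];
	// Macrotask: runs only after the microtask queue has drained.
	setTimeout(() => {
		order.push("timer");
	}, 0);
	const failed = Promise.reject(new Error("prefetch failed"));
	// The promise is already rejected, so this handler's job is queued now.
	void failed.catch(() => {
		order.push("catch handler");
	});
	// Queued behind the catch handler, so execution resumes after it has run.
	await Promise.resolve();
	order.push("after await");
	// Return a copy so the later timer push does not alter the snapshot.
	return [...order];
}
```

A single `await Promise.resolve()` only advances one microtask turn. That is enough when the handler is already queued, as here; a handler sitting behind further promise steps (for example, a mock that awaits internally) may need several turns.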
@anthony-murphy-agent (Contributor Author)

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 26151 in repo microsoft/FluidFramework

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes unhandled promise rejections in PrefetchDocumentStorageService that occurred when fire-and-forget prefetch requests failed. The fix changes how error handlers are attached to cached promises to prevent rejected promises from causing uncaughtException errors.

Key changes:

  • Modified the error handling in cachedRead() to attach .catch() for side effects only, without re-throwing errors
  • Added race condition protection to prevent clearing newer cached promises when older ones fail
  • Added comprehensive unit tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| packages/loader/driver-utils/src/prefetchDocumentStorageService.ts | Fixed unhandled rejection issue by storing original promise in cache and attaching non-throwing catch handler for cleanup; added race condition protection |
| packages/loader/driver-utils/src/test/prefetchDocumentStorageService.spec.ts | Added comprehensive test suite with 4 tests covering error propagation, cache clearing on retryable errors, successful prefetch caching, and fire-and-forget failure handling |

@anthony-murphy (Contributor)

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@anthony-murphy (Contributor)

/azp run Build - client packages

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy (Contributor)

/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid, Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@anthony-murphy (Contributor)

/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy (Contributor)

/azp run Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@anthony-murphy (Contributor)

fixes AB#47951

@anthony-murphy anthony-murphy merged commit 06d309d into microsoft:main Jan 8, 2026
42 checks passed
@anthony-murphy anthony-murphy deleted the fix/prefetch-unhandled-rejections branch January 8, 2026 18:38

```typescript
it("should clear cache on retryable errors allowing retry", async () => {
	const retryableError = new Error("Retryable error");
	(retryableError as any).canRetry = true;
```
Contributor
Should avoid `any`, and especially `as any`. It suggests that the errors used here do not match the errors that will actually be used, which can become a maintenance issue in the future.
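One way to avoid `as any` here, sketched with a hypothetical `createTestError` helper: `Object.assign` returns the same `Error` object with the flag added, and its return type includes the flag, so the property is type-checked. The follow-up PR instead switched to the package's own `GenericNetworkError` and `NonRetryableError` types; their constructors are not shown on this page, so they are not used here.

```typescript
// Hypothetical helper: the retry flag is part of the static type, so no
// `as any` cast is needed and a misspelled property fails to compile.
function createTestError(
	message: string,
	canRetry: boolean,
): Error & { canRetry: boolean } {
	return Object.assign(new Error(message), { canRetry });
}

const retryableError = createTestError("Retryable error", true);
const nonRetryableError = createTestError("Fatal error", false);
```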

Comment on lines +154 to +160
```typescript
// Now verify that explicit readBlob calls still receive the error properly
mockStorage.readBlobCalls = [];
await assert.rejects(
	async () => prefetchService.readBlob("blob1"),
	(error: Error) => error.message === "Prefetch network failure",
	"Explicit readBlob should still receive the error",
);
```
Contributor
This portion of the test is a little sketchy; at the very least it is unclear and easy to misinterpret.
`mockStorage.readBlobCalls = [];` does nothing, because nothing examines `mockStorage.readBlobCalls` after it is changed.
At this point nothing should be cached, since the cached item should already have been cleared, so it would be fine to make storage respond differently here. There isn't a test covering non-retryable errors, and one should be added. A better test here would make this error non-retryable and check that the caller receives the same error.

```diff
@@ -57,15 +57,20 @@ export class PrefetchDocumentStorageService extends DocumentStorageServiceProxy
return prefetchedBlobP;
}
const prefetchedBlobPFromStorage = this.internalStorageService.readBlob(blobId);
```
Member
Do you want to skip the `readBlob` call if the `blobId` is already in `prefetchCache`?

Side note: you might recall that back in the day I wrote a utility for this, called `PromiseCache`, that correctly chains `.catch()` as you're doing in this PR. It could be useful for avoiding these kinds of bugs.

anthony-murphy added a commit that referenced this pull request Jan 8, 2026
## Summary
- Adds explicit return type to `policies` getter in test mock to satisfy
`@typescript-eslint/explicit-function-return-type` rule

## Root cause
Race condition between two PRs:

| Event | Time (UTC) |
|-------|------------|
| PR #26151 created (prefetch fix) | Jan 7, 23:12 |
| PR #26149 merged (ESLint rule promotion) | Jan 8, 01:45 |
| PR #26151 merged | Jan 8, 18:38 |

PR #26151's CI ran before the ESLint rule was promoted in #26149. By the
time #26151 was merged, the stricter linting rule was already in place,
but CI wasn't re-run against the updated main branch.

## Test plan
- [x] ESLint passes locally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: anthony-murphy-agent <253562292+anthony-murphy-agent@users.noreply.github.com>
Co-authored-by: anthony-murphy <anthony.murphy@microsoft.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 9, 2026
…fetch

When getSnapshotTree() fails (e.g., network timeout), the fire-and-forget
prefetch pattern `void p.then(...)` creates a derived promise that also
rejects. Without a .catch() handler, this causes uncaughtException errors.

This is a follow-up to PR microsoft#26151 which fixed the same issue in cachedRead()
but missed this code path in getSnapshotTree().

The fix adds .catch() to the derived promise chain to prevent unhandled
rejections. The original promise `p` is still returned to callers, so they
receive the error properly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 9, 2026
…orget calls

Fix three fire-and-forget patterns that cause unhandled rejections:

1. getSnapshotTree(): `void p.then(...)` creates a derived promise that
   rejects when p rejects. Added `.catch()` to the promise chain.

2. prefetchTree()/prefetchTreeCore(): `void this.cachedRead(blob)` discards
   the async function's returned promise. Since cachedRead is async, it
   returns a separate promise chain from the inner prefetchedBlobPFromStorage.
   The existing `.catch()` inside cachedRead only handles the inner promise,
   not the async function's returned promise. Changed to
   `this.cachedRead(blob).catch(() => {})`.

The root cause: attaching `.catch()` to a promise P only prevents unhandled
rejection for P's chain. When an async function returns P, the function's
returned promise is a DIFFERENT chain that also rejects - and needs its own
`.catch()` handler.

This is a follow-up to PR microsoft#26151 which only fixed the inner promise in
cachedRead but didn't fix the callers' fire-and-forget patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
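The root cause above can be reproduced in isolation. In this sketch (`cachedReadAsync` and `storageRead` are illustrative names, not the driver-utils code), the `.catch()` inside the async function handles only the inner promise; the function's return value is a distinct promise, and so is the result of `.then()`.

```typescript
// Illustrative names only; not the driver-utils implementation.
async function cachedReadAsync(inner: Promise<string>): Promise<string> {
	// Handles rejection on `inner`'s own chain, like the first fix did.
	void inner.catch(() => undefined);
	return inner;
}

const storageRead: Promise<string> = Promise.reject(new Error("network timeout"));
const wrapped = cachedReadAsync(storageRead);

// The async function returned a NEW promise that rejects independently of
// `storageRead`, so a bare `void wrapped;` would leave an unhandled rejection.
const isSamePromise = wrapped === storageRead; // false

// Caller-side fix: the returned promise gets its own handler.
const wrappedHandled = wrapped.catch(() => "wrapper handled");

// Same issue with a derived promise: `.then()` rejects when its source
// rejects, so the chain needs a trailing `.catch()`.
const derivedHandled = storageRead
	.then((tree) => tree.length)
	.catch(() => -1);
```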
anthony-murphy added a commit that referenced this pull request Jan 9, 2026
…fetch (#26173)

## Summary

- Fix unhandled promise rejections in `PrefetchDocumentStorageService`
fire-and-forget calls
- Add unit test for getSnapshotTree failure scenario

## Problem

This is a follow-up to PR #26151. After that fix was deployed, telemetry
showed 349 `uncaughtException` errors from
`ShreddedSummaryDocumentStorageService.readBlob` in test-service-load
tinylicious tests.

**Root cause:** PR #26151 attached `.catch()` to the inner promise in
`cachedRead()`, but that only handles ONE promise chain. The
fire-and-forget callers create a DIFFERENT chain:

1. **`void this.cachedRead(blob)`** - `cachedRead` is async, so it
returns a NEW promise wrapping the inner promise. The `.catch()` on the
inner promise doesn't prevent unhandled rejection on the async
function's returned promise.

2. **`void p.then(...)`** in `getSnapshotTree()` - creates a derived
promise that also rejects when `p` rejects.

## Solution

Fix all three fire-and-forget patterns:

```typescript
// Before (inner promise handled, but async wrapper not):
void this.cachedRead(blob);

// After (handles the async function's returned promise):
this.cachedRead(blob).catch(() => {});
```

```typescript
// Before (derived promise not handled):
void p.then((tree) => this.prefetchTree(tree));

// After (handles the derived promise):
void p.then(...).catch(() => {});
```

## Test plan

- [x] New unit test for `getSnapshotTree` failure scenario
- [x] All 5 PrefetchDocumentStorageService tests pass
- [x] Build passes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

---------

Co-authored-by: anthony-murphy-agent <253562292+anthony-murphy-agent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: anthony-murphy <anthony.murphy@microsoft.com>
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 13, 2026
- Replace `any` with proper error types in tests using GenericNetworkError
  and NonRetryableError from network.ts
- Add test for non-retryable errors to verify cache behavior (errors without
  canRetry=true should remain cached, not cleared)
- Skip prefetch calls if blobId is already cached to avoid unnecessary
  async overhead

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
anthony-murphy added a commit that referenced this pull request Jan 14, 2026
## Summary
- Replace `any` with proper error types in tests using
`GenericNetworkError` and `NonRetryableError`
- Add test for non-retryable errors to verify cache behavior
- Skip prefetch calls if blobId is already cached to avoid unnecessary
async overhead

This is a follow-up PR to address review comments from #26151:
- @jason-ha: Avoid `any` and `as any` - errors should use proper types
- @jason-ha: Add test for non-retryable errors
- @markfields: Skip `readBlob` call if blobId is already in
`prefetchCache`

## Test plan
- [x] New unit test for non-retryable error caching behavior
- [x] All existing prefetch tests pass (6 total)
- [x] Lint passes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>

---------

Co-authored-by: anthony-murphy-agent <anthony-murphy-agent@users.noreply.github.com>
Co-authored-by: anthony-murphy <anthony.murphy@microsoft.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
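A sketch combining the change above (skip prefetch for ids that are already cached) with the caller-side `.catch()` from #26173, assuming an `async` `cachedRead()` and a `Map`-based cache. `PrefetchCallerSketch` and `storageReads` are hypothetical, and retry eviction is left out to keep it short.

```typescript
// Hypothetical sketch of the caller-side pattern; names are illustrative.
type ReadFn = (blobId: string) => Promise<ArrayBufferLike>;

class PrefetchCallerSketch {
	private readonly cache = new Map<string, Promise<ArrayBufferLike>>();
	private readonly readFromStorage: ReadFn;
	public storageReads = 0;

	public constructor(readFromStorage: ReadFn) {
		this.readFromStorage = readFromStorage;
	}

	public readBlob(blobId: string): Promise<ArrayBufferLike> {
		return this.cachedRead(blobId);
	}

	public prefetch(blobIds: readonly string[]): void {
		for (const blobId of blobIds) {
			// Already cached: skip the call instead of creating another async wrapper.
			if (this.cache.has(blobId)) {
				continue;
			}
			// cachedRead() is async, so its returned promise needs its own handler.
			this.cachedRead(blobId).catch(() => {});
		}
	}

	private async cachedRead(blobId: string): Promise<ArrayBufferLike> {
		let blobP = this.cache.get(blobId);
		if (blobP === undefined) {
			this.storageReads += 1;
			blobP = this.readFromStorage(blobId);
			// An async body runs synchronously until its first await (none here),
			// so the entry exists before prefetch() moves on to the next id.
			this.cache.set(blobId, blobP);
			// Side-effect handler on the inner promise (eviction omitted).
			void blobP.catch(() => undefined);
		}
		return blobP;
	}
}
```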
anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 14, 2026
…t#26151)

fixes
[AB#47951](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/47951)

anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 14, 2026
…oft#26161)

anthony-murphy-agent added a commit to anthony-murphy-agent/FluidFramework that referenced this pull request Jan 14, 2026
…fetch (microsoft#26173)

6 participants