-
Notifications
You must be signed in to change notification settings - Fork 562
fix(driver-utils): prevent unhandled rejections in prefetch #26151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(driver-utils): prevent unhandled rejections in prefetch #26151
Conversation
The PrefetchDocumentStorageService was storing a promise that re-threw errors in the cache. When prefetch fired requests with fire-and-forget pattern (void this.cachedRead()), failures created rejected promises that no one awaited, causing unhandled rejection errors. The fix stores the original promise in the cache and attaches a .catch() handler for side effects only (cache cleanup on retryable errors). Callers who await the cached promise still receive the rejection properly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: anthony-murphy <[email protected]>
Add unit tests verifying: - Errors propagate to callers who await readBlob - Cache is cleared on retryable errors, allowing retry - Prefetch successfully caches blobs for later reads 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: anthony-murphy <[email protected]>
Replace `assert.ok(result)` with `assert.strictEqual(result.byteLength, 3)` to avoid using an object in a boolean context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: anthony-murphy <[email protected]>
- Fix race condition in catch handler by checking if cached promise is still the same before deleting (prevents concurrent request issues) - Replace flaky setTimeout with polling-based waitForCondition helper - Add test for fire-and-forget prefetch failure scenario - Improve assertion messages for clarity 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: anthony-murphy <[email protected]>
Use Promise.resolve() to flush the microtask queue for catch handlers, as setImmediate schedules a macrotask which is semantically incorrect for this purpose. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: anthony-murphy <[email protected]>
|
/azp run |
|
Commenter does not have sufficient privileges for PR 26151 in repo microsoft/FluidFramework |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes unhandled promise rejections in PrefetchDocumentStorageService that occurred when fire-and-forget prefetch requests failed. The fix changes how error handlers are attached to cached promises to prevent rejected promises from causing uncaughtException errors.
Key changes:
- Modified the error handling in
cachedRead()to attach.catch()for side effects only, without re-throwing errors - Added race condition protection to prevent clearing newer cached promises when older ones fail
- Added comprehensive unit tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure scenarios
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
packages/loader/driver-utils/src/prefetchDocumentStorageService.ts |
Fixed unhandled rejection issue by storing original promise in cache and attaching non-throwing catch handler for cleanup; added race condition protection |
packages/loader/driver-utils/src/test/prefetchDocumentStorageService.spec.ts |
Added comprehensive test suite with 4 tests covering error propagation, cache clearing on retryable errors, successful prefetch caching, and fire-and-forget failure handling |
|
/azp run |
|
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
|
/azp run Build - client packages |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid, Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh |
|
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
|
/azp run Build - client packages, Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Build - protocol-definitions, Build - test-tools, repo-policy-check, server-routerlicious, server-gitrest, server-historian, server-gitssh |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
fixes AB#47951 |
|
|
||
| it("should clear cache on retryable errors allowing retry", async () => { | ||
| const retryableError = new Error("Retryable error"); | ||
| (retryableError as any).canRetry = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should avoid any and especially as any. This suggests that Errors used here do not match errors that will actually be used. Can be a maintenance issue for the future.
| // Now verify that explicit readBlob calls still receive the error properly | ||
| mockStorage.readBlobCalls = []; | ||
| await assert.rejects( | ||
| async () => prefetchService.readBlob("blob1"), | ||
| (error: Error) => error.message === "Prefetch network failure", | ||
| "Explicit readBlob should still receive the error", | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This portion of the testing is a little sketchy - at the very least unclear and can easily be misinterpreted.
mockStorage.readBlobCalls = []; does nothing. There is nothing that examines mockStorage.readBlobCalls after it is changed.
In this place, there should be nothing cached. The cached item should have been cleared. So here it would be fine to change storage to respond differently. There isn't a test to cover non-retriable errors. That should be added. A better test here could be to make this error non-retriable and to check that the error is the same error.
| if (prefetchedBlobP !== undefined) { | ||
| return prefetchedBlobP; | ||
| } | ||
| const prefetchedBlobPFromStorage = this.internalStorageService.readBlob(blobId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to skip the readBlob call if the blobId is already in prefetchCache?
Side note - you might recall that back in the day I wrote a utility for this (that correctly chains catch as you're doing in this PR) called PromiseCache. Could be useful to avoid these kinds of bugs.
## Summary - Adds explicit return type to `policies` getter in test mock to satisfy `@typescript-eslint/explicit-function-return-type` rule ## Root cause Race condition between two PRs: | Event | Time (UTC) | |-------|------------| | PR #26151 created (prefetch fix) | Jan 7, 23:12 | | PR #26149 merged (ESLint rule promotion) | Jan 8, 01:45 | | PR #26151 merged | Jan 8, 18:38 | PR #26151's CI ran before the ESLint rule was promoted in #26149. By the time #26151 was merged, the stricter linting rule was already in place, but CI wasn't re-run against the updated main branch. ## Test plan - [x] ESLint passes locally 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: anthony-murphy-agent <[email protected]> Co-authored-by: anthony-murphy <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]>
…fetch When getSnapshotTree() fails (e.g., network timeout), the fire-and-forget prefetch pattern `void p.then(...)` creates a derived promise that also rejects. Without a .catch() handler, this causes uncaughtException errors. This is a follow-up to PR microsoft#26151 which fixed the same issue in cachedRead() but missed this code path in getSnapshotTree(). The fix adds .catch() to the derived promise chain to prevent unhandled rejections. The original promise `p` is still returned to callers, so they receive the error properly. Co-Authored-By: Claude Opus 4.5 <[email protected]> Co-Authored-By: anthony-murphy <[email protected]>
…orget calls
Fix three fire-and-forget patterns that cause unhandled rejections:
1. getSnapshotTree(): `void p.then(...)` creates a derived promise that
rejects when p rejects. Added `.catch()` to the promise chain.
2. prefetchTree()/prefetchTreeCore(): `void this.cachedRead(blob)` discards
the async function's returned promise. Since cachedRead is async, it
returns a separate promise chain from the inner prefetchedBlobPFromStorage.
The existing `.catch()` inside cachedRead only handles the inner promise,
not the async function's returned promise. Changed to
`this.cachedRead(blob).catch(() => {})`.
The root cause: attaching `.catch()` to a promise P only prevents unhandled
rejection for P's chain. When an async function returns P, the function's
returned promise is a DIFFERENT chain that also rejects - and needs its own
`.catch()` handler.
This is a follow-up to PR microsoft#26151 which only fixed the inner promise in
cachedRead but didn't fix the callers' fire-and-forget patterns.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: anthony-murphy <[email protected]>
…fetch (#26173) ## Summary - Fix unhandled promise rejections in `PrefetchDocumentStorageService` fire-and-forget calls - Add unit test for getSnapshotTree failure scenario ## Problem This is a follow-up to PR #26151. After that fix was deployed, telemetry showed 349 `uncaughtException` errors from `ShreddedSummaryDocumentStorageService.readBlob` in test-service-load tinylicious tests. **Root cause:** PR #26151 attached `.catch()` to the inner promise in `cachedRead()`, but that only handles ONE promise chain. The fire-and-forget callers create a DIFFERENT chain: 1. **`void this.cachedRead(blob)`** - `cachedRead` is async, so it returns a NEW promise wrapping the inner promise. The `.catch()` on the inner promise doesn't prevent unhandled rejection on the async function's returned promise. 2. **`void p.then(...)`** in `getSnapshotTree()` - creates a derived promise that also rejects when `p` rejects. ## Solution Fix all three fire-and-forget patterns: ```typescript // Before (inner promise handled, but async wrapper not): void this.cachedRead(blob); // After (handles the async function's returned promise): this.cachedRead(blob).catch(() => {}); ``` ```typescript // Before (derived promise not handled): void p.then((tree) => this.prefetchTree(tree)); // After (handles the derived promise): void p.then(...).catch(() => {}); ``` ## Test plan - [x] New unit test for `getSnapshotTree` failure scenario - [x] All 5 PrefetchDocumentStorageService tests pass - [x] Build passes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: anthony-murphy <[email protected]> --------- Co-authored-by: anthony-murphy-agent <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]> Co-authored-by: anthony-murphy <[email protected]>
Summary
PrefetchDocumentStorageServicewhen parallel prefetch requests failProblem
The
PrefetchDocumentStorageServicefires prefetch requests using fire-and-forget pattern (void this.cachedRead(blob)). When these requests failed, the.catch()handler was re-throwing errors, creating rejected promises stored in cache that no one awaited - causinguncaughtExceptionerrors.Evidence from telemetry showed 98
uncaughtExceptionerrors from a single process within ~70ms, all originating fromreadBlobvia prefetch.Solution
Store the original promise in the cache and attach
.catch()for side effects only (cache cleanup on retryable errors). Do not re-throw - callers who await the cached promise still receive the rejection properly.Also added a race-condition fix to avoid clearing a newer cached promise when an older one fails.
Test plan
PrefetchDocumentStorageService(4 tests covering error propagation, retry behavior, prefetch caching, and fire-and-forget failure handling)🤖 Generated with Claude Code
Co-Authored-By: anthony-murphy [email protected]
fixes AB#47951