
Updating @graphql-mesh/graphql from 0.98.5 to 0.104.6 causing CloudFlare workers I/O error #8691

@liamdohertybjss


Issue workflow progress

Progress of the issue based on the
Contributor Workflow

Make sure to fork this template and run yarn generate in the terminal.

Please make sure Mesh package versions under package.json matches yours.

  • 2. A failing test has been provided
  • 3. A local solution has been provided
  • 4. A pull request is pending review

Describe the bug

When upgrading the @graphql-mesh/graphql package from version 0.98.5 to 0.104.6 and deploying to a CloudFlare worker the following error occurs.

      "errors": Array [
         Object {
           "extensions": Object {
             "originalError": Object {
               "message": "Cannot perform I/O on behalf of a different request. I/O objects (such as streams, request/response bodies, and others) created in the context of one request handler cannot be accessed from a different request's handler. This is a limitation of Cloudflare Workers which allows us to improve overall performance. (I/O type: RefcountedCanceler)",
               "name": "Error",
               "stack": "Error: Cannot perform I/O on behalf of a different request. I/O objects (such as streams, request/response bodies, and others) created in the context of one request handler cannot be accessed from a different request's handler. This is a limitation of Cloudflare Workers which allows us to improve overall performance. (I/O type: RefcountedCanceler)
         at baseExecutor (index.js:82500:28)
         at index.js:83574:101
         at Promise.then (index.js:12222:30)
         at handleMaybePromise (index.js:12200:50)
         at index.js:83574:31
         at meshExecutor (index.js:90040:18)
         at executorWithSourceName (index.js:90019:18)
         at index.js:23723:39
         at Promise.then (index.js:12222:30)
         at DataLoader4.batchExecuteLoadFn [as _batchLoadFn] (index.js:23723:28)",
              },
            },
           "message": "Cannot perform I/O on behalf of a different request. I/O objects (such as streams, request/response bodies, and others) created in the context of one request handler cannot be accessed from a different request's handler. This is a limitation of Cloudflare Workers which allows us to improve overall performance. (I/O type: RefcountedCanceler)",
           "path": Array [
             "validateContactInformation",
           ],
          },
       ],

No other changes have been made to the code base apart from this one update.
The same issue happens both locally using wrangler and when deployed to CloudFlare.

To Reproduce

Steps to reproduce the behavior:

  • npm install @graphql-mesh/graphql@latest
  • Deploy to CloudFlare workers or
  • Run worker locally
  • First request runs as usual but all subsequent requests show the above error

Expected behavior

  • After updating, the code still runs on CloudFlare workers without error

Environment:

  • CloudFlare workers

Additional context

Notes from an investigation carried out to try to identify the issue. The investigation was inconclusive.

Updating @graphql-mesh/graphql
From: 0.98.5

To: 0.104.6

Result: Component tests fail with I/O error

Summary of node_modules changes:

Removed node modules
extract-files

rehackt

@ardatan/sync-fetch

New node modules
@graphql-hive/signal

fetch-blob

formdata-polyfill

node-domexception

sync-fetch

timeout-signal

web-streams-polyfill

@graphql-tools/executor-common

Changed node modules
@graphql-mesh/graphql

@graphql-tools/batch-delegate

@graphql-tools/batch-execute

@graphql-tools/delegate

@graphql-tools/executor-graphql-ws

@graphql-tools/executor-http

@graphql-tools/executor-legacy-ws

@graphql-tools/federation

@graphql-tools/stitch

@graphql-tools/url-loader

@graphql-tools/wrap

@graphql-yoga/typed-event

@whatwg-node/events

@whatwg-node/promise-helpers

meros

Runtime differences
After updating, the @whatwg-node/server library handles this request differently. Although this library itself has not changed in the update, conditional logic in the library takes a different path at run time because the server context it receives is different.

This code block is from @whatwg-node/server/esm/createServerAdapter.js

function isRequestAccessible(serverContext) {
    try {
        return !!serverContext?.request;
    }
    catch {
        return false;
    }
}

Before updating @graphql-mesh/graphql, this code returns false, because the serverContext object is itself the JS Request object for the HTTP request being handled.

After updating @graphql-mesh/graphql, this code returns true, because the serverContext object contains only a request key. Its value is the JS Request object for the HTTP request being handled.
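
The two cases can be reproduced outside the worker. This is a minimal sketch, assuming a stand-in `FakeRequest` class (hypothetical, used only so the snippet runs without a Workers runtime) in place of the real Request object:

```javascript
// Copied from the library snippet above.
function isRequestAccessible(serverContext) {
    try {
        return !!serverContext?.request;
    }
    catch {
        return false;
    }
}

// Hypothetical stand-in for the platform Request. Like a real Request,
// it has no `request` property of its own.
class FakeRequest {
    constructor(url) {
        this.url = url;
    }
}

const request = new FakeRequest('https://example.com/graphql');

// Original version: the Request itself is passed in.
const beforeUpdate = isRequestAccessible(request);

// New version: a container object with a single `request` key is passed in.
const afterUpdate = isRequestAccessible({ request });

console.log(beforeUpdate, afterUpdate); // false true
```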

The result of this function is used to direct flow in the genericRequestHandler function in @whatwg-node/server/esm/createServerAdapter.js

 // Is input a container object over Request?
        if (isRequestAccessible(input)) {
            // Is it FetchEvent?
            if (isFetchEvent(input)) {
                return handleEvent(input, ...maybeCtx);
            }
            // In this input is also the context
            return handleRequestWithWaitUntil(input.request, input, ...maybeCtx);
        }
        // Or is it Request itself?
        // Then ctx is present and it is the context
        return fetchFn(input, ...maybeCtx);

Note that isFetchEvent is always false. This means that in the updated version handleRequestWithWaitUntil is called directly, but in the original version fetchFn is called.
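
The branching can be traced with a small model. This is a sketch only: every helper below is a hypothetical stub that records its name instead of doing real work, and `fetchFn` is reduced to the branch that applies here (input is not a string or URL).

```javascript
// Records which functions ran, in order.
const calls = [];

// Same check as the library function shown earlier.
const isRequestAccessible = (ctx) => {
    try {
        return !!ctx?.request;
    } catch {
        return false;
    }
};

// Stubs (assumptions for illustration, not the library implementations).
const isFetchEvent = () => false; // always false in this investigation
const handleEvent = () => {
    calls.push('handleEvent');
};
const handleRequestWithWaitUntil = (request, ...ctx) => {
    calls.push('handleRequestWithWaitUntil');
    return 'response';
};
const fetchFn = (input, ...maybeCtx) => {
    calls.push('fetchFn');
    return handleRequestWithWaitUntil(input, ...maybeCtx);
};

// Same control flow as the genericRequestHandler excerpt above.
function genericRequestHandler(input, ...maybeCtx) {
    if (isRequestAccessible(input)) {
        if (isFetchEvent(input)) {
            return handleEvent(input, ...maybeCtx);
        }
        return handleRequestWithWaitUntil(input.request, input, ...maybeCtx);
    }
    return fetchFn(input, ...maybeCtx);
}

const request = { url: 'https://example.com/graphql' }; // stand-in for a Request

calls.length = 0;
genericRequestHandler(request); // original version: input is the Request
const originalPath = [...calls];

calls.length = 0;
genericRequestHandler({ request }); // new version: input wraps the Request
const updatedPath = [...calls];

console.log(originalPath); // [ 'fetchFn', 'handleRequestWithWaitUntil' ]
console.log(updatedPath);  // [ 'handleRequestWithWaitUntil' ]
```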

This is the fetchFn

    const fetchFn = (input, ...maybeCtx) => {
        if (typeof input === 'string' || 'href' in input) {
            const [initOrCtx, ...restOfCtx] = maybeCtx;
            if (isRequestInit(initOrCtx)) {
                const request = new fetchAPI.Request(input, initOrCtx);
                const res$ = handleRequestWithWaitUntil(request, ...restOfCtx);
                return handleAbortSignalAndPromiseResponse(res$, initOrCtx?.signal);
            }
            const request = new fetchAPI.Request(input);
            return handleRequestWithWaitUntil(request, ...maybeCtx);
        }
        const res$ = handleRequestWithWaitUntil(input, ...maybeCtx);
        return handleAbortSignalAndPromiseResponse(res$, input._signal);
    };
The input is not a string or a URL-like object with an `href` property, so `handleRequestWithWaitUntil` is called directly.

**Differences so far:**

In the original version, input is itself a Request object. In the new version, input has a key called request whose value is a Request object.

In the original version, `isRequestAccessible` returns false in `genericRequestHandler`, so `fetchFn` is called. In the new version, `isRequestAccessible` returns true, so `handleRequestWithWaitUntil` is called directly with input.request, because `handleRequestWithWaitUntil` requires a Request object as its first argument.

In the original version, `fetchFn` also calls `handleRequestWithWaitUntil`, but with input itself, since in the original version input is the Request object.

In the original version, `handleAbortSignalAndPromiseResponse` is called in addition to `handleRequestWithWaitUntil`.

 

handleRequestWithWaitUntil


function handleRequestWithWaitUntil(request, ...ctx) {
    console.debug('request: ', JSON.stringify(request))
    const filteredCtxParts = ctx.filter(partCtx => partCtx != null);
    console.debug('filteredCtxParts: ', JSON.stringify(filteredCtxParts),'\n')
    let waitUntilPromises;
    const serverContext = filteredCtxParts.length > 1
        ? completeAssign({}, ...filteredCtxParts)
        : isolateObject(filteredCtxParts[0], filteredCtxParts[0] == null || filteredCtxParts[0].waitUntil == null
            ? (waitUntilPromises = [])
            : undefined);
    console.debug('serverContext: ', JSON.stringify(serverContext),'\n')
    const response$ = handleRequest(request, serverContext);
    console.debug('response$: ', JSON.stringify(response$),'\n')
    if (waitUntilPromises?.length) {
        console.debug('waitUntilPromises: ', JSON.stringify(waitUntilPromises),'\n')
        return handleWaitUntils(waitUntilPromises).then(() => response$);
    }
    return response$;
}

The console.debug statements were added to step through the function. The output is the same in both the original and the updated repos, and on both successful and failed requests in the new version.
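
One caveat about this logging may be worth checking: `JSON.stringify` only serializes own enumerable properties, so an object that exposes its data through prototype getters, and any Promise, serializes to `{}`. The sketch below uses a hypothetical `GetterBacked` class to show the effect. It assumes that the Request object in the worker behaves the same way, which was not verified as part of this investigation.

```javascript
// Hypothetical class that exposes its data only through a prototype getter,
// with nothing stored as an own enumerable property.
class GetterBacked {
    #url;
    constructor(url) {
        this.#url = url;
    }
    get url() {
        return this.#url;
    }
}

const first = new GetterBacked('https://example.com/one');
const second = new GetterBacked('https://example.com/two');

// Two different objects produce identical log output.
const serializedFirst = JSON.stringify(first);
const serializedSecond = JSON.stringify(second);

// A Promise also has no own enumerable properties.
const serializedPromise = JSON.stringify(Promise.resolve('response'));

console.log(serializedFirst, serializedSecond, serializedPromise); // {} {} {}

// Reading the fields of interest explicitly makes the difference visible.
const describe = (req) => JSON.stringify({ url: req.url });
console.log(describe(first), describe(second));
```

If the logged values in the worker are all `{}`, identical output between versions would not by itself show that the objects are identical.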

 

**Remaining difference**

`fetchFn` calls `handleAbortSignalAndPromiseResponse` after it has called `handleRequestWithWaitUntil`. The contents of this function are:


export function handleAbortSignalAndPromiseResponse(response$, abortSignal) {
    if (isPromise(response$) && abortSignal) {
        const deferred$ = createDeferredPromise();
        abortSignal.addEventListener('abort', function abortSignalFetchErrorHandler() {
            deferred$.reject(abortSignal.reason);
        });
        response$
            .then(function fetchSuccessHandler(res) {
                deferred$.resolve(res);
            })
            .catch(function fetchErrorHandler(err) {
                deferred$.reject(err);
            });
        return deferred$.promise;
    }
    return response$;
}


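This function wraps the response in a new promise that settles when the response promise settles or when the abort signal fires, but only if the response is a promise and an abort signal is present. Otherwise the response is returned unchanged.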

Logging the original version shows that the responses are never promises.

This means the only real difference between original and new version is the nested request object. 
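
The two behaviours of `handleAbortSignalAndPromiseResponse` can be checked in isolation. In the sketch below the function body is copied from above, while `isPromise` and `createDeferredPromise` are hypothetical stand-ins written for this example and may differ from the library's own helpers.

```javascript
// Hypothetical stand-ins for the library helpers.
const isPromise = (value) => typeof value?.then === 'function';

function createDeferredPromise() {
    let resolve;
    let reject;
    const promise = new Promise((res, rej) => {
        resolve = res;
        reject = rej;
    });
    return { promise, resolve, reject };
}

// Body as quoted in the issue.
function handleAbortSignalAndPromiseResponse(response$, abortSignal) {
    if (isPromise(response$) && abortSignal) {
        const deferred$ = createDeferredPromise();
        abortSignal.addEventListener('abort', function abortSignalFetchErrorHandler() {
            deferred$.reject(abortSignal.reason);
        });
        response$
            .then(function fetchSuccessHandler(res) {
                deferred$.resolve(res);
            })
            .catch(function fetchErrorHandler(err) {
                deferred$.reject(err);
            });
        return deferred$.promise;
    }
    return response$;
}

// Case 1: a non-promise response is passed through untouched,
// which matches what was logged for the original version.
const syncResponse = { status: 200 };
const passedThrough =
    handleAbortSignalAndPromiseResponse(syncResponse, new AbortController().signal) === syncResponse;

// Case 2: a promise plus a signal is wrapped in a new promise
// that rejects as soon as the signal aborts.
const controller = new AbortController();
const neverSettles = new Promise(() => {});
const wrapped = handleAbortSignalAndPromiseResponse(neverSettles, controller.signal);
const wasWrapped = wrapped !== neverSettles;

wrapped.catch(() => console.log('wrapped promise rejected after abort'));
controller.abort();

console.log(passedThrough, wasWrapped); // true true
```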

 

**Deploying directly to CloudFlare**
 

To rule out the possibility of these issues only occurring locally, the updated worker code was deployed directly to CloudFlare using `wrangler deploy`.

 
Through logs the behaviour was observed to be very similar to what happens locally. Summarised as follows:

**Observed differences**

- Locally the error occurs on all requests after the first request

- Deployed to CloudFlare the same error does eventually occur but after approx 5 - 10 requests. Suggested theory to explain this behaviour is that more concurrent resources are available on CloudFlare which means it takes longer for a new request to hit the same worker thread/instance again.
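
The instance-reuse theory can be illustrated with a toy model. This is not the Workers runtime and it is not established that this is what happens in the mesh code. It only shows how an object created during one request and kept in module scope would succeed on the first request and fail on every later request served by the same instance. All names below are hypothetical.

```javascript
// Toy model: tracks which request is "current", the way the runtime
// tracks which request owns an I/O object.
let currentRequestId = null;

class RequestBoundHandle {
    constructor() {
        // Remember the request that created this handle.
        this.owner = currentRequestId;
    }
    use() {
        if (this.owner !== currentRequestId) {
            throw new Error('Cannot perform I/O on behalf of a different request.');
        }
        return 'ok';
    }
}

// Module scope survives between requests handled by the same instance.
let cachedHandle = null;

function handleRequest(requestId) {
    currentRequestId = requestId;
    if (cachedHandle === null) {
        // Created once, by whichever request arrives first.
        cachedHandle = new RequestBoundHandle();
    }
    try {
        return cachedHandle.use();
    } catch (err) {
        return err.message;
    }
}

const results = [handleRequest(1), handleRequest(2), handleRequest(3)];
console.log(results);
```

In this model the first call returns `ok` and the later calls return the error message, which matches the local behaviour described above. On a deployed worker, requests spread across more instances would each get one successful first request before the failure appears.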

**Observed similarities**

- The same code executes in the same order when deployed as it does locally

**Additional notable observations**

- When the I/O error occurs the downstream service does not receive an HTTP request. This means that the error occurs outside of the mesh code but before the request leaves CloudFlare - suggesting something is catching the error inside the CloudFlare workers run time

- The error is not a server error. The mesh error catching code is not run, the request runs correctly each time and returns a 200 status. It is just that the response payload contains the error message
