fix: request aborted shouldn't result in a fetch error #2741
Conversation
Walkthrough
This pull request modifies client disconnection error handling across the router's request lifecycle.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Router-nonroot image scan passed: ✅ No security vulnerabilities found in image.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2741 +/- ##
==========================================
- Coverage 63.46% 57.77% -5.69%
==========================================
Files 251 235 -16
Lines 26767 26204 -563
==========================================
- Hits 16987 15140 -1847
- Misses 8414 9603 +1189
- Partials 1366 1461 +95
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/batch.go (1)
183-212: ⚠️ Potential issue | 🟠 Major
Return immediately for canceled batch requests.
This branch still falls through to writeRequestErrors, so a batched disconnect will try to serialize a GraphQL error after we've already classified the failure as a client-side cancellation. That can reintroduce write-side noise and mutate the observed status in the batch path.
💡 Proposed fix
 func processBatchError(w http.ResponseWriter, r *http.Request, err error, requestLogger *zap.Logger) {
-    if errors.Is(err, context.Canceled) {
-        span := trace.SpanFromContext(r.Context())
-        span.RecordError(err)
-    } else {
-        ctrace.AttachErrToSpanFromContext(r.Context(), err)
-    }
+    if errors.Is(err, context.Canceled) {
+        trace.SpanFromContext(r.Context()).RecordError(err)
+        return
+    }
+
+    ctrace.AttachErrToSpanFromContext(r.Context(), err)

     requestError := graphqlerrors.RequestError{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/core/batch.go` around lines 183 - 212, In processBatchError, the context.Canceled branch records the span but then falls through to writeRequestErrors; modify processBatchError so that when errors.Is(err, context.Canceled) is true you RecordError on the span (as now) and then immediately return to avoid calling writeRequestErrors and serializing a GraphQL error for client cancellations; keep the existing else branch (ctrace.AttachErrToSpanFromContext) and the subsequent handling for non-canceled errors intact.
🧹 Nitpick comments (2)
router/core/engine_loader_hooks_test.go (2)
373-392: Assert the merged attrs at the metric-store boundary too.
Right now this only checks the returned slice. The test would still pass if recordFetchError returned the merged attrs but called MeasureRequestError with the old slice.
Suggested assertion
 resultSlice, _ := hooks.recordFetchError(ctx, span, fetchErr, rc, nil, metricAddOpt, prePopulated)
 span.End()
@@
 require.True(t, hasExisting, "pre-populated attrs should be preserved")
 require.True(t, hasErrorCodes, "error codes should be appended")
+
+require.True(t, store.requestErrorCalled, "MeasureRequestError should be called")
+
+var metricHasExisting, metricHasErrorCodes bool
+for _, attr := range store.requestErrorSliceAttr {
+    if string(attr.Key) == "existing.attr" {
+        metricHasExisting = true
+    }
+    if string(attr.Key) == "graphql.error.codes" {
+        metricHasErrorCodes = true
+    }
+}
+require.True(t, metricHasExisting,
+    "MeasureRequestError should receive the pre-populated attrs")
+require.True(t, metricHasErrorCodes,
+    "MeasureRequestError should receive the appended error codes")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/core/engine_loader_hooks_test.go` around lines 373 - 392, The test currently only asserts the returned slice from hooks.recordFetchError contains merged attributes; extend it to also verify that the metric-store call received the merged slice: capture the arguments passed to the mock MetricStore.MeasureRequestError (or the equivalent MeasureRequestError spy used by hooks), and assert that the captured attrs include both the prePopulated attribute ("existing.attr") and the "graphql.error.codes" attribute (same checks used for resultSlice). Update references to the mock/spy used by the hooks setup (e.g., the MetricStore mock instance) and reuse metricAddOpt/prePopulated/resultSlice names to ensure MeasureRequestError was invoked with the merged attrs.
107-130: Assert the exception event for wrapped cancellations too.
This case currently only proves the status/metric behavior. If the wrapped context.Canceled path stops emitting the observability event, the regression would still pass.
Suggested assertion
 require.NotEqual(t, codes.Error, spans[0].Status().Code,
     "wrapped context.Canceled should not set span status to Error")
+require.Len(t, spans[0].Events(), 1,
+    "wrapped context.Canceled should still be recorded as a span event")
+require.Equal(t, "exception", spans[0].Events()[0].Name)
 require.False(t, store.requestErrorCalled,
     "MeasureRequestError should not be called for wrapped context.Canceled")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/core/engine_loader_hooks_test.go` around lines 107 - 130, The test "wrapped context.Canceled does not set span ERROR status" currently omits asserting that an observability exception event is still emitted for wrapped cancellations; update the test after calling hooks.OnFinished(ctx, ds, &resolve.ResponseInfo{Err: wrappedErr}) to inspect the recorded span (spans[0]) events and assert that there is an "exception" event (or an event whose attributes indicate an exception/exception.type/exception.message matching wrappedErr) so the wrapped context.Canceled path still emits the expected exception event; use the existing exporter.GetSpans().Snapshots() and spans[0].Events() APIs to locate and assert the event.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 95ba86f9-0380-4da8-8e23-5351e8b8d117
📒 Files selected for processing (9)
router-tests/telemetry/span_error_status_test.go
router/core/batch.go
router/core/engine_loader_hooks.go
router/core/engine_loader_hooks_test.go
router/core/errors.go
router/core/graphql_prehandler.go
router/core/operation_metrics_test.go
router/pkg/trace/transport.go
router/pkg/trace/transport_test.go
time.Sleep(500 * time.Millisecond)

spans := exporter.GetSpans().Snapshots()
require.NotEmpty(t, spans)
Replace the new fixed sleeps with require.Eventually.
These assertions depend on async span/log export, so time.Sleep(500 * time.Millisecond) is still racy under slower CI and can miss late spans or log entries.
As per coding guidelines, "For periodic exporters, wait for ALL expected items using require.Eventually, not just one sentinel value, to avoid race conditions with export cycles".
Also applies to: 292-295, 361-363
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@router-tests/telemetry/span_error_status_test.go` around lines 188 - 191,
Replace the fixed time.Sleep usage with require.Eventually to avoid races when
waiting for async span/log export: instead of sleeping then calling
exporter.GetSpans().Snapshots() and require.NotEmpty(t, spans), call
require.Eventually with a short polling interval and timeout and inside the poll
invoke exporter.GetSpans().Snapshots() and assert len(spans) > 0 (or use
require.NotEmpty within the closure) so the test waits until all expected spans
are exported; update the same pattern found around the other occurrences that
use time.Sleep before checking exporter.GetSpans().Snapshots() (the instances at
the other mentioned blocks should be changed similarly).
func (f *engineLoaderHooks) recordFetchError(
    ctx context.Context,
    span trace.Span,
    fetchErr error,
    reqContext *requestContext,
    metricAttrs []attribute.KeyValue,
    metricAddOpt otelmetric.AddOption,
    metricSliceAttrs []attribute.KeyValue,
) ([]attribute.KeyValue, otelmetric.MeasurementOption) {
    rtrace.SetSanitizedSpanStatus(span, codes.Error, fetchErr.Error())
    span.RecordError(fetchErr)

    // Extract downstream error codes from subgraph errors
    var errorCodesAttr []string

    if unwrapped, ok := fetchErr.(multiError); ok {
        for _, e := range unwrapped.Unwrap() {
            var subgraphError *resolve.SubgraphError
            if !errors.As(e, &subgraphError) {
                continue
            }

            for i, downstreamError := range subgraphError.DownstreamErrors {
                var errorCode string
                if downstreamError.Extensions != nil {
                    if value := downstreamError.Extensions.Get("code"); value != nil {
                        errorCode = string(value.GetStringBytes())
                    }
                }

                if errorCode == "" {
                    continue
                }

                errorCodesAttr = append(errorCodesAttr, errorCode)
                span.AddEvent(fmt.Sprintf("Downstream error %d", i+1),
                    trace.WithAttributes(
                        rotel.WgSubgraphErrorExtendedCode.String(errorCode),
                        rotel.WgSubgraphErrorMessage.String(downstreamError.Message),
                    ),
                )
            }
        }

        errorCodesAttr = unique.SliceElements(errorCodesAttr)
        // Reduce cardinality of error codes
        slices.Sort(errorCodesAttr)
    }

    // We can't add this earlier because this is done per subgraph response
    if v, ok := reqContext.telemetry.metricSetAttrs[ContextFieldGraphQLErrorCodes]; ok && len(errorCodesAttr) > 0 {
        metricSliceAttrs = append(metricSliceAttrs, attribute.StringSlice(v, errorCodesAttr))
    }

    f.metricStore.MeasureRequestError(ctx, metricSliceAttrs, metricAddOpt)

    metricAttrs = append(metricAttrs, rotel.WgRequestError.Bool(true))
    attrOpt := otelmetric.WithAttributeSet(attribute.NewSet(metricAttrs...))

    return metricSliceAttrs, attrOpt
}
Restore wg.request.error=true on real fetch failures.
This helper replaced the previous rtrace.AttachErrToSpan path, but it no longer sets the span attribute that marks non-canceled request failures. The metric attrs still get wg.request.error=true, so traces and metrics will drift apart for the same fetch error.
💡 Proposed fix
func (f *engineLoaderHooks) recordFetchError(
ctx context.Context,
span trace.Span,
fetchErr error,
reqContext *requestContext,
metricAttrs []attribute.KeyValue,
metricAddOpt otelmetric.AddOption,
metricSliceAttrs []attribute.KeyValue,
) ([]attribute.KeyValue, otelmetric.MeasurementOption) {
rtrace.SetSanitizedSpanStatus(span, codes.Error, fetchErr.Error())
+ span.SetAttributes(rotel.WgRequestError.Bool(true))
  span.RecordError(fetchErr)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func (f *engineLoaderHooks) recordFetchError(
    ctx context.Context,
    span trace.Span,
    fetchErr error,
    reqContext *requestContext,
    metricAttrs []attribute.KeyValue,
    metricAddOpt otelmetric.AddOption,
    metricSliceAttrs []attribute.KeyValue,
) ([]attribute.KeyValue, otelmetric.MeasurementOption) {
    rtrace.SetSanitizedSpanStatus(span, codes.Error, fetchErr.Error())
    span.SetAttributes(rotel.WgRequestError.Bool(true))
    span.RecordError(fetchErr)

    // Extract downstream error codes from subgraph errors
    var errorCodesAttr []string

    if unwrapped, ok := fetchErr.(multiError); ok {
        for _, e := range unwrapped.Unwrap() {
            var subgraphError *resolve.SubgraphError
            if !errors.As(e, &subgraphError) {
                continue
            }

            for i, downstreamError := range subgraphError.DownstreamErrors {
                var errorCode string
                if downstreamError.Extensions != nil {
                    if value := downstreamError.Extensions.Get("code"); value != nil {
                        errorCode = string(value.GetStringBytes())
                    }
                }

                if errorCode == "" {
                    continue
                }

                errorCodesAttr = append(errorCodesAttr, errorCode)
                span.AddEvent(fmt.Sprintf("Downstream error %d", i+1),
                    trace.WithAttributes(
                        rotel.WgSubgraphErrorExtendedCode.String(errorCode),
                        rotel.WgSubgraphErrorMessage.String(downstreamError.Message),
                    ),
                )
            }
        }

        errorCodesAttr = unique.SliceElements(errorCodesAttr)
        // Reduce cardinality of error codes
        slices.Sort(errorCodesAttr)
    }

    // We can't add this earlier because this is done per subgraph response
    if v, ok := reqContext.telemetry.metricSetAttrs[ContextFieldGraphQLErrorCodes]; ok && len(errorCodesAttr) > 0 {
        metricSliceAttrs = append(metricSliceAttrs, attribute.StringSlice(v, errorCodesAttr))
    }

    f.metricStore.MeasureRequestError(ctx, metricSliceAttrs, metricAddOpt)

    metricAttrs = append(metricAttrs, rotel.WgRequestError.Bool(true))
    attrOpt := otelmetric.WithAttributeSet(attribute.NewSet(metricAttrs...))

    return metricSliceAttrs, attrOpt
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@router/core/engine_loader_hooks.go` around lines 274 - 331, The
recordFetchError helper currently records the error and metrics but never sets
the span attribute that marks non-canceled request failures; update
recordFetchError to set the span attribute rotel.WgRequestError.Bool(true) for
real fetch failures (i.e., when fetchErr is not context.Canceled and not
context.DeadlineExceeded or equivalent cancellation checks) before returning so
traces and metrics stay in sync; reference the recordFetchError function and
rotel.WgRequestError to locate where to add span.SetAttributes(...) alongside
the existing metricAttrs append and otelmetric.WithAttributeSet creation.
This PR fixes places where we mark spans as errors for client disconnects.
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests
Checklist
Open Source AI Manifesto
This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.