fix(otlp-exporter-base): ensure retry on network errors during HTTP export #6147
Conversation
force-pushed from aba8e84 to 4722eeb (Compare)
fix(otlp-exporter-base): ensure retry on network errors during HTTP export

The OpenTelemetry OTLP/HTTP specification states:

```
If the server disconnects without returning a response, the client SHOULD retry and send the same request. The client SHOULD implement an exponential backoff strategy between retries to avoid overwhelming the server.
...
If the client cannot connect to the server, the client SHOULD retry the connection using an exponential backoff strategy between retries. The interval between retries must have a random jitter.
```

The backoff infrastructure was already in place; it was just the glue code around the request APIs (fetch, http, XHR) that was reporting a non-retryable state for errors that might be temporary.
force-pushed from 4722eeb to 29233f3 (Compare)
Codecov Report ❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #6147      +/-   ##
==========================================
- Coverage   95.39%   95.37%   -0.03%
==========================================
  Files         316      316
  Lines        9374     9398      +24
  Branches     2166     2175       +9
==========================================
+ Hits         8942     8963      +21
- Misses        432      435       +3
```
pichlermarc left a comment
Hi - thanks for working on this! 🙂
This is going in the right direction - we just need to make sure that we don't swallow all errors now that most outcomes of the HTTP request are retryable, so that our end-users can still troubleshoot, if necessary.
```diff
 if (isExportNetworkErrorRetryable(error)) {
   return {
-    status: 'failure',
-    error: new Error('Fetch request timed out', { cause: error }),
+    status: 'retryable',
+    retryInMillis: 0,
   };
 }
```
This transport is intended solely for browsers and web workers; it will therefore never receive any undici errors.
```ts
onDone({
  status: 'retryable',
  retryInMillis: 0,
});
```
This changes the behavior quite significantly:
- Errors are completely swallowed. If this always fails, the end-user will never be able to see a log of what actually went wrong.
  - Suggestion: I think to solve this we should allow attaching an optional error to a retryable status so that the RetryingTransport can propagate these back to the export delegate, which then logs it (a sketch of this follows below).
- `retryInMillis: 0` might not be what we want. The RetryingTransport implements an exponential backoff as required by the spec. IMO we should have this backoff happen to avoid overwhelming the endpoint.
  - Suggestion: I think we should omit `retryInMillis` here to let the RetryingTransport handle it.
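A minimal sketch of what the suggested shape could look like, assuming the existing `ExportResponseRetryable` type were extended with an optional `error` field and an optional `retryInMillis`; the actual types and field names in the package may differ:

```ts
// Sketch only: assumes ExportResponseRetryable gains an optional `error` field
// and that `retryInMillis` may be omitted.
interface ExportResponseRetryable {
  status: 'retryable';
  // omitted => RetryingTransport chooses the exponential-backoff delay itself
  retryInMillis?: number;
  // optional cause, propagated back to the export delegate so it can be logged
  error?: Error;
}

// Transport glue code could then report a retryable outcome without losing the cause:
function toRetryableResponse(cause: unknown): ExportResponseRetryable {
  return {
    status: 'retryable',
    error: new Error('Fetch request errored', { cause }),
  };
}
```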
```diff
-  status: 'failure',
-  error: new Error('XHR request timed out'),
+  status: 'retryable',
+  retryInMillis: 0,
```
this timeout here is also the full maximum of the request - if this is ever triggered, there's no time left for the export to retry.
```diff
-  status: 'failure',
-  error: new Error('XHR request errored'),
+  status: 'retryable',
+  retryInMillis: 0,
```
Suggestion: I would omit `retryInMillis` here to let the RetryingTransport have the exponential backoff.
```ts
  'UND_ERR_CONNECT_TIMEOUT',
  'UND_ERR_HEADERS_TIMEOUT',
  'UND_ERR_BODY_TIMEOUT',
  'UND_ERR_SOCKET',
```
I don't think there's any code path right now that would produce undici errors.
```ts
it('returns retryable when fetch throws network error with code', function (done) {
  // arrange
  const cause = new Error('network error') as NodeJS.ErrnoException;
  cause.code = 'ECONNRESET';
  const networkError = new TypeError('fetch failed', { cause });
  sinon.stub(globalThis, 'fetch').rejects(networkError);
  const transport = createFetchTransport(testTransportParameters);

  //act
  transport.send(testPayload, requestTimeout).then(response => {
    // assert
    try {
      assert.strictEqual(response.status, 'retryable');
      assert.strictEqual(
        (response as ExportResponseRetryable).retryInMillis,
        0
      );
    } catch (e) {
      done(e);
    }
    done();
  }, done /* catch any rejections */);
});
});
```
no need to test any Node.js things here - this is a browser-targeted test for a browser-targeted component.
- Remove reference to undici errors
- Add diagnostic verbose/info logs so we can better understand what's happening during the e2e test
- Fix how jitter gets applied (before, it was adding 0.2 to the timeout in milliseconds)
- Add error reasons to retryable errors; ensure that error codes get passed
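For context on the jitter fix, here is a minimal sketch of exponential backoff with random jitter as the spec requires; all constants and names below are assumptions for illustration, not the package's actual implementation:

```ts
// Sketch of spec-style exponential backoff with random jitter.
// INITIAL_BACKOFF_MILLIS, MAX_BACKOFF_MILLIS, BACKOFF_MULTIPLIER and
// JITTER_FACTOR are illustrative values, not the library's constants.
const INITIAL_BACKOFF_MILLIS = 1000;
const MAX_BACKOFF_MILLIS = 5000;
const BACKOFF_MULTIPLIER = 1.5;
const JITTER_FACTOR = 0.2;

function backoffWithJitterMillis(attempt: number): number {
  const base = Math.min(
    INITIAL_BACKOFF_MILLIS * BACKOFF_MULTIPLIER ** attempt,
    MAX_BACKOFF_MILLIS
  );
  // Jitter must scale with the backoff value; adding a raw 0.2 ms has no effect.
  const jitter = base * JITTER_FACTOR * Math.random();
  return base + jitter;
}
```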
Which problem is this PR solving?
OpenTelemetry OTLP/HTTP specification states:

> If the server disconnects without returning a response, the client SHOULD retry and send the same request. The client SHOULD implement an exponential backoff strategy between retries to avoid overwhelming the server.
> [...]
> If the client cannot connect to the server, the client SHOULD retry the connection using an exponential backoff strategy between retries. The interval between retries must have a random jitter.
The backoff infrastructure was already in place; it was just the glue code around the request APIs (fetch, http, XHR) that was reporting a non-retryable state for errors that might be temporary.
Short description of the changes
Added a utility function that categorizes whether an error from a transport is plausibly a network error, then adjusted all three transports (fetch, http, XHR) to use it when handling errors.
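As a rough illustration of the approach (not the PR's actual code): the function name `isExportNetworkErrorRetryable` appears in the diff, but the specific error codes and the `cause` traversal below are assumptions.

```ts
// Sketch only: the error codes and recursion into `cause` are illustrative assumptions.
const RETRYABLE_NETWORK_ERROR_CODES = new Set([
  'ECONNRESET',
  'ECONNREFUSED',
  'ETIMEDOUT',
  'EPIPE',
]);

export function isExportNetworkErrorRetryable(error: unknown): boolean {
  if (error instanceof Error) {
    const code = (error as NodeJS.ErrnoException).code;
    if (code != null && RETRYABLE_NETWORK_ERROR_CODES.has(code)) {
      return true;
    }
    // fetch wraps the underlying socket error in `cause`, so check it as well
    if (error.cause !== undefined) {
      return isExportNetworkErrorRetryable(error.cause);
    }
  }
  return false;
}
```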
Type of change
How Has This Been Tested?
Checklist: