Client errors (disconnection, auth failures) incorrectly reported as server errors in OTEL #38756

@getvictor

Description

Fleet version:
4.79


💥 Actual behavior

In our Dogfood production environment, client errors are being reported to OTEL/APM/Sentry as server errors. This includes:

  • TCP read timeouts - client disconnected or didn't send data in time
  • Authentication failures (*http.AuthRequiredError, *http.AuthHeaderRequiredError) - missing/invalid credentials
  • *service.OsqueryError with context canceled - client disconnected mid-request

These are client-side issues (4xx-class errors), but they pollute error dashboards and trigger unnecessary investigations as if they were server errors (5xx).

🛠️ To fix

Follow OTEL semantic conventions for HTTP spans:

  • Per OTEL spec: "For HTTP status codes in the 4xx range, span status MUST be left unset in case of SpanKind.SERVER"
  • Only set span status to Error for server errors (5xx)
  • Skip sending client errors to APM/Sentry
  • Add separate OTEL metrics for client errors (fleet.http.client_errors) vs server errors (fleet.http.server_errors)
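The rules above can be sketched as a single decision function. This is an illustration under stated assumptions, not Fleet's implementation: the `outcome` type, the `spanStatus` strings, and `classifyServerResponse` are hypothetical names, and the sketch deliberately avoids the OpenTelemetry SDK so that it stays self-contained. Only the two metric names come from this issue.

```go
package main

import "net/http"

// spanStatus mirrors the OTEL span status values relevant here.
// Hypothetical stand-in for the SDK's status codes.
type spanStatus string

const (
	statusUnset spanStatus = "Unset"
	statusError spanStatus = "Error"
)

// Metric names proposed in the issue.
const (
	clientErrorsMetric = "fleet.http.client_errors"
	serverErrorsMetric = "fleet.http.server_errors"
)

// outcome describes what a server-side HTTP span should do for a response.
type outcome struct {
	Status      spanStatus // span status to record
	Metric      string     // error counter to increment; empty for none
	ReportToAPM bool       // whether to forward the error to APM/Sentry
}

// classifyServerResponse applies the SpanKind.SERVER rules: only 5xx
// responses mark the span as Error; 4xx responses leave it unset.
func classifyServerResponse(code int) outcome {
	switch {
	case code >= http.StatusInternalServerError: // 5xx
		return outcome{Status: statusError, Metric: serverErrorsMetric, ReportToAPM: true}
	case code >= http.StatusBadRequest: // 4xx
		return outcome{Status: statusUnset, Metric: clientErrorsMetric, ReportToAPM: false}
	default: // 1xx, 2xx, 3xx
		return outcome{Status: statusUnset}
	}
}
```

With this shape, a 401 from a missing auth header increments only the client-error counter and never reaches APM/Sentry, while a 500 is still reported in full.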

🧑‍💻 Steps to reproduce

These steps:

  • Have been confirmed to consistently lead to reproduction in multiple Fleet instances.

  1. Enable OTEL tracing in Fleet (FLEET_LOGGING_TRACING_ENABLED=true and FLEET_LOGGING_TRACING_TYPE=opentelemetry)
  2. Send a request with invalid/missing authentication
  3. Observe that the error is reported to OTEL with span status set to Error
  4. Alternatively, open a connection to Fleet and disconnect before completing the request (TCP timeout)
  5. Observe the timeout error is reported as a server error in OTEL

🕯️ More info (optional)

QA

Since OTEL is not productized, it does not need to be QA'ed. We will monitor our Dogfood OTEL to make sure these issues are resolved.

Metadata

Labels

  • #g-security-compliance: Security & Compliance product group
  • bug: Something isn't working as documented
  • ~dogfood: Issue resulted from Fleet's product dogfooding.
  • ~released bug: This bug was found in a stable release.

Projects

Status

✔️Awaiting QA
