Open
Labels
- #g-security-compliance: Security & Compliance product group
- bug: Something isn't working as documented
- ~dogfood: Issue resulted from Fleet's product dogfooding.
- ~released bug: This bug was found in a stable release.
Description
Fleet version:
4.79
💥 Actual behavior
In our Dogfood production environment, client errors are being reported to OTEL/APM/Sentry as server errors. This includes:
- TCP read timeouts - client disconnected or didn't send data in time
- Authentication failures (`*http.AuthRequiredError`, `*http.AuthHeaderRequiredError`) - missing/invalid credentials
- `*service.OsqueryError` with `context canceled` - client disconnected mid-request
These are client-side issues (4xx errors) but are polluting error dashboards and triggering false investigations as if they were server errors (5xx).
🛠️ To fix
Follow OTEL semantic conventions for HTTP spans:
- Per OTEL spec: "For HTTP status codes in the 4xx range, span status MUST be left unset in case of SpanKind.SERVER"
- Only set span status to `Error` for server errors (5xx)
- Skip sending client errors to APM/Sentry
- Add separate OTEL metrics for client errors (`fleet.http.client_errors`) vs server errors (`fleet.http.server_errors`)
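The rules above could be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the actual fix: it uses only the standard library, so `spanStatus`, `serverSpanStatus`, and `errorCounters` are hypothetical stand-ins for the OpenTelemetry span status and metric instruments, and only the metric names in the comments come from this issue.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spanStatus is a hypothetical stand-in for the OTEL span status.
type spanStatus int

const (
	statusUnset spanStatus = iota
	statusError
)

// serverSpanStatus maps an HTTP response code to the span status of a
// SERVER span: only 5xx is an error, 4xx is left unset.
func serverSpanStatus(httpStatus int) spanStatus {
	if httpStatus >= 500 && httpStatus <= 599 {
		return statusError
	}
	return statusUnset
}

// errorCounters is a hypothetical stand-in for two OTEL counters.
type errorCounters struct {
	clientErrors atomic.Int64 // would back fleet.http.client_errors
	serverErrors atomic.Int64 // would back fleet.http.server_errors
}

// record counts the response and reports whether the error should also
// be forwarded to APM/Sentry (server errors only).
func (c *errorCounters) record(httpStatus int) bool {
	switch {
	case httpStatus >= 500 && httpStatus <= 599:
		c.serverErrors.Add(1)
		return true
	case httpStatus >= 400 && httpStatus <= 499:
		c.clientErrors.Add(1)
		return false
	default:
		return false
	}
}

func main() {
	var c errorCounters
	for _, code := range []int{200, 401, 404, 500} {
		report := c.record(code)
		fmt.Printf("status=%d spanError=%t reportToAPM=%t\n",
			code, serverSpanStatus(code) == statusError, report)
	}
	fmt.Println("client errors:", c.clientErrors.Load())
	fmt.Println("server errors:", c.serverErrors.Load())
}
```

Keeping the classification in one small function means the span status, the metrics, and the APM/Sentry filter all agree on what counts as a server error.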
🧑💻 Steps to reproduce
These steps:
- Have been confirmed to consistently lead to reproduction in multiple Fleet instances.
- Enable OTEL tracing in Fleet (`FLEET_LOGGING_TRACING_ENABLED=true` and `FLEET_LOGGING_TRACING_TYPE=opentelemetry`)
- Send a request with invalid/missing authentication
- Observe that the error is reported to OTEL with span status set to `Error`
- Alternatively, open a connection to Fleet and disconnect before completing the request (TCP timeout)
- Observe the timeout error is reported as a server error in OTEL
🕯️ More info (optional)
QA
Since OTEL is not productized, it does not need to be QA'ed. We will monitor our Dogfood OTEL to make sure these issues are resolved.
Status
✔️Awaiting QA