Description
Agent Environment
Agent 7.55.1 - Commit: 8ec9dff - Serialization version: v5.0.119 - Go version: go1.21.11
Describe what happened:
In production, we observed a number of our agent instances failing to resolve agent-intake.logs.us5.datadoghq.com. That hostname does not exist, since US5 does not support TCP log submission, so our logs were not ingested:
2024-11-12 14:32:20 UTC | CORE | WARN | (pkg/logs/client/tcp/connection_manager.go:108 in NewConnection) | dial tcp: lookup agent-intake.logs.us5.datadoghq.com: no such host
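The missing DNS record is easy to confirm outside the agent; a minimal Go check against the default resolver reproduces the same "no such host" error:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	addrs, err := net.LookupHost("agent-intake.logs.us5.datadoghq.com")
	if err != nil {
		// Reproduces the agent's "no such host" failure.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved:", addrs)
}
```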
After some investigation, it appears our agents failed the HTTP connectivity check at startup and fell back to TCP:
2024-11-12 14:32:20 UTC | CORE | WARN | (pkg/logs/client/http/destination.go:442 in CheckConnectivity) | HTTP connectivity failure: Post "https://agent-http-intake.logs.us5.datadoghq.com/api/v2/logs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-11-12 14:32:20 UTC | CORE | WARN | (comp/logs/agent/config/config.go:120 in BuildEndpointsWithConfig) | You are currently sending Logs to Datadog through TCP (either because logs_config.force_use_tcp or logs_config.socks5_proxy_address is set or the HTTP connectivity test has failed) To benefit from increased reliability and better network performances, we strongly encourage switching over to compressed HTTPS which is now the default protocol.
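For context, my reading of that warning is roughly the following decision. This is only a sketch of the behavior as I understand it; the names are mine, not the actual code in BuildEndpointsWithConfig:

```go
package main

import "fmt"

type config struct {
	forceUseTCP        bool
	socks5ProxyAddress string
	httpCheckPassed    bool
}

// chooseProtocol is hypothetical; it mirrors the conditions listed in the
// warning. The point is that a failed HTTP connectivity test selects TCP
// unconditionally, even on sites such as US5 that have no TCP intake.
func chooseProtocol(c config) string {
	if c.forceUseTCP || c.socks5ProxyAddress != "" || !c.httpCheckPassed {
		return "tcp" // resolves agent-intake.logs.<site>, which does not exist on US5
	}
	return "http"
}

func main() {
	// A transient HTTP check failure at startup silently selects TCP.
	fmt.Println(chooseProtocol(config{httpCheckPassed: false}))
}
```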
Describe what you expected:
I would not expect the agent to fall back to a TCP endpoint that does not exist. If US5 does not support TCP, the agent should behave as if logs_config.force_use_http were enabled, or fail loudly in some other way.
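One possible shape of the fix, purely as a sketch (the site list and function names are my assumptions, not an existing API):

```go
package main

import "fmt"

// sitesWithoutTCPIntake is a hypothetical allowlist; I have only verified US5.
var sitesWithoutTCPIntake = map[string]bool{"us5.datadoghq.com": true}

// shouldFallBackToTCP gates the TCP fallback on the site actually having a
// TCP intake; otherwise the agent keeps retrying HTTP, as if
// logs_config.force_use_http were set.
func shouldFallBackToTCP(site string, httpCheckPassed bool) bool {
	if sitesWithoutTCPIntake[site] {
		return false // no TCP endpoint exists; stay on HTTP
	}
	return !httpCheckPassed
}

func main() {
	fmt.Println(shouldFallBackToTCP("us5.datadoghq.com", false)) // false: stay on HTTP
}
```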
Steps to reproduce the issue:
I don't have exact steps, but making the HTTP connectivity check fail at startup (for example, an iptables rule that drops outbound traffic to the HTTP intake so the probe times out) should put the agent into this state.
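Once the probe is blocked, something like the following standalone check (an assumption on my part, not agent code) should show the same timeout the agent logs:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(
		"https://agent-http-intake.logs.us5.datadoghq.com/api/v2/logs",
		"application/json",
		strings.NewReader("{}"),
	)
	if err != nil {
		// With the drop rule in place this is a timeout, matching the agent's
		// "context deadline exceeded" connectivity failure.
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("reachable:", resp.Status)
}
```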
Additional environment details (Operating System, Cloud provider, etc): The container image used is gcr.io/datadoghq/agent:7.55.1