-
Notifications
You must be signed in to change notification settings - Fork 16
[SVLS-7934] feat: Support TLS certificate for trace/stats flusher #961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bottlecap/src/config/yaml.rs
Outdated
| #[serde(deserialize_with = "deserialize_optional_string")] | ||
| pub http_protocol: Option<String>, | ||
| #[serde(deserialize_with = "deserialize_optional_string")] | ||
| pub ssl_ca_cert: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the DD_SSL_CA_CERT environment variable comes from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .merge(Env::prefixed("DD_")); |
It's handled by this code, which puts the value of env var
DD_SSL_CA_CERT to the field ssl_ca_cert. Is this what you are asking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm asking where does DD_SSL_CA_CERT name convention comes from, does this come from the Datadog Agent config? Does it come from the docs? Did you came up with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I came up with it. Upon further search, I found tls_ca_cert here:
https://github.com/DataDog/integrations-core/blob/master/http_check/datadog_checks/http_check/data/conf.yaml.example#L477
so I'll rename the env var to DD_SSL_CA_CERT. Does this sound good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, although I'd confirm with the Agent team!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Env var name LGTM from the perspective of agent-configuration team!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Changing to DD_TLS_CERT_FILE to be consistent with:
https://github.com/DataDog/datadog-agent/blob/0638dfce1e1f3a9ae336334d4df01cb2a5e35120/pkg/config/setup/config.go#L1410
The same config option has different names in different places. This PR just picks one of them.
duncanista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
One las thing, do logs/metrics/proxy clients need this change too?
@duncanista Maybe, but somehow I didn't see any error with logs/metrics client in my test. If this or future customer reports such error, we can reproduce it then fix it in a separate PR. |
## Problem A customer reported that their Lambda is behind a proxy, and the Rust-based extension encounters an error when sending logs and metrics to Datadog via the proxy. A previous PR #961 fixed this for traces and stats, but not for other things because the customer and I didn't see any error with them at that time. ## This PR Applies the env var `DD_TLS_CERT_FILE` to logs flusher and proxy flusher as well. Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`, so the when the extension flushes logs or proxied data to Datadog, the HTTP client created can load and use this cert, and connect the proxy properly. ## Testing 1. Create a Lambda in a VPC with a proxy EC2 instance. 2. Connect to the proxy instance. With the help of ChatGPT, set up a custom-build nginx with `ngx_http_proxy_connect_module` 3. Save the CA certificate from the proxy server to local machine 4. In the CDK stack, add a layer to the Lambda, which includes the CA certificate `ca-cert.pem` 5. Set env vars: - `DD_TLS_CERT_FILE=/opt/ca-cert.pem` - `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the private IP of the proxy EC2 instance - `DD_LOG_LEVEL=debug` 6. Invoke the Lambda ## Result **Before:** Log flushing failed: > DD_EXTENSION | ERROR | LOGS | Failed to send request after 97 ms and 3 attempts: reqwest::Error { kind: Request, url: "https://http-intake.logs.datadoghq.com/api/v2/logs", source: hyper_util::client::legacy::Error(Connect, ConnectFailed(Custom { kind: Other, error: Custom { kind: InvalidData, error: InvalidCertificate(UnknownIssuer) } })) } **After:** No such error ## Next steps Do the same thing for dogstatsd metric flusher. Metric flusher is in a separate repo https://github.com/DataDog/serverless-components, so let's create separate PRs for that change. ## Notes Customer report issue: #919
…er (#984) ## Motivation A customer reported that their Lambda is behind a proxy, and the Rust-based extension encounters an error when sending logs and metrics to Datadog via the proxy. ## Overview Previous PRs in fixed this for traces/stats (#961) and logs/proxy (#979). DataDog/serverless-components#61 in `serverless-components` repo fixed this issue for dogstatsd metrics. This PR upgrades the dependency on `serverless-components` to patch this update. ## Testing #### Steps See #979 #### Result **Before:** Metrics flushing failed: > DD_EXTENSION | DEBUG | Flushing 0 series and 1 distributions DD_EXTENSION | DEBUG | Sending distributions: SketchPayload { ... } DD_EXTENSION | DEBUG | Request to https://api.datadoghq.com/api/beta/sketches took 144ms DD_EXTENSION | ERROR | Error shipping data: None Failed to send request after 3 attempts DD_EXTENSION | ERROR | Failed to flush some metrics due to shipping errors: 0 series and 1 sketches **After:** Metrics flushing succeeded: > DD_EXTENSION | DEBUG | Flushing 0 series and 1 distributions DD_EXTENSION | DEBUG | Sending distributions: SketchPayload { ... } DD_EXTENSION | DEBUG | Request to https://api.datadoghq.com/api/beta/sketches took 619ms DD_EXTENSION | DEBUG | Successfully flushed 0 series and 1 distributions ## Notes Customer report issue: #919
Problem
A customer reported that their Lambda is behind a proxy, and the Rust-based extension can't send traces to Datadog via the proxy, while the previous go-based extension worked.
This PR
Supports the env var
DD_TLS_CERT_FILE: The path to a file of concatenated CA certificates in PEM format.Example:
DD_TLS_CERT_FILE=/opt/ca-cert.pem, so the when the extension flushes traces/stats to Datadog, the HTTP client created can load and use this cert, and connect the proxy properly.Testing
Steps
ca-cert.pemDD_TLS_CERT_FILE=/opt/ca-cert.pemDD_PROXY_HTTPS=http://10.0.0.30:3128, where10.0.0.30is the private IP of the proxy EC2 instanceDD_LOG_LEVEL=debughttp://10.0.0.30:3128Result
Before
Trace flush failed with error logs:
After
Trace flush is successful:
Notes
This fix only covers trace flusher and stats flusher, which use
ServerlessTraceFlusher::get_http_client()to create the HTTP client. It doesn't cover logs flusher and proxy flusher, which use a different function (http.rs:get_client()) to create the HTTP client. However, logs flushing was successful in my tests, even if no certificate was added. We can come back to logs/proxy flusher if someone reports an error.