Description
Hi team!
We are hitting some limits using the TelemetryClient to send metric batches to the NR API. We increased the quota for the account, but we think we should also define some protections on the client side.
Context
Our service pushes a lot of data in batches to the NR API. When we start seeing 429 errors in our calls, the service's memory footprint grows quickly, which leads to OOM and feeds the snowball of 429 errors.
Checking the TelemetryClient we see:
- We call the sendBatch(MetricBatch batch) public method in the TelemetryClient class: https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/TelemetryClient.java#L142
- It calls the scheduleBatchSend([...]) method: https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/TelemetryClient.java#L181-L187
- That method calls scheduleBatchSend, passing as an argument a static default backoff that cannot be injected or overridden :(
The static default backoff has this configuration:
https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/Backoff.java#L18-L24
Feature request
Following the KISS principle, we would like to request the ability to override the default backoff. If we reach OOM we definitely lose data, so it may be better to reduce the number of retries.
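To make the request concrete, here is a minimal sketch of the kind of retry policy we would like to be able to pass in. RetryPolicy and nextWaitMillis are hypothetical names for illustration only; they are not part of the SDK, and the doubling scheme is an assumption rather than a description of how the SDK's Backoff works.

```java
import java.time.Duration;

/** Hypothetical caller-supplied retry policy; NOT part of the New Relic SDK. */
final class RetryPolicy {
    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private final int maxRetries;

    RetryPolicy(Duration baseDelay, Duration maxDelay, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.baseDelayMillis = baseDelay.toMillis();
        this.maxDelayMillis = maxDelay.toMillis();
        this.maxRetries = maxRetries;
    }

    /**
     * Returns the wait before retry number `attempt` (0-based),
     * or -1 when the retries are exhausted and the batch should be dropped.
     */
    long nextWaitMillis(int attempt) {
        if (attempt >= maxRetries) {
            return -1;
        }
        // Double the delay once per previous attempt, stopping at the cap.
        long delay = baseDelayMillis;
        for (int i = 0; i < attempt && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }
}
```

With something like new RetryPolicy(Duration.ofSeconds(1), Duration.ofSeconds(15), 3), a batch would be retried at most three times, so queued retries stop accumulating in memory much sooner.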
Nice to have
Expose a metric for discarded batches
Checking the library, we see a notification handler for batches that will not be sent. It would be nice to expose this information as a metric, since it would help us define alerts based on it.
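As a rough illustration of the metric we have in mind, the sketch below counts discarded batches and the items they carried. DiscardedBatchCounter and onBatchDiscarded are hypothetical names; we are assuming the library could invoke such a callback at the point where it gives up on a batch.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counter for dropped batches; NOT part of the New Relic SDK. */
final class DiscardedBatchCounter {
    // LongAdder keeps contention low when many sender threads report drops.
    private final LongAdder discardedBatches = new LongAdder();
    private final LongAdder discardedItems = new LongAdder();

    /** Assumed to be called once for every batch the client gives up on. */
    void onBatchDiscarded(int batchSize) {
        discardedBatches.increment();
        discardedItems.add(batchSize);
    }

    long discardedBatches() {
        return discardedBatches.sum();
    }

    long discardedItems() {
        return discardedItems.sum();
    }
}
```

The two totals could then be reported periodically as ordinary metrics and used as the basis for alerts.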
Provide circuit breaker support
In this case we hit 429 errors, but each send is unaware of the others that are also trying to send, so each one retries again and again until its retries are exhausted, which makes the 429 problem worse.
Adding circuit breaker support could alleviate this problem by reducing or stopping batch sends when 429 errors appear, and resuming them once the API is responding again.
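A minimal sketch of the shared state such a circuit breaker would need is shown below. SendCircuitBreaker and its methods are hypothetical names, not SDK API, and the threshold and cool-down values are left to the caller. The clock is injected so the behavior can be tested without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical circuit breaker shared by all senders; NOT part of the New Relic SDK. */
final class SendCircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private final LongSupplier clockMillis;

    private int consecutiveThrottles;
    private boolean open;   // open means sends are blocked
    private long openedAt;

    SendCircuitBreaker(int failureThreshold, long coolDownMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clockMillis = clockMillis;
    }

    /** True if a send may go out now; after the cool-down, probe sends are allowed. */
    synchronized boolean allowSend() {
        if (!open) {
            return true;
        }
        return clockMillis.getAsLong() - openedAt >= coolDownMillis;
    }

    /** Call after a successful response: close the breaker and reset the count. */
    synchronized void recordSuccess() {
        consecutiveThrottles = 0;
        open = false;
    }

    /** Call after a 429: open (or re-open) the breaker once the threshold is reached. */
    synchronized void recordThrottled() {
        consecutiveThrottles++;
        if (consecutiveThrottles >= failureThreshold) {
            open = true;
            openedAt = clockMillis.getAsLong();
        }
    }
}
```

A sender would check allowSend() before each attempt and report the outcome afterwards, for example with new SendCircuitBreaker(5, 30_000, System::currentTimeMillis). This version lets every caller probe once the cool-down elapses; a stricter half-open state that admits a single probe would be a natural refinement.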