
Feature request: configurable Backoff for the API calls #333

@dariocazas

Description

Hi team!

We are hitting some limits when using the TelemetryClient to send metric batches to the NR API. We have increased the account quota, but we think the client side should also provide some protection.

Context

Our service pushes a lot of data to the NR API in batches. When our calls start getting 429 errors, the service's memory footprint grows quickly, leading to OOM errors and feeding the snowball of 429 errors.

Looking at the TelemetryClient, we see that:

  1. We call the public sendBatch(MetricBatch batch) method in the TelemetryClient class: https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/TelemetryClient.java#L142
  2. It calls the scheduleBatchSend([...]) method: https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/TelemetryClient.java#L181-L187
  3. It then calls scheduleBatchSend, passing as an argument a static default backoff that cannot be injected or overridden :(

The static default backoff is configured here:
https://github.com/newrelic/newrelic-telemetry-sdk-java/blob/v0.17.0/telemetry-core/src/main/java/com/newrelic/telemetry/Backoff.java#L18-L24

Feature request

Following the KISS principle, we would like to request the ability to override the default backoff. If we hit an OOM we definitely lose data, so it may be better to reduce the number of retries.
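A minimal sketch of what an overridable backoff could look like, assuming a hypothetical RetryBackoff value class and a hypothetical BatchSender that receives it through its constructor instead of relying on a static default. None of these names exist in the Telemetry SDK; they only illustrate the request: exponential delays capped at a maximum, with a caller-chosen retry limit. The sleep function is injected so the retry loop can be exercised without actually waiting.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical value class for illustration only; it is NOT the SDK's Backoff.
// Exponential backoff: delay(attempt) = initialDelay * 2^attempt, capped at maxDelay.
final class RetryBackoff {
    private final Duration initialDelay;
    private final Duration maxDelay;
    private final int maxRetries;

    RetryBackoff(Duration initialDelay, Duration maxDelay, int maxRetries) {
        if (initialDelay.isNegative() || maxDelay.compareTo(initialDelay) < 0 || maxRetries < 0) {
            throw new IllegalArgumentException("invalid backoff configuration");
        }
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
        this.maxRetries = maxRetries;
    }

    // Delay before retry number `attempt` (0-based), or null once retries are exhausted.
    Duration delayForAttempt(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        if (attempt >= maxRetries) {
            return null;
        }
        // Limit the shift so the multiplier cannot overflow for large attempt numbers.
        long factor = 1L << Math.min(attempt, 30);
        Duration delay = initialDelay.multipliedBy(factor);
        return delay.compareTo(maxDelay) > 0 ? maxDelay : delay;
    }
}

// Hypothetical sender: the backoff is supplied by the caller rather than hard-coded,
// so a service under memory pressure can choose fewer retries.
final class BatchSender {
    private final RetryBackoff backoff;
    private final Consumer<Duration> sleeper; // e.g. wraps Thread.sleep in production

    BatchSender(RetryBackoff backoff, Consumer<Duration> sleeper) {
        this.backoff = backoff;
        this.sleeper = sleeper;
    }

    // Returns true if a send succeeded, false if the batch was dropped after exhausting retries.
    boolean send(BooleanSupplier sendOnce) {
        if (sendOnce.getAsBoolean()) {
            return true;
        }
        for (int attempt = 0; ; attempt++) {
            Duration delay = backoff.delayForAttempt(attempt);
            if (delay == null) {
                return false; // give up: the batch is released instead of being held in memory
            }
            sleeper.accept(delay);
            if (sendOnce.getAsBoolean()) {
                return true;
            }
        }
    }
}
```

With a constructor like this, a service could pass something such as new RetryBackoff(Duration.ofSeconds(1), Duration.ofSeconds(15), 3) to drop a batch after three retries, trading some data loss for bounded memory.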

Nice to have

Expose a metric for discarded batches

Looking at the library, we see a notify handler that is called for batches that will not be sent. It would be nice to expose this information as a metric, since it would help us define alerts based on it.
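As a sketch, assuming the notify handler could be adapted to a hypothetical DroppedBatchListener callback (the SDK's actual handler interface and method names may differ), a thread-safe counter could accumulate dropped batches and items and hand the deltas to whatever reports metrics on each harvest cycle.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical callback; the SDK's real notification handler has its own signature.
interface DroppedBatchListener {
    void onBatchDropped(int itemCount, String reason);
}

// Counts dropped batches and dropped items so they can be reported as metrics.
final class DroppedBatchCounter implements DroppedBatchListener {
    private final AtomicLong droppedBatches = new AtomicLong();
    private final AtomicLong droppedItems = new AtomicLong();

    @Override
    public void onBatchDropped(int itemCount, String reason) {
        droppedBatches.incrementAndGet();
        droppedItems.addAndGet(itemCount);
    }

    // Returns {batches, items} dropped since the previous call and resets both counters,
    // suitable for reporting as delta "count" metrics. The two resets are not atomic
    // as a pair, which is acceptable for monitoring purposes.
    long[] drainCounts() {
        return new long[] { droppedBatches.getAndSet(0), droppedItems.getAndSet(0) };
    }
}
```

An alert could then fire whenever the reported dropped-batch count is above zero for a sustained period.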

Provide circuit breaker support

In our case we hit 429 errors, but each send is unaware of the others that are also trying to send, so each one retries again and again until its retries are exhausted, making the 429 problem even bigger.

Adding circuit breaker support could alleviate this problem: reduce or stop sending batches when 429 errors appear, and resume sending once the API is responding normally again.
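A minimal sketch of the idea, assuming a hypothetical RateLimitCircuitBreaker that every sender consults before calling the API and updates after each response. The class name, the failure threshold and the cool-down parameter are illustrative assumptions, not SDK features; the clock is injected so the behaviour can be tested without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical circuit breaker shared by all batch senders; it is not part of the SDK.
// CLOSED: requests flow. OPEN: requests are rejected until the cool-down elapses.
// HALF_OPEN: a single probe request is allowed to test whether the API has recovered.
final class RateLimitCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;

    private State state = State.CLOSED;
    private int consecutiveRateLimits = 0;
    private long openedAt = 0L;
    private boolean probeInFlight = false;

    RateLimitCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        if (failureThreshold < 1 || openMillis < 0) {
            throw new IllegalArgumentException("invalid circuit breaker configuration");
        }
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    // Ask before sending a batch; false means "do not call the API right now".
    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < openMillis) {
                return false;
            }
            state = State.HALF_OPEN; // cool-down elapsed: let one probe through
            probeInFlight = false;
        }
        if (state == State.HALF_OPEN) {
            if (probeInFlight) {
                return false;
            }
            probeInFlight = true;
            return true;
        }
        return true; // CLOSED
    }

    // A successful response closes the circuit and resets the 429 streak.
    synchronized void recordSuccess() {
        state = State.CLOSED;
        consecutiveRateLimits = 0;
        probeInFlight = false;
    }

    // Call when the API answers 429; enough in a row, or a failed probe, opens the circuit.
    synchronized void recordRateLimited() {
        consecutiveRateLimits++;
        if (state == State.HALF_OPEN || consecutiveRateLimits >= failureThreshold) {
            state = State.OPEN;
            openedAt = clockMillis.getAsLong();
            probeInFlight = false;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Because all sends share one breaker instance, a burst of 429 responses pauses every sender for the cool-down instead of each one retrying independently. Batches rejected while the circuit is open would still need a drop or buffer policy of their own.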
