Commit 64a6249

Retry on Datadog API errors (#138)
This adds retry support for submitting metrics. It’s rare, but at scale most users have probably encountered some failures to send metrics (although they might not have noticed it in their logs). Fixes #108.

The basic approach here is to just retry failed requests rather than some more complicated (re-)queuing mechanism. @datadog/datadog-api-client has built-in retry support, but it doesn't handle network errors, so we’ve essentially re-created that retry logic here.

You can configure retries through the `retries` and `retryBackoff` options:

```
metrics.init({
    // How many times to retry a failed request.
    retries: 3,
    // Subsequent retries multiply this by powers of two, so this produces retries after:
    // 1 second, 2 seconds, 4 seconds
    retryBackoff: 1
});
```
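To make the retry schedule concrete, here is a minimal sketch that lists the wait before each retry, assuming the delay is `retryBackoff * 2 ** retryIndex` as the description states. `retryDelays` is a hypothetical helper for illustration and is not part of datadog-metrics.

```javascript
// Hypothetical helper (not part of datadog-metrics): returns the delay, in
// seconds, before each retry, assuming delay = retryBackoff * 2 ** retryIndex.
function retryDelays(retries, retryBackoff) {
    const delays = [];
    for (let i = 0; i < retries; i++) {
        delays.push(retryBackoff * 2 ** i);
    }
    return delays;
}

console.log(retryDelays(3, 1));   // [ 1, 2, 4 ]
console.log(retryDelays(2, 0.5)); // [ 0.5, 1 ]
```

With `retries: 3` and `retryBackoff: 1`, a request that keeps failing would therefore spend about 7 seconds waiting in total before the error is surfaced.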
1 parent 2e78930 commit 64a6249

5 files changed: 275 additions and 34 deletions

README.md

Lines changed: 18 additions & 4 deletions
@@ -121,6 +121,7 @@ Where `options` is an object and can contain the following:
     * See more details on setting your site at:
       https://docs.datadoghq.com/getting_started/site/#access-the-datadog-site
     * You can also set this via the `DATADOG_SITE` or `DD_SITE` environment variable.
+    * Ignored if you set the `reporter` option.
 * `apiKey`: Sets the Datadog API key. (optional)
     * It's usually best to keep this in an environment variable.
       Datadog-metrics looks for the API key in the `DATADOG_API_KEY` or
@@ -129,6 +130,7 @@ Where `options` is an object and can contain the following:
       is required to send metrics.
     * Make sure not to confuse this with your _application_ key! For more
       details, see: https://docs.datadoghq.com/account_management/api-app-keys/
+    * Ignored if you set the `reporter` option.
 * `appKey`: ⚠️ Deprecated. This does nothing and will be removed in an upcoming
   release.

@@ -145,6 +147,14 @@ Where `options` is an object and can contain the following:
   same properties as the options object on the `histogram()` method. Options
   specified when calling the method are layered on top of this object.
   (optional)
+* `retries`: How many times to retry failed metric submissions to Datadog’s API.
+    * Defaults to `2`.
+    * Ignored if you set the `reporter` option.
+* `retryBackoff`: How long to wait before retrying a failed Datadog API call.
+  Subsequent retries multiply this delay by 2^(retry count). For example, if
+  this is set to `1`, retries will happen after 1, then 2, then 4 seconds.
+    * Defaults to `1`.
+    * Ignored if you set the `reporter` option.
 * `reporter`: An object that actually sends the buffered metrics. (optional)
     * There are two built-in reporters you can use:
       1. `reporters.DatadogReporter` sends metrics to Datadog’s API, and is
@@ -330,17 +340,23 @@ Contributions are always welcome! For more info on how to contribute or develop

 **Breaking Changes:**

-TBD
+* The `DatadogReporter` constructor now takes an options object instead of positional arguments. Using this constructor directly is pretty rare, so this likely doesn’t affect you!

 **New Features:**

-* Asynchronous actions now use promises instead of callbacks. In places where `onSuccess` and `onError` callbacks were used, they are now deprecated. Instead, those methods return promises (callbacks still work, but support will be removed in a future release). This affects:
+* Promises: asynchronous actions now use promises instead of callbacks. In places where `onSuccess` and `onError` callbacks were used, they are now deprecated. Instead, those methods return promises (callbacks still work, but support will be removed in a future release). This affects:

     * The `flush()` method now returns a promise.
     * The `report(series)` method on any custom reporters should now return a promise. For now, datadog-metrics will use the old callback-based behavior if the method signature has callbacks listed after `series` argument.

+* Retries: flushes to Datadog’s API are now retried automatically. This can help you work around intermittent network issues or rate limits. To adjust retries, use the `retries` and `retryBackoff` options.
+
 * Environment variables can now be prefixed with *either* `DATADOG_` or `DD_` (previously, only `DATADOG_` worked) in order to match configuration with the Datadog agent. For example, you can set your API key via `DATADOG_API_KEY` or `DD_API_KEY`.

+**Deprecations:**
+
+* The `appKey` option is no longer supported. Application keys (as opposed to API keys) are not actually needed for sending metrics or distributions to the Datadog API. Including it in your configuration adds no benefits, but risks exposing a sensitive credential.
+
 **Bug Fixes:**

 * Support setting the `site` option via the `DATADOG_SITE` environment variable. The `apiHost` option was renamed to `site` in v0.11.0, but the `DATADOG_API_HOST` environment variable was accidentally left as-is. The old environment variable name is now deprecated, and will be removed at the same time as the `apiHost` option is removed.
@@ -349,8 +365,6 @@ Contributions are always welcome! For more info on how to contribute or develop

 * Buffer metrics using `Map` instead of a plain object.

-* Deprecated the `appKey` option. Application keys (as opposed to API keys) are not actually needed for sending metrics or distributions to the Datadog API. Including it in your configuration adds no benefits, but risks exposing a sensitive credential.
-
 [View diff](https://github.com/dbader/node-datadog-metrics/compare/v0.11.4...main)

 * 0.11.4 (2024-11-10)

lib/loggers.js

Lines changed: 15 additions & 3 deletions
@@ -39,13 +39,15 @@ const Distribution = require('./metrics').Distribution;

 /**
  * @typedef {object} BufferedMetricsLoggerOptions
- * @property {string} [apiKey] Datadog API key
+ * @property {string} [apiKey] Datadog API key. Ignored if you set the
+ * `reporter` option.
  * @property {string} [appKey] DEPRECATED: App keys aren't actually used for
  * metrics and are no longer supported.
  * @property {string} [host] Default host for all reported metrics
  * @property {string} [prefix] Default key prefix for all metrics
  * @property {string} [site] Sets the Datadog "site", or server where metrics
- * are sent. For details and options, see:
+ * are sent. Ignored if you set the `reporter` option.
+ * For details and options, see:
  * https://docs.datadoghq.com/getting_started/site/#access-the-datadog-site
  * @property {string} [apiHost] DEPRECATED: Please use `site` instead.
  * @property {number} [flushIntervalSeconds] How often to send metrics to
@@ -66,6 +68,11 @@ const Distribution = require('./metrics').Distribution;
  * metrics between flushes.
  * @property {ReporterType} [reporter] An object that actually sends the
  * buffered metrics.
+ * @property {number} [retries] How many times to retry failed attempts to send
+ * metrics to Datadog's API. Ignored if you set the `reporter` option.
+ * @property {number} [retryBackoff] How many seconds to wait before retrying a
+ * failed API request. Subsequent retries will multiply this delay.
+ * Ignored if you set the `reporter` option.
  */

 /**
@@ -99,7 +106,12 @@ class BufferedMetricsLogger {
         /** @private */
         this.aggregator = opts.aggregator || new Aggregator(opts.defaultTags);
         /** @private @type {ReporterType} */
-        this.reporter = opts.reporter || new DatadogReporter(opts.apiKey, opts.site);
+        this.reporter = opts.reporter || new DatadogReporter({
+            apiKey: opts.apiKey,
+            site: opts.site,
+            retries: opts.retries,
+            retryBackoff: opts.retryBackoff
+        });
         /** @private */
         this.host = opts.host;
         /** @private */
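
The logger forwards `retries` and `retryBackoff` unchanged, so an unset option reaches `DatadogReporter` as `undefined`, and the reporter resolves defaults with a `value >= 0 ? value : fallback` check. The sketch below isolates that pattern to show why an explicit `0` survives where `||` would discard it. `numberOrDefault` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper mirroring the `value >= 0 ? value : fallback` pattern
// used for `retries` and `retryBackoff` in this commit.
function numberOrDefault(value, fallback) {
    // `undefined >= 0` is false, so unset options fall back to the default.
    return value >= 0 ? value : fallback;
}

console.log(numberOrDefault(undefined, 2)); // 2 (option not set)
console.log(numberOrDefault(0, 2));         // 0 (explicit zero is kept)
console.log(numberOrDefault(-1, 2));        // 2 (negative values are rejected)
console.log(0 || 2);                        // 2 (a plain `||` would lose the zero)
```

Keeping `0` valid matters here because `retries: 0` is the natural way to turn retries off.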

lib/reporters.js

Lines changed: 136 additions & 22 deletions
@@ -3,6 +3,18 @@ const datadogApiClient = require('@datadog/datadog-api-client');
 const { AuthorizationError } = require('./errors');
 const { logDebug, logDeprecation } = require('./logging');

+const RETRYABLE_ERROR_CODES = new Set([
+    'ECONNREFUSED',
+    'ECONNRESET',
+    'ENOTFOUND',
+    'EPIPE',
+    'ETIMEDOUT'
+]);
+
+async function sleep(milliseconds) {
+    await new Promise((r) => setTimeout(r, milliseconds));
+}
+
 /**
  * A Reporter that throws away metrics instead of sending them to Datadog. This
  * is useful for disabling metrics in your application and for tests.
@@ -13,6 +25,99 @@ class NullReporter {
     }
 }

+/**
+ * @private
+ * A custom HTTP implementation for Datadog that retries failed requests.
+ * Datadog has retries built in, but they don't handle network errors (just
+ * HTTP errors), and we want to retry in both cases. This inherits from the
+ * built-in HTTP library since we want to use the same fetch implementation
+ * Datadog uses instead of adding another dependency.
+ */
+class RetryHttp extends datadogApiClient.client.IsomorphicFetchHttpLibrary {
+    constructor(options = {}) {
+        super(options);
+
+        // HACK: ensure enableRetry is always `false` so the base class logic
+        // does not actually retry (since we manage retries here).
+        Object.defineProperty(this, 'enableRetry', {
+            get () { return false; },
+            set () {},
+        });
+    }
+
+    async send(request) {
+        let i = 0;
+        while (true) { // eslint-disable-line no-constant-condition
+            let response, error;
+            try {
+                response = await super.send(request);
+            } catch (e) {
+                error = e;
+            }
+
+            if (this.isRetryable(response || error, i)) {
+                await sleep(this.retryDelay(response || error, i));
+            } else if (response) {
+                return response;
+            } else {
+                throw error;
+            }
+
+            i++;
+        }
+    }
+
+    /**
+     * @private
+     * @param {any} response HTTP response or error object
+     * @returns {boolean}
+     */
+    isRetryable(response, tryCount) {
+        return tryCount < this.maxRetries && (
+            RETRYABLE_ERROR_CODES.has(response.code)
+            || response.httpStatusCode === 429
+            || response.httpStatusCode >= 500
+        );
+    }
+
+    /**
+     * @private
+     * @param {any} response HTTP response or error object
+     * @param {number} tryCount
+     * @returns {number}
+     */
+    retryDelay(response, tryCount) {
+        if (response.httpStatusCode === 429) {
+            // Datadog's official client supports just the 'x-ratelimit-reset'
+            // header, so we support that here in addition to the standardized
+            // 'retry-after' header.
+            // There is also an upcoming IETF standard for 'ratelimit', but it
+            // has moved away from the syntax used in 'x-ratelimit-reset'. This
+            // stuff might change in the future.
+            // https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/
+            const delayHeader = response.headers['retry-after']
+                || response.headers['x-ratelimit-reset'];
+            const delayValue = parseInt(delayHeader, 10);
+            if (!isNaN(delayValue) && delayValue > 0) {
+                return delayValue * 1000;
+            }
+        }
+
+        return this.backoffMultiplier ** tryCount * this.backoffBase * 1000;
+    }
+}
+
+/**
+ * @typedef {Object} DatadogReporterOptions
+ * @property {string} [apiKey] Datadog API key.
+ * @property {string} [appKey] DEPRECATED! This option does nothing.
+ * @property {string} [site] The Datadog "site" to send metrics to.
+ * @property {number} [retries] Retry failed requests up to this many times.
+ * @property {number} [retryBackoff] Delay before retries. Subsequent retries
+ * wait this long multiplied by 2^(retry count).
+ */
+
+/** @type {WeakMap<DatadogReporter, datadogApiClient.v1.MetricsApi>} */
 const datadogClients = new WeakMap();

 /**
@@ -21,40 +126,48 @@ const datadogClients = new WeakMap();
 class DatadogReporter {
     /**
      * Create a reporter that sends metrics to Datadog's API.
-     * @param {string} [apiKey]
-     * @param {string} [appKey] DEPRECATED! This argument does nothing.
-     * @param {string} [site]
+     * @param {DatadogReporterOptions} [options]
      */
-    constructor(apiKey, appKey, site) {
-        if (appKey) {
-            if (!site && /(datadoghq|ddog-gov)\./.test(appKey)) {
-                site = appKey;
-                appKey = null;
-            } else {
-                logDeprecation(
-                    'The `appKey` option is no longer supported since it is ' +
-                    'not used for submitting metrics, distributions, events, ' +
-                    'or logs.'
-                );
-            }
+    constructor(options = {}) {
+        if (typeof options !== 'object') {
+            throw new TypeError('DatadogReporter takes an options object, not multiple string arguments.');
         }

-        apiKey = apiKey || process.env.DATADOG_API_KEY || process.env.DD_API_KEY;
-        this.site = site || process.env.DATADOG_SITE || process.env.DD_SITE || process.env.DATADOG_API_HOST;
+        if (options.appKey) {
+            logDeprecation(
+                'The `appKey` option is no longer supported since it is ' +
+                'not used for submitting metrics, distributions, events, ' +
+                'or logs.'
+            );
+        }
+
+        const apiKey = options.apiKey || process.env.DATADOG_API_KEY || process.env.DD_API_KEY;
+        this.site = options.site
+            || process.env.DATADOG_SITE
+            || process.env.DD_SITE
+            || process.env.DATADOG_API_HOST;

         if (!apiKey) {
             throw new Error(
-                'Datadog API key not found. You must specify one via a ' +
-                'configuration option or the DATADOG_API_KEY (or DD_API_KEY) ' +
-                'environment variable.'
+                'Datadog API key not found. You must specify one via the ' +
+                '`apiKey` configuration option or the DATADOG_API_KEY or ' +
+                'DD_API_KEY environment variable.'
             );
         }

         const configuration = datadogApiClient.client.createConfiguration({
             authMethods: {
                 apiKeyAuth: apiKey,
-            }
+            },
+            httpApi: new RetryHttp(),
+            maxRetries: options.retries >= 0 ? options.retries : 2,
         });
+
+        // HACK: Specify backoff here rather than in configuration options to
+        // support values less than 2 (mainly for faster tests).
+        const backoff = options.retryBackoff >= 0 ? options.retryBackoff : 1;
+        configuration.httpApi.backoffBase = backoff;
+
         if (this.site) {
             // Strip leading `app.` from the site in case someone copy/pasted the
             // URL from their web browser. More details on correct configuration:
@@ -64,6 +177,7 @@ class DatadogReporter {
                 site: this.site
             });
         }
+
         datadogClients.set(this, new datadogApiClient.v1.MetricsApi(configuration));
     }

@@ -139,7 +253,7 @@ class DataDogReporter extends DatadogReporter {
             'DataDogReporter has been renamed to DatadogReporter (lower-case ' +
             'D in "dog"); the old name will be removed in a future release.'
         );
-        super(apiKey, appKey, site);
+        super({ apiKey, appKey, site });
     }
 }
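
The retry decision and delay rules in `RetryHttp` can be exercised without the Datadog client. The sketch below restates them as standalone functions under a few assumptions: `shouldRetry` and `delayMs` are hypothetical names, results are plain objects with the same `code`, `httpStatusCode`, and `headers` fields the diff reads, and a `headers` guard is added that the original does not have.

```javascript
// Network error codes treated as retryable, copied from the diff above.
const RETRYABLE_ERROR_CODES = new Set([
    'ECONNREFUSED', 'ECONNRESET', 'ENOTFOUND', 'EPIPE', 'ETIMEDOUT'
]);

// Hypothetical standalone version of `isRetryable`: retry network errors,
// rate limits (429), and server errors (5xx) until retries are used up.
function shouldRetry(result, tryCount, maxRetries) {
    return tryCount < maxRetries && (
        RETRYABLE_ERROR_CODES.has(result.code)
        || result.httpStatusCode === 429
        || result.httpStatusCode >= 500
    );
}

// Hypothetical standalone version of `retryDelay`: prefer the server's
// rate-limit hint (in seconds), otherwise use exponential backoff.
function delayMs(result, tryCount, backoffBase = 1, backoffMultiplier = 2) {
    // The `result.headers` guard is an addition for this sketch.
    if (result.httpStatusCode === 429 && result.headers) {
        const header = result.headers['retry-after']
            || result.headers['x-ratelimit-reset'];
        const seconds = parseInt(header, 10);
        if (!isNaN(seconds) && seconds > 0) {
            return seconds * 1000;
        }
    }
    return backoffMultiplier ** tryCount * backoffBase * 1000;
}

console.log(shouldRetry({ code: 'ECONNRESET' }, 0, 2));    // true
console.log(shouldRetry({ httpStatusCode: 503 }, 2, 2));   // false (out of retries)
console.log(delayMs({ httpStatusCode: 429, headers: { 'retry-after': '7' } }, 0)); // 7000
console.log(delayMs({ httpStatusCode: 500 }, 2));          // 4000
```

Note that a 4xx response other than 429 is never retried, since repeating a request the server rejected as invalid is unlikely to succeed.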

package.json

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
     "typescript": "^4.8.4"
   },
   "dependencies": {
-    "@datadog/datadog-api-client": "^1.16.0",
+    "@datadog/datadog-api-client": "^1.17.0",
     "debug": "^4.1.0"
   },
   "engines": {
