Commit 229b764

Treat bot blocking errors as potentially transient
1 parent 82973c1 commit 229b764

3 files changed, 18 additions and 12 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@
 
 All changes that impact users of this module are documented in this file, in the [Common Changelog](https://common-changelog.org) format with some additional specifications defined in the CONTRIBUTING file. This codebase adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## Unreleased [minor]
+
+> Development of this release was supported by the [Lab Platform Governance, Media and Technology](https://platform-governance.org) (PGMT), Centre for Media, Communication and Information Research (ZeMKI), University of Bremen as part of the project [Governance: Private ordering of ComAI through corporate communication and policies](https://comai.space/en/projects/p4-governance-private-ordering-of-comai-through-corporate-communication-and-policies/) in the research unit [Communicative AI](https://comai.space/en/), funded by the German Research Foundation (DFG) ([Grant No. 516511468)](https://gepris.dfg.de/gepris/projekt/544643936?language=en).
+
+### Added
+
+- Extend automatic retry mechanism for failed tracking attempts due to likely bot blocking errors to improve tracking success rate
+
 ## 5.5.0 - 2025-06-04
 
 _Full changeset and discussions: [#1159](https://github.com/OpenTermsArchive/engine/pull/1159)._

src/archivist/fetcher/errors.js

Lines changed: 9 additions & 2 deletions
@@ -1,18 +1,25 @@
 export class FetchDocumentError extends Error {
+  static LIKELY_BOT_BLOCKING_ERRORS = [
+    'HTTP code 403',
+    'HTTP code 406',
+    'HTTP code 502',
+    'ECONNRESET',
+  ];
+
   static LIKELY_TRANSIENT_ERRORS = [
     'EAI_AGAIN', // DNS lookup temporary failure - DNS server is temporarily unavailable or overloaded
     'ETIMEDOUT', // Connection timeout - network latency or server load issues
-    'ECONNRESET', // Connection reset - connection was forcibly closed, often due to network issues
     'ERR_NAME_NOT_RESOLVED', // DNS lookup temporary failure - DNS server is temporarily unavailable or overloaded
     'HTTP code 500', // Internal Server Error - server encountered an error while processing the request
-    'HTTP code 502', // Bad Gateway - upstream server returned invalid response, often temporary
     'HTTP code 503', // Service Unavailable - server is temporarily overloaded or down for maintenance
     'HTTP code 504', // Gateway Timeout - upstream server took too long to respond, might be temporary
+    ...FetchDocumentError.LIKELY_BOT_BLOCKING_ERRORS,
   ];
 
   constructor(message) {
     super(`Fetch failed: ${message}`);
     this.name = 'FetchDocumentError';
     this.mayBeTransient = FetchDocumentError.LIKELY_TRANSIENT_ERRORS.some(err => message.includes(err));
+    this.mayBeBotBlocking = FetchDocumentError.LIKELY_BOT_BLOCKING_ERRORS.some(err => message.includes(err));
   }
 }
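
To illustrate the effect of this change, here is a minimal standalone sketch. It copies the class from the diff (without `export`, and with the inline comments dropped) so it can run on its own. The sample messages are hypothetical and only need to contain one of the listed substrings. Because the bot blocking list is spread into the transient list, `HTTP code 502` and `ECONNRESET` remain transient, while `HTTP code 403` and `HTTP code 406` become transient for the first time.

```javascript
// Standalone copy of the class from the diff above, for illustration only.
class FetchDocumentError extends Error {
  static LIKELY_BOT_BLOCKING_ERRORS = [
    'HTTP code 403',
    'HTTP code 406',
    'HTTP code 502',
    'ECONNRESET',
  ];

  static LIKELY_TRANSIENT_ERRORS = [
    'EAI_AGAIN',
    'ETIMEDOUT',
    'ERR_NAME_NOT_RESOLVED',
    'HTTP code 500',
    'HTTP code 503',
    'HTTP code 504',
    ...FetchDocumentError.LIKELY_BOT_BLOCKING_ERRORS,
  ];

  constructor(message) {
    super(`Fetch failed: ${message}`);
    this.name = 'FetchDocumentError';
    this.mayBeTransient = FetchDocumentError.LIKELY_TRANSIENT_ERRORS.some(err => message.includes(err));
    this.mayBeBotBlocking = FetchDocumentError.LIKELY_BOT_BLOCKING_ERRORS.some(err => message.includes(err));
  }
}

// Hypothetical messages; real ones may carry more detail around the code.
const blocked = new FetchDocumentError('HTTP code 403');
const overloaded = new FetchDocumentError('HTTP code 503');
const missing = new FetchDocumentError('HTTP code 404');

for (const error of [ blocked, overloaded, missing ]) {
  console.log(error.message, {
    mayBeTransient: error.mayBeTransient,
    mayBeBotBlocking: error.mayBeBotBlocking,
  });
}
// 403: transient and bot blocking; 503: transient only; 404: neither.
```

Note that both flags rely on substring matching against the message, so they are heuristics rather than a definitive classification.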

src/archivist/fetcher/index.js

Lines changed: 1 addition & 10 deletions
@@ -12,13 +12,6 @@ export const FETCHER_TYPES = {
   HTML_ONLY: 'htmlOnly',
 };
 
-const LIKELY_BOT_BLOCKING_ERRORS = [
-  'HTTP code 403',
-  'HTTP code 406',
-  'HTTP code 502',
-  'ECONNRESET',
-];
-
 /**
  * Fetch a resource from the network, returning a promise which is fulfilled once the response is available
  * @function fetch
@@ -70,9 +63,7 @@ async function fetchWithFallback(url, cssSelectors, fetcherConfig) {
   try {
     return await fetchWithHtmlOnly(url, fetcherConfig);
   } catch (error) {
-    const isBotBlockingError = LIKELY_BOT_BLOCKING_ERRORS.some(code => error.message.includes(code));
-
-    if (!isBotBlockingError || fetcherConfig.executeClientScripts === false) {
+    if (!error.mayBeBotBlocking || fetcherConfig.executeClientScripts === false) {
       throw error;
     }
 
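
The guard in the `catch` block above can be read as a small decision function. The sketch below restates it as its logical negation, under the assumption that the code following the guard (not shown in this hunk) attempts some fallback fetch. The helper name `shouldTryFallback` is hypothetical and does not exist in the codebase.

```javascript
// Hypothetical helper: returns true when the guard in the diff would NOT
// rethrow, i.e. when the error looks like bot blocking and client scripts
// have not been explicitly disabled in the fetcher configuration.
function shouldTryFallback(error, fetcherConfig) {
  return Boolean(error.mayBeBotBlocking) && fetcherConfig.executeClientScripts !== false;
}

// Plain objects stand in for FetchDocumentError instances here.
const botBlocked = { mayBeBotBlocking: true };
const otherFailure = { mayBeBotBlocking: false };

console.log(shouldTryFallback(botBlocked, {})); // true
console.log(shouldTryFallback(botBlocked, { executeClientScripts: false })); // false
console.log(shouldTryFallback(otherFailure, { executeClientScripts: true })); // false
```

An error that lacks the `mayBeBotBlocking` property altogether, such as a plain `Error`, is rethrown, because `!undefined` is `true` in the original guard.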