Commit f5ded33

Improve headless browser context isolation (#1169)
2 parents 9402653 + 3a9b513 commit f5ded33

2 files changed: +13 -0 lines changed

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@

 All changes that impact users of this module are documented in this file, in the [Common Changelog](https://common-changelog.org) format with some additional specifications defined in the CONTRIBUTING file. This codebase adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## Unreleased [minor]
+
+> Development of this release was supported by the [French Ministry for Foreign Affairs](https://www.diplomatie.gouv.fr/fr/politique-etrangere-de-la-france/diplomatie-numerique/) through its ministerial [State Startups incubator](https://beta.gouv.fr/startups/open-terms-archive.html) under the aegis of the Ambassador for Digital Affairs.
+
+### Changed
+
+- Improve headless browser context isolation when fetching pages by disabling cache and clearing cookies between requests to prevent session persistence across different URLs and to improve tracking success rate
+
 ## 5.6.1 - 2025-06-30

 _Full changeset and discussions: [#1168](https://github.com/OpenTermsArchive/engine/pull/1168)._

src/archivist/fetcher/fullDomFetcher.js

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,11 @@ export default async function fetch(url, cssSelectors, config) {
 await page.setDefaultNavigationTimeout(config.navigationTimeout);
 await page.setExtraHTTPHeaders({ 'Accept-Language': config.language });

+await page.setCacheEnabled(false); // Disable cache to ensure fresh content on each fetch and prevent stale data from previous requests
+const client = await page.target().createCDPSession();
+
+await client.send('Network.clearBrowserCookies'); // Clear cookies to ensure clean state between fetches and prevent session persistence across different URLs
+
 response = await page.goto(url, { waitUntil: 'load' }); // Using `load` instead of `networkidle0` as it's more reliable and faster. The 'load' event fires when the page and all its resources (stylesheets, scripts, images) have finished loading. `networkidle0` can be problematic as it waits for 500ms of network inactivity, which may never occur on dynamic pages and then triggers a navigation timeout.

 if (!response) {
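
For readers who want to see the isolation steps from this hunk on their own, here is a minimal sketch that groups them into a helper. The name `isolatePage` is hypothetical and does not appear in the commit. The three calls inside it are the ones the diff adds, and `page` is assumed to be a Puppeteer `Page` instance.

```javascript
// Hypothetical helper (not part of the commit): applies the isolation steps
// added in the diff to a Puppeteer page before navigating.
async function isolatePage(page) {
  // Disable the cache for this page so each fetch gets fresh content.
  await page.setCacheEnabled(false);

  // Open a Chrome DevTools Protocol session on the page's target.
  const client = await page.target().createCDPSession();

  // Clear cookies so no session persists from a previously fetched URL.
  await client.send('Network.clearBrowserCookies');
}
```

Under these assumptions, the fetcher would call `await isolatePage(page);` immediately before `page.goto(url, { waitUntil: 'load' })`, which matches the order of the statements in the hunk.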
