Commit d486b21

Add filters explanation
1 parent 771951b commit d486b21

2 files changed: +58 -0 lines changed

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
---
title: Explanation
weight: 2
---
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
---
title: "Filters"
---

# Filters

Some documents require filtering that goes beyond basic element selection and removal. For example, web pages often contain dynamically generated content, such as tracking IDs in URLs, that changes on each page load. Although these elements are part of the page, they are not meaningful to the content of the terms. If such dynamic content is included in the archived versions, it produces many insignificant versions and pollutes the archive with noise, making actual changes to the terms harder to identify.

Filters address this need by providing a way to programmatically clean up and normalize the content before archiving. They are implemented as JavaScript functions that manipulate the downloaded web page using the [DOM API](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model), allowing for content transformations that CSS selectors alone cannot express.

Filters take the document DOM and the terms declaration as parameters and are:

- **in-place**: they modify the document structure and content directly;
- **idempotent**: running them again on their own result should leave the document structure and content unchanged.

Filters are loaded automatically from files named after the service they operate on. For example, filters for the Meetup service, which is declared in `declarations/Meetup.json`, are loaded from `declarations/Meetup.filters.js`.

The generic function signature for a filter is:

```js
export [async] function filterName(document, documentDeclaration)
```

Each filter is exposed as a named function export. It takes a `document` parameter that behaves like the `document` object in a browser DOM. Filters can be `async`, but they will still run sequentially. The whole document declaration is passed as the second parameter.
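
To illustrate what "run sequentially" means for `async` filters, here is a minimal sketch. The `applyFilters` runner, the two sample filters and the plain object standing in for the document are all hypothetical and only serve the illustration. This is not the engine's actual code.

```js
// Hypothetical runner, for illustration only: applies filters one after
// another on the same document object.
async function applyFilters(document, documentDeclaration, filters) {
  for (const filter of filters) {
    // Awaiting each call means an async filter finishes before the next one starts.
    await filter(document, documentDeclaration);
  }

  return document;
}

// Stand-in for a real document: a plain object recording which filters ran.
const fakeDocument = { log: [] };

// Hypothetical async filter that takes some time to complete.
async function slowFilter(document) {
  await new Promise(resolve => setTimeout(resolve, 10));
  document.log.push('slowFilter');
}

// Hypothetical synchronous filter.
function fastFilter(document) {
  document.log.push('fastFilter');
}

const done = applyFilters(fakeDocument, {}, [ slowFilter, fastFilter ])
  .then(document => {
    console.log(document.log); // slowFilter is recorded before fastFilter
    return document.log;
  });
```

Under this assumption, a filter can rely on every filter listed before it having fully completed, even when some of them are asynchronous.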

> The `document` parameter is actually a [JSDOM](https://github.com/jsdom/jsdom) document instance.

You can learn more about common sources of noise and ways to handle them [in the guidelines]({{< relref "/terms/guidelines/declaring#usual-noise" >}}).

### Example

Let's assume a service adds a unique `clickId` parameter to the query string of all link destinations. These parameters change on each page load, which leads to noise being recorded in versions. Since links should still be recorded, it is not appropriate to use `remove` to remove the links entirely. Instead, a filter will manipulate the link destinations to remove the always-changing parameter. Concretely, the goal is to apply the following change:

```diff
- Read the <a href="https://example.com/example-page?clickId=349A2033B&lang=en">list of our affiliates</a>.
+ Read the <a href="https://example.com/example-page?lang=en">list of our affiliates</a>.
```

The code below implements this filter:

```js
export function removeTrackingIdsQueryParam(document) {
  const QUERY_PARAM_TO_REMOVE = 'clickId';

  document.querySelectorAll('a[href]').forEach(link => { // iterate over every link in the page that has a destination
    const url = new URL(link.getAttribute('href'), document.location); // URL is a standard Web API, see https://developer.mozilla.org/en-US/docs/Web/API/URL
    const params = new URLSearchParams(url.search); // URLSearchParams is a standard Web API, see https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams

    params.delete(QUERY_PARAM_TO_REMOVE); // we use URLSearchParams instead of RegExp because we can't know in advance in which order parameters will be written
    url.search = params.toString(); // store the query string without the parameter
    link.setAttribute('href', url.toString()); // write the destination URL without the parameter
  });
}
```
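
The idempotence requirement can be checked on the URL rewriting step alone, without a DOM. The sketch below assumes a hypothetical `stripQueryParam` helper that mirrors what the filter does to a single link destination. The helper name and the base URL are illustration choices, not part of the engine.

```js
// Hypothetical helper, for illustration only: removes one query parameter
// from a link destination, resolving relative destinations against `base`.
function stripQueryParam(href, base, paramName) {
  const url = new URL(href, base);

  url.searchParams.delete(paramName); // works whatever the position of the parameter

  return url.toString();
}

const base = 'https://example.com/'; // assumed page location
const original = 'https://example.com/example-page?clickId=349A2033B&lang=en';

const once = stripQueryParam(original, base, 'clickId');
const twice = stripQueryParam(once, base, 'clickId'); // run again on its own result

console.log(once);           // https://example.com/example-page?lang=en
console.log(once === twice); // true
```

Since the second run finds no `clickId` parameter left to delete, it returns its input unchanged, which is the behaviour expected from an idempotent filter.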
