Commit 829ce0f

Improve filters doc
1 parent 1664c0d commit 829ce0f

3 files changed: 121 additions, 1 deletion

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+---
+title: "Filters"
+weight: 3
+---
+
+# Filters
+
+Filters remove noise from terms versions when it cannot be addressed by directly selecting or removing content with CSS selectors or range selectors.
+
+## Why filters are needed
+
+Web pages often contain dynamically generated content, or content that cannot be targeted with CSS selectors, which creates noise in the archive:
+
+- Tracking parameters in URLs, for example `utm_source`, `utm_medium`, …
+- Date-based content that changes between visits, for example "Updated X days ago", which can be converted to "Last updated on YYYY-MM-DD".
+- Dynamic elements with changing classes or IDs
+
+Without filters, this dynamic content produces recorded changes that do not reflect any meaningful change to the terms.
+
+## How filters work
+
+Filters are JavaScript functions that receive a JSDOM document instance and can manipulate the DOM structure directly. They modify the document structure and content in place, and they run sequentially in the order specified in the declaration.
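
The sequential, in-place behaviour described above can be pictured with the following minimal sketch. It is not the engine's actual implementation: `runFilters` and `availableFilters` are hypothetical names, and the way declaration entries are resolved (a string, or an object with the function name as key and the parameters as value) is an assumption made for illustration.

```js
// Illustrative sketch only: `runFilters` and `availableFilters` are
// hypothetical names, not part of the engine's API.
async function runFilters(document, declaredFilters, availableFilters) {
  for (const entry of declaredFilters) {
    // An entry is either "filterName" or { filterName: parameters }.
    const [name, parameters] = typeof entry === 'string'
      ? [entry, undefined]
      : Object.entries(entry)[0];

    const filter = availableFilters[name];
    if (typeof filter !== 'function') {
      throw new Error(`Unknown filter: ${name}`);
    }

    // Awaiting each call preserves the declared order, even for async filters.
    // The return value is ignored: filters mutate `document` in place.
    await filter(document, parameters);
  }
  return document;
}
```

Because each call is awaited before the next one starts, a later filter always sees the document as left by the earlier ones, which is why the order in the declaration matters.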
+
+## Filter design principles
+
+When designing filters, follow these core principles:
+
+- **Be specific**: Target only the noise you want to remove. Avoid broad selectors that might accidentally remove important content.
+- **Be safe**: Ensure your filter doesn't accidentally remove important content. Always check that the generated version still contains the complete terms content.
+- **Be idempotent**: Your filter should produce the same result even if run multiple times on its own output. This ensures consistency and prevents unexpected behavior.
+- **Be efficient**: Use efficient DOM queries and avoid unnecessary operations. Process only the elements you need to modify.
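
As an illustration of the "specific" and "idempotent" principles, here is a minimal sketch of the "Updated X days ago" conversion mentioned earlier. The function names, the `.last-update` selector and the use of the current date as the reference point are all assumptions; in a real `<service name>.filters.js` file the filter would be a named export.

```js
// Hypothetical helper: rewrites a relative date into an absolute one.
// `now` is passed in so that the result is deterministic.
function toAbsoluteDate(text, now) {
  return text.replace(/Updated (\d+) days? ago/g, (match, days) => {
    const date = new Date(now.getTime() - Number(days) * 24 * 60 * 60 * 1000);
    return `Last updated on ${date.toISOString().slice(0, 10)}`;
  });
}

// Hypothetical filter. The `.last-update` selector is an assumption:
// it targets only the elements known to carry the relative date.
function convertRelativeDates(document) {
  for (const element of document.querySelectorAll('.last-update')) {
    element.textContent = toAbsoluteDate(element.textContent, new Date());
  }
}
```

Once the text has been rewritten, the pattern no longer matches, so running the filter again on its own output leaves the document unchanged.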
+
+## When to use filters
+
+Use filters when:
+
+- **CSS selectors are insufficient**: When noise appears within content that can't be targeted with selectors or [range selectors]({{< relref "terms/explanation/range-selectors" >}}) through the [`select`]({{< relref "terms/reference/declaration/#ref-select" >}}) and [`remove`]({{< relref "terms/reference/declaration/#ref-remove" >}}) properties.
+- **Meaningful content is dynamic**: When elements change on each page load, for example "Updated X days ago", which can be converted to "Last updated on YYYY-MM-DD".
+- **Patterns are complex**: When simple removal isn't possible, for example removing all the tracking parameters in URLs.
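
The tracking-parameter case could look roughly like the sketch below. Both function names are hypothetical, and treating every `utm_`-prefixed parameter as tracking noise is an assumption; in a real `<service name>.filters.js` file the filter would be a named export.

```js
// Hypothetical helper: removes `utm_*` parameters from a single URL string.
function stripTrackingParameters(href) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return href; // Relative or malformed URLs are left untouched.
  }
  // Copy the keys first: deleting while iterating over searchParams can skip entries.
  for (const name of [...url.searchParams.keys()]) {
    if (name.startsWith('utm_')) {
      url.searchParams.delete(name);
    }
  }
  return url.toString();
}

// Hypothetical filter: only links whose `href` mentions `utm_` are processed.
function removeTrackingParameters(document) {
  for (const link of document.querySelectorAll('a[href*="utm_"]')) {
    link.setAttribute('href', stripTrackingParameters(link.getAttribute('href')));
  }
}
```

Such a filter would then be listed by name in the declaration's `filter` array, for example `"filter": ["removeTrackingParameters"]`.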

content/terms/reference/declaration.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ As an array of those:
 {{< refItem
 name="filter"
 type="array of strings or objects"
-description="Array of filter functions to apply. Each item can be either a string (function name) or an object (function name as key, parameters as value). Functions will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/~~filters\" >}}) section for more information."
+description="Array of filter functions to apply. Each item can be either a string (function name) or an object (function name as key, parameters as value). Functions will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/filters\" >}}) section for more information."
 >}}
 ```json
 "filter": [

content/terms/reference/filters.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+---
+title: "Filters"
+---
+
+# Filters
+
+Filters are JavaScript functions that take the document DOM as a parameter and are:
+
+- **in-place**: they modify the document structure and content directly;
+- **idempotent**: they produce the same document structure and content even if run repeatedly on their own result.
+
+The generic function signature for a filter is:
+
+```js
+export [async] function filterName(document, [parameters])
+```
+
+Each filter is exposed as a named function export that takes a `document` parameter, which behaves like the `document` object in a browser DOM.
+> The `document` parameter is actually a [JSDOM](https://github.com/jsdom/jsdom) document instance.
+
+Filters can take parameters, which are passed as the second argument.
+
+These functions can be `async`, but they will still run sequentially.
+
+## Usage
+
+### Simple filter
+
+```js
+// <service name>.filters.js
+export function customFilter(document) {
+  // filter logic here
+}
+```
+
+It can be used as follows in the declaration:
+
+```json
+// <service name>.json
+{
+  "name": "<service name>",
+  "terms": {
+    "<terms type>": {
+      "fetch": "<fetch URL>",
+      "select": "<select CSS selector>",
+      "filter": [
+        "customFilter"
+      ]
+    }
+  }
+}
+```
+
+### Filter with parameters
+
+```js
+// <service name>.filters.js
+export function customParameterizedFilter(document, parameters) {
+  // filter logic here
+}
+```
+
+It can be used as follows in the declaration:
+
+```json
+// <service name>.json
+{
+  "name": "<service name>",
+  "terms": {
+    "<terms type>": {
+      "fetch": "<fetch URL>",
+      "select": "<select CSS selector>",
+      "filter": [
+        {
+          "customParameterizedFilter": "params"
+        }
+      ]
+    }
+  }
+}
+```
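
To make the parameterized form more concrete, here is a minimal sketch of a filter whose behaviour depends on its second argument. The name `removeClassesWithPrefix` and the choice of a plain string as the parameter are assumptions; in a real `<service name>.filters.js` file the function would be a named export.

```js
// Hypothetical filter: strips every class name that starts with `prefix`,
// for example generated class names that change on each page load.
function removeClassesWithPrefix(document, prefix) {
  for (const element of document.querySelectorAll('[class]')) {
    const kept = element
      .getAttribute('class')
      .split(/\s+/)
      .filter((name) => name !== '' && !name.startsWith(prefix));

    if (kept.length > 0) {
      element.setAttribute('class', kept.join(' '));
    } else {
      // Drop the attribute entirely rather than leaving `class=""` behind.
      element.removeAttribute('class');
    }
  }
}
```

In the declaration, the parameter would be supplied as the value of the entry, for example `{ "removeClassesWithPrefix": "css-" }`, where the `css-` prefix is a made-up value.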
