Commit 15ee83c

Update filters documentation (#198)
2 parents 84fd469 + f5afdf7 commit 15ee83c

12 files changed, +502 -47 lines changed

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
---
title: "Filters"
weight: 3
---

# Filters

Filters solve [noise]({{< relref "/terms/guideline/declaring/#usual-noise" >}}) issues in versions that cannot be addressed by directly selecting or removing content with selectors.

## When filters are needed

Use filters when:

- **Content selectors are insufficient**, for example when noise appears within content that cannot be targeted with CSS selectors or [range selectors]({{< relref "terms/explanation/range-selectors" >}}) through the [`select`]({{< relref "terms/reference/declaration/#ref-select" >}}) and [`remove`]({{< relref "terms/reference/declaration/#ref-remove" >}}) properties.
- **Content is dynamically generated**, for example when elements get different classes or IDs on each page load, so that they cannot be targeted with [attribute selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors).
- **Complex tasks are needed**, for example when content must be transformed, such as converting images to Base64 to store them in the terms version, or converting date-based content to a stable format (like “Updated X days ago” to “Last updated on YYYY-MM-DD”).

## How filters work

Filters are JavaScript functions that manipulate the DOM directly: they modify the document structure and content in place.
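
As a minimal sketch of what in-place modification can look like, the hypothetical filter below strips a `data-request-id` attribute from every element that carries one. The function name and the attribute name are assumptions made for illustration only; a real filter would also be exported from its file.

```js
// Hypothetical filter: the attribute name `data-request-id` is an assumption.
function removeRequestIds(document) {
  // Select only the elements that carry the noisy attribute
  const elements = document.querySelectorAll('[data-request-id]');

  elements.forEach(element => {
    // Mutate the element in place; the function returns nothing
    element.removeAttribute('data-request-id');
  });
}
```

The function changes the document it receives rather than building and returning a new one.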
21+
22+
## Filter design principles
23+
24+
Filters should follow these core principles:
25+
26+
- **Specific**: target only the noise to remove. Avoid broad selectors that might accidentally remove important content.
27+
28+
> For example, if a filter converts relative dates to absolute dates, make sure to scope the targeted dates. This might translate to selecting with `.metadata time`, not `time`, which might also affect important effective dates within the terms content.
29+
30+
- **Idempotent**: filters should produce the same result even if run multiple times on their own output. This ensures consistency.
31+
32+
> For example, if a filter adds section numbers like "1." to headings, it should check if the numbers already exist, to prevent "1. Privacy Policy" from becoming "1. 1. Privacy Policy" on repeated runs.
33+
34+
- **Efficient**: DOM queries should be optimised and filters should avoid unnecessary operations, processing only the elements needed.
35+
36+
> For example, if a filter updates timestamp elements with a specific class, using `document.querySelectorAll('.timestamp')` is more efficient than `document.querySelectorAll('*')` followed by filtering for timestamp elements.
37+
38+
- **Safe**: filters must not accidentally remove important content. The generated version should always be checked after adding a filter to ensure it still contains the whole terms content.
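
The specific and idempotent principles can be sketched together with the section-numbering case mentioned above. This is an illustration under assumptions: the function name `numberSections` and the `.terms-content h2` selector are hypothetical, and a real filter would be exported from its file.

```js
// Hypothetical filter: the `.terms-content h2` selector is an assumption.
function numberSections(document) {
  // Specific: only second-level headings inside the terms container
  const headings = document.querySelectorAll('.terms-content h2');

  headings.forEach((heading, index) => {
    const title = heading.textContent.trim();

    // Idempotent: skip headings that already start with a number like "1. "
    if (/^\d+\.\s/.test(title)) {
      return;
    }

    heading.textContent = `${index + 1}. ${title}`;
  });
}
```

Running the function a second time leaves "1. Privacy Policy" unchanged, because the guard detects the existing number.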
Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
---
title: Apply filters
weight: 7
---

# How to apply filters

This guide explains how to add filters to existing declarations in order to remove meaningless content that cannot be removed with CSS selectors, and so prevent noise in the versions.

## Prerequisites

- An existing terms declaration file.
- The noise to remove has been identified, and you have double-checked that it cannot be removed with CSS selectors through the [`remove`]({{< relref "terms/reference/declaration/#ref-remove" >}}) property.

## Step 1: Check for built-in filters

Built-in filters are pre-defined functions that handle common noise patterns. They are the easiest way to clean up content.

Review the available [built-in filters]({{< relref "/terms/reference/built-in-filters" >}}) to see whether one matches your needs.

If you find a suitable built-in filter, proceed to [Step 3](#step-3-declare-the-filter); otherwise, you will need to create a custom filter.

## Step 2: Create a custom filter _(optional)_

If no built-in filter matches your needs, create a custom filter. This requires JavaScript knowledge and familiarity with DOM manipulation.

### Create the filter file

Create a JavaScript file in the same folder and with the same name as your service declaration, but with the `.filters.js` extension.

> For example, if your declaration is `declarations/MyService.json`, create `declarations/MyService.filters.js`.

### Write the filter function

Define your filter function with the following signature, where the `parameters` argument is optional:

```js
export function myCustomFilter(document, parameters) {
  // Your filter logic here
}
```

#### Parameters

- `document`: JSDOM document instance representing the web page
- `parameters`: values passed from the declaration _(optional)_
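
The sketch below shows one way a filter might use `parameters`. It assumes that the value declared for the filter (here an array of strings) is passed unchanged as the second argument. The filter name `removeTextMarkers`, the `.textcontent p` selector and the marker strings are all hypothetical, and in a `.filters.js` file the function would be exported.

```js
// Hypothetical filter: assumes `markers` is the array declared for the filter.
function removeTextMarkers(document, markers = []) {
  document.querySelectorAll('.textcontent p').forEach(paragraph => {
    let text = paragraph.textContent;

    markers.forEach(marker => {
      // split/join removes every occurrence without any regex escaping
      text = text.split(marker).join('');
    });

    // Note: assigning textContent replaces any child elements with plain text
    paragraph.textContent = text.trim();
  });
}
```

Under the same assumption, the matching declaration entry would be an object such as `{ "removeTextMarkers": ["[NEW]", "[BETA]"] }`.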

#### Example: Remove session IDs from text content

Suppose the page appends a session ID, which changes on every load, to its "last updated" line:

```html
<p>We collect your data for the following purposes:</p>
<ul>
  <li>To provide our services</li>
  <li>To improve user experience</li>
</ul>
<p class="session-id">Last updated on 2023-12-07 (Session: abc123def456)</p>
```

You can implement this filter as follows:

```js
export function removeSessionIds(document) {
  // Find all elements that might contain session IDs
  const paragraphs = document.querySelectorAll('.session-id');

  paragraphs.forEach(paragraph => {
    let text = paragraph.textContent;
    // Remove session ID patterns like "Session: abc123" or "(Session: def456)"
    text = text.replace(/\s*\(?Session:\s*[a-zA-Z0-9]+\)?/g, '');
    paragraph.textContent = text.trim();
  });
}
```

Result after applying the filter:

```diff
  <p>We collect your data for the following purposes:</p>
  <ul>
    <li>To provide our services</li>
    <li>To improve user experience</li>
  </ul>
- <p class="session-id">Last updated on 2023-12-07 (Session: abc123def456)</p>
+ <p class="session-id">Last updated on 2023-12-07</p>
```

## Step 3: Declare the filter

Open your service declaration file (e.g. `declarations/MyService.json`) and locate the `filter` property of the terms you want to apply the filter to. If it does not exist, add it as an array.

### Filter without parameters

For filters that don’t require parameters, add the filter name as a string:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.example/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        "removeSessionIds"
      ]
    }
  }
}
```

### Filter with parameters

For filters that take parameters, use an object whose key is the filter name and whose value holds the parameters. For example, with the built-in filter `removeQueryParams`, which removes query parameters from URLs:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.example/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        {
          "removeQueryParams": ["utm_source", "utm_medium", "utm_campaign"]
        }
      ]
    }
  }
}
```

### Multiple filters

You can combine multiple filters in the same declaration; they are executed in the order of the array:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.example/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        {
          "removeQueryParams": ["utm_source", "utm_medium"]
        },
        "removeSessionIds"
      ]
    }
  }
}
```
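
To make the two entry forms concrete, the sketch below normalises a `filter` array into name and parameter pairs, keeping the array order. It only illustrates the declared format and is not the engine's actual code; `normalizeFilterEntries` is a hypothetical helper.

```js
// Hypothetical helper illustrating the declaration format, not engine code.
function normalizeFilterEntries(entries) {
  return entries.map(entry => {
    if (typeof entry === 'string') {
      // "removeSessionIds": filter without parameters
      return { name: entry, parameters: undefined };
    }

    // { "removeQueryParams": [...] }: filter name as key, parameters as value
    const [[name, parameters]] = Object.entries(entry);
    return { name, parameters };
  });
}

const normalized = normalizeFilterEntries([
  { removeQueryParams: ['utm_source', 'utm_medium'] },
  'removeSessionIds',
]);

console.log(normalized.map(({ name }) => name));
// → [ 'removeQueryParams', 'removeSessionIds' ]
```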

## Step 4: Test the filter

After adding the filter, test your declaration to ensure it works correctly:

1. Start the terms tracking process.
2. Check that the noise has been removed.
3. Verify that important content is preserved.
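
Part of these checks can be scripted. The sketch below runs a filter twice on the same document and compares the serialised results, which reveals filters that keep changing their own output. `createDocument` and `serialize` are hypothetical callbacks supplied by the caller, for example one that builds a JSDOM document from a saved copy of the page and one that returns its HTML; the helper itself is not part of the engine.

```js
// Hypothetical helper: `createDocument` and `serialize` are supplied by the caller.
function isIdempotent(filter, createDocument, serialize) {
  const document = createDocument();

  filter(document);
  const afterFirstRun = serialize(document);

  filter(document);
  const afterSecondRun = serialize(document);

  // An idempotent filter leaves its own output unchanged
  return afterFirstRun === afterSecondRun;
}
```

A `false` result means the filter modifies content it has already processed, and repeated runs would produce different versions.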
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
title: "Built-in filters"
---

# Built-in filters

This reference details all available built-in [filters]({{< relref "terms/explanation/filters" >}}) that can be applied to avoid noise in versions.

{{< refItem
  name="removeQueryParams"
  description="Removes specified query parameters from URLs in links and images."
>}}

```json
"filter": [
  {
    "removeQueryParams": ["utm_source", "utm_medium"]
  }
]
```

```diff
- <p>Read the <a href="https://example.com/example-page?utm_source=OGB&utm_medium=website&lang=en">list of our affiliates</a>.</p>
+ <p>Read the <a href="https://example.com/example-page?lang=en">list of our affiliates</a>.</p>
```

{{< /refItem >}}
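
For readers curious how such a filter could work, here is a rough sketch built on the standard WHATWG `URL` API. It is not the built-in filter's actual implementation; the names `stripQueryParams` and `removeQueryParamsSketch` are hypothetical, as is the choice to leave unparsable URLs untouched.

```js
// Illustrative only: not the built-in filter's actual implementation.
function stripQueryParams(urlString, paramsToRemove) {
  const url = new URL(urlString); // WHATWG URL, global in Node.js and browsers

  paramsToRemove.forEach(param => url.searchParams.delete(param));

  return url.toString();
}

function removeQueryParamsSketch(document, paramsToRemove = []) {
  // Links carry their URL in `href`, images in `src`
  document.querySelectorAll('a[href], img[src]').forEach(element => {
    const attribute = element.hasAttribute('href') ? 'href' : 'src';
    const value = element.getAttribute(attribute);

    try {
      element.setAttribute(attribute, stripQueryParams(value, paramsToRemove));
    } catch {
      // Relative or malformed URLs cannot be parsed without a base: leave them as-is
    }
  });
}
```

Serialising through `URL` may also normalise other parts of the address, so the output should be compared against the original version before relying on an approach like this.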

content/terms/reference/declaration.md

Lines changed: 12 additions & 4 deletions
@@ -139,10 +139,18 @@ As an array of those:

 {{< refItem
 name="filter"
-type="array of strings"
-description="Array of filter function names to apply. Function will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/filters\" >}}) section for more information."
-example="[\"filterName1\", \"filterName2\"]"
-/>}}
+type="array of strings or objects"
+description="Array of filter functions to apply. Each item can be either a string (function name) or an object (function name as key, parameters as value). Functions will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/filters\" >}}) section for more information."
+>}}
+```json
+"filter": [
+  "filterName1",
+  {
+    "filterName2": "param"
+  }
+]
+```
+{{< /refItem >}}

 {{< refItem
 name="combine"
