Merged
Changes from 15 commits
40 changes: 40 additions & 0 deletions content/terms/explanation/filters.md
@@ -0,0 +1,40 @@
---
title: "Filters"
weight: 3
---

# Filters

Filters solve [noise]({{< relref "/terms/guideline/declaring/#usual-noise" >}}) issues in terms versions that cannot be addressed with direct selection or removal of content using CSS selectors or range selectors.

## When filters are needed

Filters are necessary when standard CSS selectors and range selectors cannot adequately address noise in terms versions. They provide a solution for complex content manipulation that goes beyond simple selection and removal.

Use filters when:

- **CSS selectors are insufficient**, for example when noise appears within content that cannot be targeted by CSS selectors or [range selectors]({{< relref "terms/explanation/range-selectors" >}}) through the [`select`]({{< relref "terms/reference/declaration/#ref-select" >}}) and [`remove`]({{< relref "terms/reference/declaration/#ref-remove" >}}) properties.
- **Content is dynamically generated**, for example when elements change on each page load with tracking parameters in URLs (like `utm_source`, `utm_medium`) or dynamic elements with changing classes or IDs.
- **Complex tasks are needed**, for example when content transformation is required such as converting images to base64 to store them in the terms version or converting date-based content to a more stable format (like "Updated X days ago" to "Last updated on YYYY-MM-DD").

## How filters work

Filters are JavaScript functions that receive a JSDOM document instance and can manipulate the DOM structure directly. They modify the document structure and content in place, and they run sequentially in the order specified in the declaration.
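
As a rough illustration of this in-place, sequential model, the sketch below runs two filters in order against a stand-in object. The `runFilters` helper, both filter functions, and the `fakeDocument` object are hypothetical and only mimic the idea; the actual engine passes a real JSDOM document and its internals may differ.

```js
// Hypothetical runner: applies each filter in order to the same document.
// Filters return nothing; they mutate the object they receive.
function runFilters(document, filters) {
  for (const filter of filters) {
    filter(document);
  }
  return document;
}

// Two illustrative filters operating on a stand-in for a DOM document.
function removeTracking(document) {
  document.text = document.text.replace(/\?utm_source=\w+/g, '');
}

function collapseWhitespace(document) {
  document.text = document.text.replace(/\s+/g, ' ').trim();
}

// Stand-in object; a real filter would receive a JSDOM document instead.
const fakeDocument = {
  text: '  See https://example.com/page?utm_source=mail   for details. ',
};

runFilters(fakeDocument, [removeTracking, collapseWhitespace]);
console.log(fakeDocument.text); // See https://example.com/page for details.
```

Because each filter sees the output of the previous one, the order in the declaration can change the final result.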

## Filter design principles

When designing filters, follow these core principles:

- **Be specific**: target only the noise you want to remove. Avoid broad selectors that might accidentally remove important content.

> For example, if your filter converts relative dates to absolute dates, use `.metadata time` not `time` which might also affect important effective dates within the terms content.

- **Be idempotent**: filters should produce the same result even if run multiple times on their own output. This ensures consistency and prevents unexpected behavior.

> For example, if your filter adds section numbers like "1." to headings, check if numbers already exist to prevent "1. Privacy Policy" from becoming "1. 1. Privacy Policy" on repeated runs.

- **Be efficient**: use efficient DOM queries and avoid unnecessary operations. Process only the elements you need to modify.

> For example, if your filter updates timestamp elements with a specific class, use `document.querySelectorAll('.timestamp')` instead of `document.querySelectorAll('*')` followed by filtering for timestamp elements.

- **Be safe**: filters should not accidentally remove important content. The generated version should always be checked after adding a filter to ensure it still contains the whole terms content.
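
The idempotency principle can be sketched with the section-numbering example. This is a minimal illustration, not a filter shipped with the engine: the `numberHeadings` name and the `h2` selector are assumptions, and a real filter would likely scope the selector more narrowly to stay specific.

```js
// Illustrative filter: prefixes each heading with its position ("1. ", "2. ", ...).
// In a `.filters.js` file this function would be exported.
function numberHeadings(document) {
  const headings = document.querySelectorAll('h2');

  headings.forEach((heading, index) => {
    // Skip headings that already start with a number, so a second run changes nothing.
    if (/^\d+\.\s/.test(heading.textContent)) {
      return;
    }
    heading.textContent = `${index + 1}. ${heading.textContent}`;
  });
}
```

Running this filter on its own output leaves "1. Privacy Policy" unchanged instead of producing "1. 1. Privacy Policy".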
159 changes: 159 additions & 0 deletions content/terms/how-to/apply-filters.md
@@ -0,0 +1,159 @@
---
title: Apply filters
weight: 7
---

# How to apply filters

This guide explains how to apply filters to existing declarations. Filters remove meaningless content that changes on each page load or that cannot be removed with CSS selectors, which avoids noise in the terms change history.

## Prerequisites

- An existing terms declaration file
- The noise you want to remove is identified, and you have confirmed that it cannot be removed with CSS selectors using the [`remove`]({{< relref "terms/reference/declaration/#ref-remove" >}}) property.
Member

I will give an example of noise that will be filtered by removeQueryParams, the only built-in filter available at the moment.

Member Author

It's not a tutorial, it's a how-to guide, so the idea is not to learn but to apply a concrete solution to a real-world problem, so I'm not sure it is appropriate.


## Step 1: Check for built-in filters
Member

I think I'll use a structure based on the principle that we most often use a built-in filter and optionally a custom filter, so I would put everything related to creating a custom filter on a dedicated how-to page.

Member Author

I think we currently do not have enough built-in filters to justify splitting into two pages.


Built-in filters are pre-defined functions that handle common noise patterns. They're the easiest way to clean up content without writing custom code.

Review the available [built-in filters]({{< relref "/terms/reference/built-in-filters" >}}) to find if one matches your needs.

If you find a suitable built-in filter, proceed to [Step 2](#step-2-declare-the-filter). Otherwise, create a custom filter as described in the next section.

### Create a custom filter (optional)

If no built-in filter matches your needs, you'll need to create a custom filter. This requires JavaScript knowledge and familiarity with DOM manipulation.

#### Create the filter file

Create a JavaScript file with the same name as your service declaration but with `.filters.js` extension. For example, if your declaration is `declarations/MyService.json`, create `declarations/MyService.filters.js`.

#### Write the filter function

Define your filter function following this signature:

```js
// `parameters` is optional: it is only provided when the declaration passes values.
export function myCustomFilter(document, parameters) {
  // Your filter logic here
}
```

**Parameters:**

- `document`: JSDOM document instance representing the web page
- `parameters`: Values passed from the declaration (optional)
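
To show how `parameters` might be consumed, here is a minimal sketch of a custom filter that strips a list of attributes from every element carrying them. The `removeAttributes` name is hypothetical, and the sketch assumes the value declared for the filter (here an array of attribute names) arrives unchanged as the second argument.

```js
// Hypothetical custom filter; in a `.filters.js` file it would be exported.
// `attributeNames` is assumed to be the array declared for this filter,
// for example ["data-session", "data-nonce"].
function removeAttributes(document, attributeNames = []) {
  attributeNames.forEach(name => {
    // The attribute selector matches only elements that carry the attribute.
    document.querySelectorAll(`[${name}]`).forEach(element => {
      element.removeAttribute(name);
    });
  });
}
```

Defaulting `attributeNames` to an empty array keeps the filter harmless if it is declared without parameters.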

**Example: Remove session IDs from text content**

Suppose the page appends a session ID to the last paragraph, and that ID changes on each page load:

```html
<p>We collect your data for the following purposes:</p>
<ul>
  <li>To provide our services</li>
  <li>To improve user experience</li>
</ul>
<p class="session-id">Last updated on 2023-12-07 (Session: abc123def456)</p>
```

You can implement this filter as follows:

```js
export function removeSessionIds(document) {
  // Find all paragraphs that might contain session IDs
  const paragraphs = document.querySelectorAll('p.session-id');

  paragraphs.forEach(paragraph => {
    let text = paragraph.textContent;
    // Remove session ID patterns like "Session: abc123" or "(Session: def456)"
    text = text.replace(/\s*\(?Session:\s*[a-zA-Z0-9]+\)?/g, '');
    paragraph.textContent = text.trim();
  });
}
```

Result after applying the filter:

```diff
  <p>We collect your data for the following purposes:</p>
  <ul>
    <li>To provide our services</li>
    <li>To improve user experience</li>
  </ul>
- <p class="session-id">Last updated on 2023-12-07 (Session: abc123def456)</p>
+ <p class="session-id">Last updated on 2023-12-07</p>
```

## Step 2: Declare the filter

Open your service declaration file (e.g., `declarations/MyService.json`) and locate the `filter` property of the specific terms you want to apply the filter to. If it doesn't exist, add it as an array.

### Filter without parameters

For filters that don't require parameters, add the filter name as a string:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.com/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        "removeSessionIds"
      ]
    }
  }
}
```

### Parameterized filter

For filters that require parameters, use an object whose key is the filter name and whose value holds the parameters. For example, the built-in filter `removeQueryParams` removes the listed query parameters from URLs:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.com/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        {
          "removeQueryParams": ["utm_source", "utm_medium", "utm_campaign"]
        }
      ]
    }
  }
}
```
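
To make the effect of those parameters concrete, here is a rough sketch of what a query-parameter filter could look like. It is not the built-in `removeQueryParams` implementation: the `stripQueryParams` name is hypothetical, it only handles links, and it assumes the declared array is passed as the second argument.

```js
// Illustrative only: not the built-in `removeQueryParams` implementation.
function stripQueryParams(document, paramsToRemove = []) {
  document.querySelectorAll('a[href]').forEach(link => {
    let url;
    try {
      // `new URL` throws on relative or malformed URLs when no base is given.
      url = new URL(link.getAttribute('href'));
    } catch {
      return; // Leave such links untouched in this sketch.
    }
    paramsToRemove.forEach(param => url.searchParams.delete(param));
    link.setAttribute('href', url.toString());
  });
}
```

Parsing with the standard `URL` class avoids hand-written regular expressions, at the cost of serializing the URL again, which can normalize it slightly (for example by adding a trailing slash to a bare origin).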

### Multiple filters

You can combine multiple filters in the same declaration; they run in the order in which they are listed:

```json
{
  "name": "MyService",
  "terms": {
    "Privacy Policy": {
      "fetch": "https://my.service.com/en/privacy-policy",
      "select": ".textcontent",
      "filter": [
        {
          "removeQueryParams": ["utm_source", "utm_medium"]
        },
        "removeSessionIds"
      ]
    }
  }
}
```
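
Since a `filter` array can mix strings and objects, it may help to see how the two forms can be read uniformly. The `normalizeFilterEntry` helper below is hypothetical and only illustrates the shape of the data; it does not reflect how the engine actually parses declarations.

```js
// Hypothetical helper: turns one `filter` array entry into { name, parameters }.
function normalizeFilterEntry(entry) {
  if (typeof entry === 'string') {
    return { name: entry, parameters: undefined };
  }
  // Object form: a single key (the filter name) mapped to its parameters.
  const [[name, parameters]] = Object.entries(entry);
  return { name, parameters };
}

const declared = [
  { removeQueryParams: ['utm_source', 'utm_medium'] },
  'removeSessionIds',
];

const normalized = declared.map(normalizeFilterEntry);
console.log(normalized.map(filter => filter.name).join(', '));
// removeQueryParams, removeSessionIds
```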

## Step 3: Test the filter

After adding the filter, test your declaration to ensure it works correctly:

1. Start the terms tracking process
2. Check that the noise has been removed
3. Verify that important content is preserved
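
Before running the full tracking process, the text transformation of a custom filter can be sanity-checked in isolation. The sketch below reuses the regular expression from `removeSessionIds`; the `stripSessionId` helper and the sample strings are illustrative assumptions, and the check does not replace inspecting the generated version.

```js
// Same pattern as in `removeSessionIds`, extracted for a quick check.
function stripSessionId(text) {
  return text.replace(/\s*\(?Session:\s*[a-zA-Z0-9]+\)?/g, '').trim();
}

const samples = [
  'Last updated on 2023-12-07 (Session: abc123def456)',
  'Last updated on 2023-12-07',
];

for (const sample of samples) {
  const once = stripSessionId(sample);
  const twice = stripSessionId(once);
  // A stable result on the second pass suggests the transformation is idempotent.
  console.log(JSON.stringify(once), once === twice ? 'stable' : 'NOT stable');
}
```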
31 changes: 31 additions & 0 deletions content/terms/reference/built-in-filters.md
@@ -0,0 +1,31 @@
---
title: "Built-in filters"
---

# Built-in filters

This reference documentation details all available built-in filters that can be used to avoid noise in the terms content.

## Filters

{{< refItem
name="removeQueryParams"
description="Removes specified query parameters from URLs in links and images within the terms content"
>}}

```json
"filter": [
  {
    "removeQueryParams": ["utm_source", "utm_medium"]
  }
]
```

Result:

```diff
- <p>Read the <a href="https://example.com/example-page?utm_source=OGB&utm_medium=website&lang=en">list of our affiliates</a>.</p>
+ <p>Read the <a href="https://example.com/example-page?lang=en">list of our affiliates</a>.</p>
```

{{< /refItem >}}
16 changes: 12 additions & 4 deletions content/terms/reference/declaration.md
@@ -139,10 +139,18 @@ As an array of those:

{{< refItem
name="filter"
- type="array of strings"
- description="Array of filter function names to apply. Function will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/filters\" >}}) section for more information."
- example="[\"filterName1\", \"filterName2\"]"
- />}}
+ type="array of strings or objects"
+ description="Array of filter functions to apply. Each item can be either a string (function name) or an object (function name as key, parameters as value). Functions will be executed in the order of the array. See the [Filters]({{< relref \"terms/reference/filters\" >}}) section for more information."
+ >}}
+ ```json
+ "filter": [
+   "filterName1",
+   {
+     "filterName2": "param"
+   }
+ ]
+ ```
+ {{< /refItem >}}

{{< refItem
name="combine"