Commit d66108d

Fix broken links
1 parent f71a3c9 commit d66108d

1 file changed, with 3 additions and 3 deletions

whotracksme/data/Readme.md

Lines changed: 3 additions & 3 deletions
@@ -59,13 +59,13 @@ WhoTracks.me datasets are provided monthly in the `assets//{month}/{country}/{fi

### Data collection

-Nowadays, the data comes exclusively from users of the [Ghostery Extension](https://github.com/ghostery/ghostery-extension/). Precise user counts are difficult due to the nature of the data collection; but it is estimated to be in the order a few million devices per month, spread all around the world, but mostly in Europe and the US. The methodology is builds on the concept of k-Anonymity and is described in paper [Tracking the Trackers](https://0x65.dev/static/docs/studies/TrackingTheTrackers.pdf). The WhoTracks.me monthly data sets are derived from the same data that also powers the anti-tracking protection in Ghostery; it is also described in [this blog post](https://www.0x65.dev/blog/2019-12-19/blocking-tracking-without-blocking-trackers.html). The code can be found [here](https://github.com/whotracksme/webextension-packages/tree/main/packages/reporting/src/request).
+Nowadays, the data comes exclusively from users of the [Ghostery Extension](https://github.com/ghostery/ghostery-extension/). Precise user counts are difficult due to the nature of the data collection; but it is estimated to be in the order a few million devices per month, spread all around the world, but mostly in Europe and the US. The methodology is builds on the concept of k-Anonymity and is described in paper [Tracking the Trackers](https://0x65.dev/static/docs/studies/TrackingTheTrackers.pdf). The WhoTracks.me monthly data sets are derived from the same data that also powers the anti-tracking protection in Ghostery; it is also described in [this blog post](https://www.0x65.dev/blog/2019-12-19/blocking-tracking-without-blocking-trackers.html). The code can be found [here](https://github.com/whotracksme/webextension-packages/tree/main/reporting/src/request).

Before 2018, all traffic came from users of the Cliqz Browser and the Cliqz extension (now both discontinued). Around April 2018, Ghostery users started to contribute to the data set. This both increased the user base and made it more internation (Cliqz was mostly used in German speaking regions). Here are some historical information from the Cliqz side (to help understand data sets before 2020):

* Data was collected from May 2017 from users that used Cliqz browser extension. In Feb 2018, 70% of the data came from German users according to [this](https://web.archive.org/web/20240121094157/https://whotracks.me/blog/update_feb_2018.html) blog post. Then in March 2018, users of Ghostery Firefox extension - and Ghostery extension available for other browsers (Safari, Chrome, Opera and Edge) from users that opted-in to HumanWeb data collection - were added to the dataset. This caused a slight decrease in the avg. no. of trackers in April 2018, since Ghostery users were blocking more trackers. This is explained in [this](https://web.archive.org/web/20240430053538/https://whotracks.me/blog/where_is_the_data_from.html) and [this](https://web.archive.org/web/20240121094145/https://whotracks.me/blog/update_apr_2018.html) blog posts.
* [This](https://web.archive.org/web/20240121094145/https://whotracks.me/blog/update_apr_2018.html) blog post illustrates where the traffic came from in April 2018: Germany and USA being most representative.
-* [This](https://cliqz.com/en/magazine/government-websites-leak-data-to-google-co) blog post notes that WhoTracks.me does not collect data for pages with no trackers; in other words, collected data for all sites contains some number of third-parties and tracking.
+* [This](https://web.archive.org/web/20241105085238/https://cliqz.com/en/magazine/government-websites-leak-data-to-google-co) blog post notes that WhoTracks.me does not collect data for pages with no trackers; in other words, collected data for all sites contains some number of third-parties and tracking.

### Datasets

@@ -133,7 +133,7 @@ The data is created by aggregating data about page loads at several different le

* `requests_failed` - average number of requests make to the tracker per page which do not succeed. In other words, avg. number of failed requests per page load (for comparison with `requests` to get an idea of how aggressive the blocking is). This is an approximate measure of blocking from external sources (i.e. adblocking extensions or firewalls). Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Positive float.

-* `has_blocking` - proportion of pages where some kind of external blocking of the tracker was detected.Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Float between 0 and 1.
+* `has_blocking` - proportion of pages where some kind of external blocking of the tracker was detected. Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Float between 0 and 1.

> "These signals [`requests_failed` and `has_blocking`] should be able to tell us something about the impact of blocking on different trackers in the ecosystem. For example, we see evidence of blocking 40% of the time for Google Analytics and Facebook [in Dec 2017], and between 10% and 20% of requests failing. Thus, anyone using these services to measure activity and conversions on their sites must reckon with error rates in these orders. We also can see how new entrants can initially avoid the effects of blocking - for Tru Optik and Digitrust who we mentioned earlier, we measure only 5 and 1% of pages which may be affected by blocking."
