whotracksme/data/Readme.md (3 additions, 3 deletions)
@@ -59,13 +59,13 @@ WhoTracks.me datasets are provided monthly in the `assets//{month}/{country}/{fi
### Data collection
-Nowadays, the data comes exclusively from users of the [Ghostery Extension](https://github.com/ghostery/ghostery-extension/). Precise user counts are difficult to determine due to the nature of the data collection, but the user base is estimated to be on the order of a few million devices per month, spread around the world but mostly in Europe and the US. The methodology builds on the concept of k-anonymity and is described in the paper [Tracking the Trackers](https://0x65.dev/static/docs/studies/TrackingTheTrackers.pdf). The WhoTracks.me monthly data sets are derived from the same data that also powers the anti-tracking protection in Ghostery; it is also described in [this blog post](https://www.0x65.dev/blog/2019-12-19/blocking-tracking-without-blocking-trackers.html). The code can be found [here](https://github.com/whotracksme/webextension-packages/tree/main/packages/reporting/src/request).
+Nowadays, the data comes exclusively from users of the [Ghostery Extension](https://github.com/ghostery/ghostery-extension/). Precise user counts are difficult to determine due to the nature of the data collection, but the user base is estimated to be on the order of a few million devices per month, spread around the world but mostly in Europe and the US. The methodology builds on the concept of k-anonymity and is described in the paper [Tracking the Trackers](https://0x65.dev/static/docs/studies/TrackingTheTrackers.pdf). The WhoTracks.me monthly data sets are derived from the same data that also powers the anti-tracking protection in Ghostery; it is also described in [this blog post](https://www.0x65.dev/blog/2019-12-19/blocking-tracking-without-blocking-trackers.html). The code can be found [here](https://github.com/whotracksme/webextension-packages/tree/main/reporting/src/request).
Before 2018, all traffic came from users of the Cliqz Browser and the Cliqz extension (now both discontinued). Around April 2018, Ghostery users started to contribute to the data set. This both increased the user base and made it more international (Cliqz was mostly used in German-speaking regions). Here is some historical information from the Cliqz side (to help understand data sets before 2020):
* Data was collected from May 2017 from users of the Cliqz browser extension. In Feb 2018, 70% of the data came from German users according to [this](https://web.archive.org/web/20240121094157/https://whotracks.me/blog/update_feb_2018.html) blog post. Then in March 2018, users of the Ghostery Firefox extension - and of the Ghostery extension available for other browsers (Safari, Chrome, Opera and Edge), from users who opted in to HumanWeb data collection - were added to the dataset. This caused a slight decrease in the avg. no. of trackers in April 2018, since Ghostery users were blocking more trackers. This is explained in [this](https://web.archive.org/web/20240430053538/https://whotracks.me/blog/where_is_the_data_from.html) and [this](https://web.archive.org/web/20240121094145/https://whotracks.me/blog/update_apr_2018.html) blog posts.
* [This](https://web.archive.org/web/20240121094145/https://whotracks.me/blog/update_apr_2018.html) blog post illustrates where the traffic came from in April 2018, with Germany and the USA being the most represented.
-* [This](https://cliqz.com/en/magazine/government-websites-leak-data-to-google-co) blog post notes that WhoTracks.me does not collect data for pages with no trackers; in other words, the collected data for every site contains some number of third parties and tracking.
+* [This](https://web.archive.org/web/20241105085238/https://cliqz.com/en/magazine/government-websites-leak-data-to-google-co) blog post notes that WhoTracks.me does not collect data for pages with no trackers; in other words, the collected data for every site contains some number of third parties and tracking.
### Datasets
@@ -133,7 +133,7 @@ The data is created by aggregating data about page loads at several different le
* `requests_failed` - average number of requests made to the tracker per page which do not succeed. In other words, avg. number of failed requests per page load (for comparison with `requests` to get an idea of how aggressive the blocking is). This is an approximate measure of blocking from external sources (i.e. adblocking extensions or firewalls). Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Positive float.
-* `has_blocking` - proportion of pages where some kind of external blocking of the tracker was detected. Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Float between 0 and 1.
+* `has_blocking` - proportion of pages where some kind of external blocking of the tracker was detected. Measure [added](https://web.archive.org/web/20240121094211/https://whotracks.me/blog/update_dec_2017.html) in Dec 2017. Float between 0 and 1.
> "These signals [`requests_failed` and `has_blocking`] should be able to tell us something about the impact of blocking on different trackers in the ecosystem. For example, we see evidence of blocking 40% of the time for Google Analytics and Facebook [in Dec 2017], and between 10% and 20% of requests failing. Thus, anyone using these services to measure activity and conversions on their sites must reckon with error rates in these orders. We also can see how new entrants can initially avoid the effects of blocking - for Tru Optik and Digitrust who we mentioned earlier, we measure only 5 and 1% of pages which may be affected by blocking."
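
The comparison between `requests_failed` and `requests` that the field description suggests can be sketched as below. This is a minimal illustration, not part of the WhoTracks.me tooling: the `TrackerRow` record, the `failed_share` helper, the tracker names, and all numeric values are hypothetical, and only the three column names come from the README text.

```python
from dataclasses import dataclass


@dataclass
class TrackerRow:
    # Hypothetical record; only the column names are taken from the README.
    tracker: str
    requests: float         # avg. requests to the tracker per page load
    requests_failed: float  # avg. failed requests per page load
    has_blocking: float     # proportion of pages with detected blocking (0..1)


def failed_share(row: TrackerRow) -> float:
    """Fraction of a tracker's requests that fail; 0.0 if it has no requests."""
    if row.requests <= 0:
        return 0.0
    return row.requests_failed / row.requests


# Made-up values for illustration only; not taken from any WhoTracks.me data set.
rows = [
    TrackerRow("example_tracker_a", requests=4.0, requests_failed=0.6, has_blocking=0.40),
    TrackerRow("example_tracker_b", requests=2.0, requests_failed=0.02, has_blocking=0.05),
]

for row in rows:
    print(
        f"{row.tracker}: failed share {failed_share(row):.1%}, "
        f"pages with blocking {row.has_blocking:.0%}"
    )
```

A high failed share together with a high `has_blocking` value would point to a tracker that is widely blocked by external tools, which is the reading the quoted blog post gives for these two signals.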