Closed
Labels: enhancement (New feature or request)
Description
We should record every site we visit in the scan db, including sites with no trackers. If a visit is redirected, we should record the final site domain that was actually scanned.
This will enable:
- More meaningful tracker prevalence data
- Greater scan visibility ("80% of visited sites contain tracking", top ten slowest sites to visit)
- Listing of sites with no trackers
- Improvements to scan site list quality
Note: some sites recorded as having no trackers will still have GA on them; PB just didn't record tracking there, for whatever reason.
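As a rough illustration of the reporting this would unlock, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a cut-down version of the scan_sites table proposed below, plus hypothetical `site` and `tracking` tables; the real scan db schema and the queries in sql/ may well differ, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables; the real scan db schema may differ.
CREATE TABLE site (id INTEGER PRIMARY KEY, fqdn TEXT NOT NULL UNIQUE);
CREATE TABLE scan_sites (
    scan_id INTEGER NOT NULL,
    final_site_id INTEGER NOT NULL,
    start_time TIMESTAMP NOT NULL,
    end_time TIMESTAMP NOT NULL
);
CREATE TABLE tracking (
    scan_id INTEGER NOT NULL,
    site_id INTEGER NOT NULL,
    tracker TEXT NOT NULL
);
""")

# Made-up sample data: five visited sites, four of them with recorded tracking.
conn.executemany(
    "INSERT INTO site (id, fqdn) VALUES (?, ?)",
    [(1, "a.example"), (2, "b.example"), (3, "c.example"),
     (4, "d.example"), (5, "e.example")],
)
conn.executemany(
    "INSERT INTO scan_sites VALUES (1, ?, ?, ?)",
    [(1, "2024-01-01 00:00:00", "2024-01-01 00:00:05"),
     (2, "2024-01-01 00:01:00", "2024-01-01 00:01:12"),
     (3, "2024-01-01 00:02:00", "2024-01-01 00:02:30"),
     (4, "2024-01-01 00:03:00", "2024-01-01 00:03:02"),
     (5, "2024-01-01 00:04:00", "2024-01-01 00:04:08")],
)
conn.executemany(
    "INSERT INTO tracking VALUES (1, ?, 'tracker.example')",
    [(1,), (2,), (3,), (4,)],
)

def percent_with_tracking(conn, scan_id):
    """Share of visited sites (by final domain) with any recorded tracking."""
    return conn.execute("""
        SELECT 100.0 * COUNT(DISTINCT t.site_id) / COUNT(DISTINCT v.final_site_id)
        FROM scan_sites v
        LEFT JOIN tracking t
          ON t.scan_id = v.scan_id AND t.site_id = v.final_site_id
        WHERE v.scan_id = ?
    """, (scan_id,)).fetchone()[0]

def sites_without_trackers(conn, scan_id):
    """Visited sites for which no tracking was recorded."""
    rows = conn.execute("""
        SELECT s.fqdn
        FROM scan_sites v
        JOIN site s ON s.id = v.final_site_id
        LEFT JOIN tracking t
          ON t.scan_id = v.scan_id AND t.site_id = v.final_site_id
        WHERE v.scan_id = ? AND t.site_id IS NULL
        ORDER BY s.fqdn
    """, (scan_id,)).fetchall()
    return [fqdn for (fqdn,) in rows]

def slowest_sites(conn, scan_id, limit=10):
    """Sites ordered by visit duration in whole seconds, slowest first."""
    return conn.execute("""
        SELECT s.fqdn,
               CAST(strftime('%s', v.end_time) AS INTEGER)
             - CAST(strftime('%s', v.start_time) AS INTEGER) AS seconds
        FROM scan_sites v
        JOIN site s ON s.id = v.final_site_id
        WHERE v.scan_id = ?
        ORDER BY seconds DESC
        LIMIT ?
    """, (scan_id, limit)).fetchall()

print(percent_with_tracking(conn, 1))
print(sites_without_trackers(conn, 1))
print(slowest_sites(conn, 1))
```

None of these numbers can be computed today, since sites without recorded tracking never make it into the db in the first place.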
This continues work started in 5211f67 and 4e4d5f2.
New scan db table idea:
CREATE TABLE scan_sites (
  scan_id INTEGER NOT NULL,
  initial_site_id INTEGER NOT NULL,
  final_site_id INTEGER NOT NULL,
  status_id INTEGER NOT NULL,
  start_time TIMESTAMP NOT NULL,
  end_time TIMESTAMP NOT NULL,
  UNIQUE(scan_id, initial_site_id),
  FOREIGN KEY(scan_id) REFERENCES scan(id),
  FOREIGN KEY(initial_site_id) REFERENCES site(id),
  FOREIGN KEY(final_site_id) REFERENCES site(id),
  FOREIGN KEY(status_id) REFERENCES site_status(id)
);

site_statuses = {
  "success": 1,
  "timeout": 2,
  "error": 3,
  "antibot": 4,
}

This will require updating some of the queries in sql/.
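To make the table idea concrete, here is a minimal sketch of creating it and recording one redirected visit with Python's sqlite3 module. The `scan`, `site` and `site_status` tables are reduced to hypothetical stand-ins (only the `id` columns are referenced in the proposal above), and `open_db`, `site_id` and `record_visit` are illustrative helper names, not existing code.

```python
import sqlite3

SITE_STATUSES = {"success": 1, "timeout": 2, "error": 3, "antibot": 4}

SCHEMA = """
-- Hypothetical minimal versions of the referenced tables.
CREATE TABLE scan (id INTEGER PRIMARY KEY);
CREATE TABLE site (id INTEGER PRIMARY KEY, fqdn TEXT NOT NULL UNIQUE);
CREATE TABLE site_status (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

CREATE TABLE scan_sites (
    scan_id INTEGER NOT NULL,
    initial_site_id INTEGER NOT NULL,
    final_site_id INTEGER NOT NULL,
    status_id INTEGER NOT NULL,
    start_time TIMESTAMP NOT NULL,
    end_time TIMESTAMP NOT NULL,
    UNIQUE(scan_id, initial_site_id),
    FOREIGN KEY(scan_id) REFERENCES scan(id),
    FOREIGN KEY(initial_site_id) REFERENCES site(id),
    FOREIGN KEY(final_site_id) REFERENCES site(id),
    FOREIGN KEY(status_id) REFERENCES site_status(id)
);
"""

def open_db(path=":memory:"):
    """Create the schema and populate site_status from SITE_STATUSES."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO site_status (id, name) VALUES (?, ?)",
        [(status_id, name) for name, status_id in SITE_STATUSES.items()],
    )
    conn.commit()
    return conn

def site_id(conn, fqdn):
    """Return the id for a domain, inserting a site row if it is new."""
    conn.execute("INSERT OR IGNORE INTO site (fqdn) VALUES (?)", (fqdn,))
    return conn.execute(
        "SELECT id FROM site WHERE fqdn = ?", (fqdn,)
    ).fetchone()[0]

def record_visit(conn, scan_id, initial_fqdn, final_fqdn, status, start, end):
    """Record one visited site, whether or not any tracking was seen."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO scan_sites VALUES (?, ?, ?, ?, ?, ?)",
            (scan_id, site_id(conn, initial_fqdn), site_id(conn, final_fqdn),
             SITE_STATUSES[status], start, end),
        )

conn = open_db()
conn.execute("INSERT INTO scan (id) VALUES (1)")
# example.com redirected to www.example.com, so the two site ids differ.
record_visit(conn, 1, "example.com", "www.example.com", "success",
             "2024-01-01 00:00:00", "2024-01-01 00:00:05")
```

With UNIQUE(scan_id, initial_site_id), recording the same initial site twice in one scan raises sqlite3.IntegrityError, so a retry would need to update the existing row rather than insert a new one.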