Age Verification Identifier for Adult Websites

This project automates the detection of age verification (AV) mechanisms across adult websites using Selenium, BeautifulSoup, and a custom-trained BERT classifier.

It detects:

- Whether the site shows an age verification prompt
- Whether it relies on a third-party AV provider (e.g., Yoti, Veratad)
- Whether the site displays a protest page in response to AV legislation (e.g., Virginia/Utah laws)
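The three checks above can be illustrated with a small keyword matcher of the kind crawler/detection.py might contain. This is a minimal sketch, not the project's code: the keyword lists are hypothetical placeholders (only the provider names Yoti and Veratad come from this README), and the standard-library `html.parser` stands in for BeautifulSoup purely to keep the example dependency-free.

```python
import re
from html.parser import HTMLParser

# Hypothetical keyword lists; the project's real lists live in crawler/config.py.
AV_KEYWORDS = ["age verification", "verify your age", "are you 18", "enter your date of birth"]
PROTEST_KEYWORDS = ["age verification law", "we have disabled access", "your elected officials"]
AV_PROVIDERS = ["yoti", "veratad"]


class _TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and lowercase so keyword matching is case-insensitive.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip().lower()


def detect(html: str) -> dict:
    """Return which AV, provider, and protest signals appear in the page text."""
    text = page_text(html)
    return {
        "av_prompt": any(k in text for k in AV_KEYWORDS),
        "providers": sorted(p for p in AV_PROVIDERS if p in text),
        "protest_page": any(k in text for k in PROTEST_KEYWORDS),
    }
```

Because protest pages usually mention age verification themselves, a plain substring match like this flags both `av_prompt` and `protest_page` on such pages; a real implementation would need to decide which signal takes precedence.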

Project Structure

AV_identifier/
├── main.py                     # Entry point to run the crawler
├── data/
│   ├── classified_adult_sites_test.csv
│   └── age_verification_results_selenium.csv
├── crawler/
│   ├── __init__.py
│   ├── config.py               # Keywords and provider list
│   ├── utils.py                # Checkbox and button click logic
│   ├── detection.py            # AV & protest detection helpers
│   └── scraper.py              # Main scraping logic
├── screenshots/                # Saved screenshots for visual inspection
└── html_dumps/                 # HTML dumps for fallback debugging
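crawler/utils.py is listed as holding the checkbox and button click logic. The sketch below shows one hypothetical way to do this with Selenium; the XPath expressions and the `BUTTON_LABELS` tuple are assumptions, not the project's values. It passes the locator strategy as the string `"xpath"` (the value of Selenium's `By.XPATH`), so it works with a real `webdriver.Chrome` instance without importing Selenium here.

```python
# Hypothetical labels for "enter site" style buttons on AV modals.
BUTTON_LABELS = ("i am 18", "i'm 18", "i am over 18", "enter site", "verify")


def click_av_controls(driver) -> list[str]:
    """Tick visible, unticked checkboxes, then click the first button whose label looks like an AV confirm."""
    clicked = []
    for box in driver.find_elements("xpath", "//input[@type='checkbox']"):
        if box.is_displayed() and not box.is_selected():
            try:
                box.click()
                clicked.append("checkbox")
            except Exception:  # e.g. Selenium's ElementClickInterceptedException
                pass
    for el in driver.find_elements("xpath", "//button | //a | //input[@type='submit']"):
        # <input> elements have no text, so fall back to their value attribute.
        label = (el.text or el.get_attribute("value") or "").strip().lower()
        if el.is_displayed() and any(word in label for word in BUTTON_LABELS):
            try:
                el.click()
                clicked.append(label)
                break  # clicking may navigate away, leaving other elements stale
            except Exception:  # e.g. StaleElementReferenceException
                continue
    return clicked
```

Returning the list of clicked controls makes it easy to record in the results which interaction, if any, dismissed the modal.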

Features

- Detects explicit AV content by matching a predefined keyword list
- Identifies known third-party AV services
- Handles iframes and follows redirects to detect third-party integrations
- Captures screenshots and HTML snapshots for auditability
- Flags protest pages that block content in response to legal requirements
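Third-party integrations often appear only as an embedded iframe, a script source, or a redirect, rather than as visible text. A minimal sketch of URL-based provider detection follows; the domains `yoti.com` and `veratad.com` are an assumed mapping, and the project's actual provider list lives in crawler/config.py. The optional `final_url` is meant to receive the post-redirect address (in Selenium, `driver.current_url`).

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse

# Assumed provider -> domain mapping, for illustration only.
PROVIDER_DOMAINS = {"yoti": "yoti.com", "veratad": "veratad.com"}
# Which attribute carries a URL for each tag we care about.
URL_ATTRS = {"iframe": "src", "script": "src", "a": "href", "form": "action"}


class _UrlCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def find_providers(html: str, base_url: str, final_url: Optional[str] = None) -> set[str]:
    """Return provider names referenced by embedded URLs or by the post-redirect URL."""
    parser = _UrlCollector()
    parser.feed(html)
    parser.close()
    urls = [urljoin(base_url, u) for u in parser.urls]  # resolve relative links
    if final_url:
        urls.append(final_url)
    found = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        for name, domain in PROVIDER_DOMAINS.items():
            # Match the domain itself or any subdomain, but not look-alikes such as notyoti.com.
            if host == domain or host.endswith("." + domain):
                found.add(name)
    return found
```

This only inspects the parent page's markup. Content inside a cross-origin iframe is not part of the parent's `page_source`; with Selenium, `driver.switch_to.frame(...)` can be used to enter the frame and inspect it separately.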

How It Works

A BERT model classifies domains as Adult or Non-Adult.
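The classification step can be sketched as a filter over the classifier's predictions. This assumes the classifier takes one string per domain (the real model may read page text instead) and returns one `{"label", "score"}` dict per input, which is the output shape of a Hugging Face `transformers` text-classification pipeline. The label name `"Adult"` and the checkpoint path are hypothetical; a fine-tuned model without an `id2label` mapping reports labels like `LABEL_0`/`LABEL_1`.

```python
from collections.abc import Callable, Iterable


def adult_domains(
    domains: Iterable[str],
    classify: Callable[[list[str]], list[dict]],
    adult_label: str = "Adult",
    min_score: float = 0.5,
) -> list[str]:
    """Keep domains predicted as adult_label with at least min_score confidence."""
    batch = list(domains)
    predictions = classify(batch)  # one {"label": ..., "score": ...} dict per input
    return [
        domain
        for domain, pred in zip(batch, predictions)
        if pred["label"] == adult_label and pred["score"] >= min_score
    ]

# Possible wiring (not run here; the checkpoint path is hypothetical):
# from transformers import pipeline
# classify = pipeline("text-classification", model="path/to/fine-tuned-bert")
# targets = adult_domains(["example.com", "example.org"], classify)
```

Injecting `classify` as a callable keeps the filtering logic testable without loading a model.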
The crawler:
- Loads each adult domain in a headless browser
- Checks for AV indicators and protest language
- Attempts to interact with AV modals (e.g., ticking a checkbox or clicking an AV button)
- Follows links to an AV provider, if present
- Logs results and saves screenshots and HTML dumps
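The crawl loop described above might look roughly like the following sketch. The function names, CSV columns, and the `detect` interface (a callable returning `av_prompt`, `providers`, and `protest_page`) are assumptions; defaulting the output to data/age_verification_results_selenium.csv is a guess based on that file's name. Modal interaction and provider-link following are left out to keep it short. Selenium is imported inside `make_headless_chrome` so the loop itself can run against any driver-like object.

```python
import csv
from pathlib import Path

FIELDS = ["domain", "final_url", "av_prompt", "providers", "protest_page", "error"]


def make_headless_chrome():
    """Build a headless Chrome driver (needs the selenium package and Chrome installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,1024")
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(30)
    return driver


def crawl(driver, domains, detect,
          out_csv="data/age_verification_results_selenium.csv",
          shots_dir="screenshots", dumps_dir="html_dumps"):
    shots, dumps = Path(shots_dir), Path(dumps_dir)
    for d in (shots, dumps, Path(out_csv).parent):
        d.mkdir(parents=True, exist_ok=True)
    rows = []
    for domain in domains:
        row = {"domain": domain, "providers": []}
        try:
            driver.get(f"https://{domain}")
            html = driver.page_source
            row["final_url"] = driver.current_url  # differs from the input if the site redirected
            row.update(detect(html))
            driver.save_screenshot(str(shots / f"{domain}.png"))
            (dumps / f"{domain}.html").write_text(html, encoding="utf-8")
        except Exception as exc:  # record the failure and keep crawling
            row["error"] = repr(exc)
        row["providers"] = ";".join(sorted(row["providers"]))
        rows.append(row)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        # Missing keys (e.g. on error rows) are written as empty cells.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With a real browser, pass `make_headless_chrome()` as the driver and call its `quit()` method when the crawl finishes so the Chrome process shuts down.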

Requirements

- Python 3.10+
- Google Chrome (latest)
- ChromeDriver matching your Chrome version

Install dependencies

pip install -r requirements.txt

Setup

Install a matching ChromeDriver

If you’re using macOS:

brew install chromedriver

Run the scraper

python main.py

Repository: gwusec/av_identifier_adult_content
