InspecTor is a command-line tool designed to extract metadata from websites, including .onion sites, anonymously via the Tor network. It allows users to specify target URLs and retrieve various metadata fields such as emails, phone numbers, links, images, and more. The script supports concurrent requests, saving results to JSON or an SQLite database, and optional use of Selenium for dynamic content.
- Extract metadata from .onion and regular websites
- Support for multiple URLs and input files
- Concurrent processing with configurable number of threads
- Optional SSL verification
- Extraction of specific metadata fields
- Optional use of Selenium for dynamic content
- Output to JSON file or stdout
- Save results to SQLite database
- Human-readable output option
- Python 3.x
- Tor installed and running on 127.0.0.1:9050
- Chrome browser and ChromeDriver (if using Selenium)
The required Python packages are listed in requirements.txt:
- requests
- beautifulsoup4
- selenium
- fake-useragent
- colorama
- urllib3
- phonenumbers
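Some of these packages install under a different import name (beautifulsoup4 imports as bs4, fake-useragent as fake_useragent). The standalone script below is a minimal sketch, not part of InspecTor, that reports which listed packages cannot be imported after installation. The distribution-to-module mapping is an assumption based on the packages' usual module names.

```python
import importlib.util

# Assumed mapping: distribution name in requirements.txt -> importable module name.
REQUIRED_MODULES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "selenium": "selenium",
    "fake-useragent": "fake_useragent",
    "colorama": "colorama",
    "urllib3": "urllib3",
    "phonenumbers": "phonenumbers",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the distribution names whose module cannot be found."""
    return [
        dist
        for dist, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```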
- Clone the repository:
git clone https://github.com/noobosaurus-r3x/InspecTor.git
cd InspecTor
- Install the Python packages:
pip install -r requirements.txt
- Install Tor:
sudo apt update
sudo apt install tor
- Start the Tor service and check its status:
sudo systemctl start tor
sudo systemctl status tor
- Install Chrome and ChromeDriver (if using Selenium):
  - Chrome browser: download and install it from the Google Chrome website.
  - ChromeDriver:
    - Find the version of your Chrome browser:
      google-chrome --version
    - Download the ChromeDriver release that matches your Chrome version.
    - Ensure chromedriver is in your system's PATH, or specify its path in the script.
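ChromeDriver has to match the major version of the installed Chrome. The sketch below is not part of InspecTor and its helper names are hypothetical. It compares the major versions printed by google-chrome --version and chromedriver --version, assuming both tools print a dotted four-part version number and are on the PATH.

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Return the major version from text such as 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def reported_major_version(executable):
    """Run '<executable> --version' and parse it; None if the tool is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return major_version(result.stdout)


if __name__ == "__main__":
    chrome = reported_major_version("google-chrome")
    driver = reported_major_version("chromedriver")
    print(f"Chrome: {chrome}, ChromeDriver: {driver}")
    if chrome is None or driver is None:
        print("Could not find or parse one of the versions.")
    elif chrome != driver:
        print("Major versions differ: download a matching ChromeDriver.")
    else:
        print("Versions match.")
```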
Extract metadata from one or more URLs (both .onion and regular websites):
python3 InspecTor.py -u https://exampleonionsite1.onion https://www.example.com
Extract metadata from URLs listed in a file:
python3 InspecTor.py -f urls.txt
Force all traffic through Tor:
python3 InspecTor.py -u https://www.example.com --force-tor
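Going by the --force-tor description, regular URLs are sent through Tor only when that flag is set, while .onion addresses are only reachable through Tor. The sketch below illustrates that routing decision and one way to read a URL file with one URL per line. The function names are hypothetical and this is not InspecTor's actual code.

```python
from urllib.parse import urlparse


def is_onion(url):
    """True if the URL's host name ends in .onion."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def needs_tor(url, force_tor=False):
    """.onion hosts always need Tor; other hosts only when force_tor is set."""
    return force_tor or is_onion(url)


def parse_url_lines(lines):
    """Return the non-blank lines of a URL file, stripped of whitespace."""
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    for url in ["https://exampleonionsite1.onion", "https://www.example.com"]:
        print(url, "-> Tor" if needs_tor(url) else "-> direct")
```

Since parse_url_lines accepts any iterable of lines, an open file works directly: with open("urls.txt") as fh: urls = parse_url_lines(fh).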
- -u, --urls: List of URLs to scrape (.onion or regular).
- -f, --file: Path to a file containing URLs, one per line.
- -o, --output: Output JSON file to save metadata (use "-" for stdout). Default is onion_site_metadata.json.
- --force-tor: Route all traffic through the Tor network, even for regular URLs.
- --verify-ssl: Enable SSL certificate verification (default: enabled).
- --no-verify-ssl: Disable SSL certificate verification.
- --use-selenium: Use Selenium for handling dynamic content.
- --max-workers: Maximum number of concurrent threads (default: 5).
- --database: SQLite database file to store metadata (default: metadata.db).
- --fields: Specify which metadata fields to extract. Available fields are listed below.
- --extract-all: Extract all available metadata fields.
- --human-readable, -hr: Output the results in a human-readable format.
- --default-region: Region used to interpret phone numbers written without an international prefix (e.g., FR for France).
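--max-workers bounds how many URLs are processed at the same time. As a hedged illustration of how such a limit is commonly applied in Python (not necessarily how InspecTor does it), the sketch below uses concurrent.futures.ThreadPoolExecutor. scrape is a hypothetical per-URL function supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_urls(urls, scrape, max_workers=5):
    """Run scrape(url) for each URL on at most max_workers threads.

    Returns {url: result}; a URL whose scrape raised gets {"error": message}
    instead, so one failing site does not abort the whole run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = {"error": str(exc)}
    return results


if __name__ == "__main__":
    def fake_scrape(url):
        # Hypothetical stand-in for fetching and parsing a page.
        if "bad" in url:
            raise ValueError("unreachable")
        return {"title": url.upper()}

    print(process_urls(["http://a.onion", "http://bad.onion"], fake_scrape, max_workers=2))
```

Because the network wait dominates scraping, threads are a reasonable fit here; more workers mean more simultaneous circuits through the local Tor proxy.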
The following fields can be specified with the --fields argument:
emails, phone_numbers, links, external_links, images, scripts, css_files, social_links, csp, server_technologies, crypto_wallets, headers, title, description, keywords, og_title, og_description, timestamp, http_headers
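To give a sense of what a few of these fields contain, here is a minimal sketch that uses only the standard library to pull emails, links, external_links and images out of an HTML page. InspecTor itself lists beautifulsoup4 as a dependency and may work differently. The email regular expression is a simplification, and treating any link to a different host name as external is an assumption.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Simplified email pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_fields(html, base_url, fields):
    """Return only the requested fields among emails, links, external_links, images."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    base_host = urlparse(base_url).hostname
    available = {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "links": parser.links,
        # Assumption: "external" means pointing at a different host name.
        "external_links": [u for u in parser.links if urlparse(u).hostname != base_host],
        "images": parser.images,
    }
    return {name: available[name] for name in fields if name in available}


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://example.com/">Ext</a> contact@example.onion'
    print(extract_fields(page, "http://example.onion/", ["emails", "external_links"]))
```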
Extract only emails from a .onion site:
python3 InspecTor.py -u https://example.onion --fields emails -o emails.json
Extract phone numbers from a website with French phone numbers:
python3 InspecTor.py -u https://example.com --fields phone_numbers --default-region FR
Extract emails and links:
python3 InspecTor.py -u https://example.onion --fields emails links -o data.json
Extract all metadata:
python3 InspecTor.py -u https://example.onion --extract-all -o all_metadata.json
Extract emails and phone numbers:
python3 InspecTor.py -u https://example.com --fields emails phone_numbers -o contact_info.json
Disable SSL verification and use Selenium:
python3 InspecTor.py -u https://example.onion -o metadata.json --no-verify-ssl --use-selenium
Output results in a human-readable format:
python3 InspecTor.py -u https://example.onion --human-readable
Output JSON to stdout and pipe it to jq for formatting:
python3 InspecTor.py -u https://example.onion -o - | jq '.'
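--default-region matters because a number written in national form, such as 01 23 45 67 89 in France, carries no country code. The phonenumbers package in requirements.txt handles this for every region. The deliberately simplified, France-only sketch below (not InspecTor's code; the function name is hypothetical) only shows the idea of turning such numbers into international E.164 form.

```python
import re


def to_e164_fr(raw):
    """Normalize a French phone number to E.164, or return None.

    Simplified on purpose: accepts only +33XXXXXXXXX or 0XXXXXXXXX after
    removing spaces, dots and dashes, and does not validate number ranges.
    """
    digits = re.sub(r"[\s.\-]", "", raw)
    if re.fullmatch(r"\+33[1-9]\d{8}", digits):
        return digits
    if re.fullmatch(r"0[1-9]\d{8}", digits):
        # National format: drop the trunk prefix 0 and add the +33 country code.
        return "+33" + digits[1:]
    return None
```

Without a region, the leading 0 in 01 23 45 67 89 is ambiguous, which is why the option exists.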
- JSON File:
By default, the script saves the extracted metadata to onion_site_metadata.json. Use the -o argument to specify a different output file, or -o - to write to stdout.
- SQLite Database:
The script saves metadata to an SQLite database (metadata.db by default). Use the --database argument to specify a different database file.
- Human-Readable:
Use the --human-readable or -hr flag to print the results in a human-readable format with colored output.
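The schema of metadata.db is not documented here, so the sketch below (standard-library sqlite3; the helper names are hypothetical) makes no assumptions about it. It lists each table and its columns, a reasonable first step before writing your own queries.

```python
import sqlite3
from contextlib import closing


def describe_connection(conn):
    """Return {table name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
        schema[table] = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
    return schema


def describe_database(path="metadata.db"):
    """Open the database read-only so a missing file raises instead of being created."""
    with closing(sqlite3.connect(f"file:{path}?mode=ro", uri=True)) as conn:
        return describe_connection(conn)


if __name__ == "__main__":
    for table, columns in describe_database().items():
        print(table, columns)
```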
- Tor Configuration:
Ensure that the Tor service is running on 127.0.0.1:9050. Requests to .onion sites are routed through the Tor SOCKS5 proxy, as are all requests when --force-tor is set.
- Selenium Usage:
If the --use-selenium flag is used, the Chrome browser and ChromeDriver must be installed. Selenium handles dynamic content that requires JavaScript execution.
- SSL Verification:
SSL certificate verification is enabled by default. Some .onion sites may have invalid certificates; use the --no-verify-ssl flag to disable verification for them.
- Concurrency:
The script uses multithreading to process multiple URLs concurrently. Adjust the number of workers with the --max-workers argument as needed.
- Dependencies:
All Python dependencies are listed in requirements.txt. Install them using pip install -r requirements.txt.
- Tor Accessibility:
If you're scraping .onion sites or using the --force-tor option, ensure that the Tor service is accessible and running properly. The script checks whether the Tor SOCKS5 proxy port is open.
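The proxy check mentioned above can be approximated with a plain TCP connection attempt. The sketch below uses only the standard library; the function name is hypothetical and this is not InspecTor's implementation. It returns True if something is listening on 127.0.0.1:9050, but it does not prove that the listener is actually Tor.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if socks_port_open():
        print("Tor SOCKS port 127.0.0.1:9050 is accepting connections.")
    else:
        print("Nothing is listening on 127.0.0.1:9050; start Tor with 'sudo systemctl start tor'.")
```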
I am not a professional developer, and this tool could be improved with your help. Feel free to fork the repository and enhance it by adding features, fixing bugs, or optimizing the code. Your contributions are welcome and highly appreciated!
This project is licensed under the MIT License.