This Python script is a web spider that crawls a website to extract URLs and subdomains from a given starting URL. It uses the requests library to fetch web pages, BeautifulSoup for HTML parsing, and regular expressions to extract relevant links.
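At a high level, the crawl step can be sketched as follows. This is a minimal illustration of the technique described above, not the actual index.py; the function name crawl_page and all variable names are assumptions.

```python
import re
import sys
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl_page(start_url: str) -> tuple[set[str], set[str]]:
    """Collect URLs and subdomains reachable from start_url (illustrative sketch)."""
    base_domain = urlparse(start_url).netloc
    urls, subdomains = set(), set()

    response = requests.get(start_url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    # BeautifulSoup finds links in anchor tags; urljoin resolves relative hrefs.
    for anchor in soup.find_all("a", href=True):
        urls.add(urljoin(start_url, anchor["href"]))

    # A regular expression catches subdomain references anywhere in the raw HTML.
    pattern = re.compile(r"https?://([A-Za-z0-9.-]+\." + re.escape(base_domain) + r")")
    subdomains.update(pattern.findall(response.text))

    return urls, subdomains


if __name__ == "__main__":
    found_urls, found_subdomains = crawl_page(sys.argv[1])
    print(found_urls, found_subdomains)
```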
- Make sure you have Python installed.
- Clone this repository:
  git clone https://github.com/thib-web3/website_urls_finder.git
- Go to the root folder:
  cd website_urls_finder
- Install the required packages using the following command:
  pip install -r requirements.txt
Run the index.py script with the URL you want to start crawling from as a command-line argument. For example:
python index.py https://example.com
The script will initiate the crawling process and extract URLs and subdomains related to the provided starting URL. It will then filter the results and save them as a JSON file: /example.json.
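The /example.json mentioned above suggests the output file is named after the crawled domain. As a hedged sketch (the naming scheme and the function name save_results are assumptions, not taken from index.py), the save step might look like:

```python
import json
from urllib.parse import urlparse


def save_results(start_url: str, urls: set[str]) -> str:
    """Write the filtered URLs to <domain>.json, e.g. example.json for example.com."""
    # Hypothetical naming scheme inferred from the /example.json mentioned above.
    name = urlparse(start_url).netloc.split(".")[0]
    path = f"{name}.json"
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(urls), fh, indent=2)
    return path
```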
- Extracts URLs and subdomains from a starting URL.
- Filters out irrelevant URLs and subdomains, such as JavaScript links and anchor tags (see the sketch after this list).
- Removes duplicate URLs.
- Saves the extracted and filtered URLs locally.
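A minimal sketch of the filtering and de-duplication steps listed above; the helper name filter_urls is illustrative and the actual index.py logic may differ:

```python
def filter_urls(urls: list[str]) -> list[str]:
    """Drop JavaScript pseudo-links and bare anchors, then de-duplicate."""
    kept = []
    for url in urls:
        # Skip javascript: links and anchor-only fragments such as "#top".
        if url.startswith("javascript:") or url.startswith("#"):
            continue
        kept.append(url)
    # dict.fromkeys removes duplicates while preserving discovery order.
    return list(dict.fromkeys(kept))
```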
- The script utilizes the requests library for making HTTP requests.
- HTML parsing is performed using BeautifulSoup.
- Regular expressions are used for URL extraction and manipulation.
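Given the libraries above, requirements.txt presumably lists at least the following packages (version pins are omitted here as an assumption):

```
requests
beautifulsoup4
```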
This project is licensed under the MIT License.